2024
Lampe, Ajda; Stopar, Julija; Jain, Deepak Kumar; Omachi, Shinichiro; Peer, Peter; Struc, Vitomir DiCTI: Diffusion-based Clothing Designer via Text-guided Input Proceedings Article In: Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024), pp. 1-9, 2024. @inproceedings{Ajda_Dicti,
title = {DiCTI: Diffusion-based Clothing Designer via Text-guided Input},
author = {Ajda Lampe and Julija Stopar and Deepak Kumar Jain and Shinichiro Omachi and Peter Peer and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/06/Dicti_FG2024_compressed.pdf},
year = {2024},
date = {2024-05-27},
booktitle = {Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024)},
pages = {1-9},
abstract = {Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and customers seeking to order new designs. To address this gap, we introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a straightforward yet highly effective approach that allows designers to quickly visualize fashion-related ideas using text inputs only.
Given an image of a person and a description of the desired garments as input, DiCTI automatically generates multiple high-resolution, photorealistic images that capture the expressed semantics.
By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs that viably follow the provided text descriptions, while being able to process very diverse and challenging inputs, captured in completely unconstrained settings. We evaluate DiCTI in comprehensive experiments on two different datasets (VITON-HD and Fashionpedia) and in comparison to the state-of-the-art (SoTA). The results of our experiments show that DiCTI convincingly outperforms the SoTA competitor in generating higher quality images with more elaborate garments and superior text prompt adherence, both according to standard quantitative evaluation measures and human ratings, generated as part of a user study. The source code of DiCTI will be made publicly available.},
keywords = {clothing design, deepbeauty, denoising diffusion probabilistic models, diffusion, diffusion models, fashion, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
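The mechanism described in the abstract, text-conditioned diffusion inpainting over a masked clothing region, can be approximated with off-the-shelf components. The sketch below is an illustrative approximation only, not the authors' released code: the pre-trained checkpoint, the input files and the prompt are assumptions, and a real system would derive the clothing mask from a human-parsing model.

# Minimal sketch: text-guided diffusion inpainting of a clothing region (illustrative only).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed public checkpoint, not the DiCTI model
    torch_dtype=torch.float16,
).to("cuda")

person = Image.open("person.jpg").convert("RGB").resize((512, 512))     # hypothetical input image
mask = Image.open("clothing_mask.png").convert("L").resize((512, 512))  # white = region to redesign

prompt = "a sleeveless red summer dress with floral embroidery"         # example garment description
designs = pipe(prompt=prompt, image=person, mask_image=mask,
               num_images_per_prompt=4, guidance_scale=7.5).images      # several candidate designs
for i, img in enumerate(designs):
    img.save(f"design_{i}.png")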
2023
Babnik, Žiga; Peer, Peter; Štruc, Vitomir DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models Proceedings Article In: IEEE International Joint Conference on Biometrics, pp. 1-10, IEEE, Ljubljana, Slovenia, 2023. @inproceedings{Diffiqa_2023,
title = {DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models},
author = {Žiga Babnik and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121-supp.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics},
pages = {1-10},
publisher = {IEEE},
address = {Ljubljana, Slovenia},
abstract = {Modern face recognition (FR) models excel in constrained scenarios, but often suffer from decreased performance when deployed in unconstrained (real-world) environments due to uncertainties surrounding the quality of the captured facial data. Face image quality assessment (FIQA) techniques aim to mitigate these performance degradations by providing FR models with sample-quality predictions that can be used to reject low-quality samples and reduce false match errors. However, despite steady improvements, ensuring reliable quality estimates across facial images with diverse characteristics remains challenging. In this paper, we present a powerful new FIQA approach, named DifFIQA, which relies on denoising diffusion probabilistic models (DDPM) and ensures highly competitive results. The main idea behind the approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because the diffusion-based perturbations are computationally expensive, we also distill the knowledge encoded in DifFIQA into a regression-based quality predictor, called DifFIQA(R), that balances performance and execution time. We evaluate both models in comprehensive experiments on 7 diverse datasets, with 4 target FR models and against 10 state-of-the-art FIQA techniques with highly encouraging results. The source code is available from: https://github.com/LSIbabnikz/DifFIQA.},
keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face image quality assessment, face recognition, FIQA, quality},
pubstate = {published},
tppubtype = {inproceedings}
}
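The central idea, perturbing a face image with a diffusion model's forward and backward processes and scoring quality by how much its embedding shifts, can be outlined in a few lines. The sketch below is a heavily simplified, assumption-laden illustration: the stub networks, the single-step linear noising and the cosine-similarity score stand in for the actual DDPM, FR backbone and quality definition used in the paper.

# Simplified sketch of a perturbation-based quality score (illustrative stubs, not DifFIQA).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StubEmbedder(nn.Module):
    """Stand-in for a pretrained face recognition backbone producing unit-norm embeddings."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class StubDenoiser(nn.Module):
    """Stand-in for the backward (denoising) pass of a pretrained DDPM."""
    def forward(self, x_noisy, t):
        return x_noisy  # a real model would reconstruct the clean image here

def quality_score(img, embedder, denoiser, t=0.3):
    noise = torch.randn_like(img)
    x_noisy = (1.0 - t) * img + t * noise    # simplified forward (noising) step
    x_restored = denoiser(x_noisy, t)        # simplified backward (denoising) step
    e_orig, e_pert = embedder(img), embedder(x_restored)
    # Embeddings of high-quality faces are expected to change little under the perturbation.
    return F.cosine_similarity(e_orig, e_pert, dim=-1)

embedder, denoiser = StubEmbedder(), StubDenoiser()
face = torch.rand(1, 3, 112, 112)            # placeholder face crop
print(quality_score(face, embedder, denoiser))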
Ivanovska, Marija; Štruc, Vitomir; Perš, Janez TomatoDIFF: On-plant Tomato Segmentation with Denoising Diffusion Models Proceedings Article In: 18th International Conference on Machine Vision and Applications (MVA 2023), pp. 1-6, 2023. @inproceedings{MarijaTomato2023,
title = {TomatoDIFF: On-plant Tomato Segmentation with Denoising Diffusion Models},
author = {Marija Ivanovska and Vitomir Štruc and Janez Perš},
url = {https://arxiv.org/pdf/2307.01064.pdf
https://ieeexplore.ieee.org/document/10215774},
doi = {10.23919/MVA57639.2023.10215774},
year = {2023},
date = {2023-07-23},
urldate = {2023-07-23},
booktitle = {18th International Conference on Machine Vision and Applications (MVA 2023)},
pages = {1-6},
abstract = {Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms, in particular, are commonly used for fruit segmentation, enabling in-depth analysis of the harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large and challenging dataset of greenhouse tomatoes. The dataset comprises high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and the Tomatopia dataset are available at https://github.com/MIvanovska/TomatoDIFF},
keywords = {agriculture, dataset, deep learning, diffusion, plant segmentation, plant monitoring, robotics, segmentation, tomato dataset},
pubstate = {published},
tppubtype = {inproceedings}
}
Boutros, Fadi; Štruc, Vitomir; Fierrez, Julian; Damer, Naser Synthetic data for face recognition: Current state and future prospects Journal Article In: Image and Vision Computing, no. 104688, 2023. @article{FadiIVCSynthetic,
title = {Synthetic data for face recognition: Current state and future prospects},
author = {Fadi Boutros and Vitomir Štruc and Julian Fierrez and Naser Damer},
url = {https://www.sciencedirect.com/science/article/pii/S0262885623000628},
doi = {10.1016/j.imavis.2023.104688},
year = {2023},
date = {2023-05-15},
urldate = {2023-05-15},
journal = {Image and Vision Computing},
number = {104688},
abstract = {Over the past years, deep learning capabilities and the availability of large-scale training datasets have advanced rapidly, leading to breakthroughs in face recognition accuracy. However, these technologies are foreseen to face a major challenge in the coming years due to the legal and ethical concerns about using authentic biometric data in AI model training and evaluation, along with the increasing use of data-hungry state-of-the-art deep learning models. With the recent advances in deep generative models and their success in generating realistic and high-resolution synthetic image data, privacy-friendly synthetic data has been proposed as an alternative to privacy-sensitive authentic data to overcome the challenges of using authentic data in face recognition development. This work aims at providing a clear and structured picture of the use-case taxonomy of synthetic face data in face recognition, along with the recent advances in face recognition models developed on the basis of synthetic data. We also discuss the challenges facing the use of synthetic data in face recognition development and several future prospects of synthetic data in the domain of face recognition.},
keywords = {biometrics, CNN, diffusion, face recognition, generative models, survey, synthetic data},
pubstate = {published},
tppubtype = {article}
}
Ivanovska, Marija; Štruc, Vitomir Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models Proceedings Article In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2023. @inproceedings{IWBF2023_Marija,
title = {Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models},
author = {Marija Ivanovska and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/03/IWBF2023_Morphing.pdf},
year = {2023},
date = {2023-02-28},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
pages = {1-6},
abstract = {Morphed face images have recently become a growing concern for existing face verification systems, as they are relatively easy to generate and can be used to impersonate someone's identity for various malicious purposes. Efficient Morphing Attack Detection (MAD) that generalizes well across different morphing techniques is, therefore, of paramount importance. Existing MAD techniques predominantly rely on discriminative models that learn from examples of bona fide and morphed images and, as a result, often exhibit sub-optimal generalization performance when confronted with unknown types of morphing attacks. To address this problem, we propose a novel, diffusion-based MAD method in this paper that learns only from the characteristics of bona fide images. Various forms of morphing attacks are then detected by our model as out-of-distribution samples. We perform rigorous experiments over four different datasets (CASIA-WebFace, FRLL-Morphs, FERET-Morphs and FRGC-Morphs) and compare the proposed solution to both discriminatively-trained and one-class MAD models. The experimental results show that our MAD model achieves highly competitive results on all considered datasets.},
keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face morphing attack, morphing attack, morphing attack detection},
pubstate = {published},
tppubtype = {inproceedings}
}
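The detection principle summarized above, training a generative model only on bona fide faces and flagging morphs as out-of-distribution samples, is illustrated below with a generic reconstruction-error detector. This is a hedged sketch of the general one-class strategy, not the published diffusion-based method: the autoencoder-style stub, the error metric and the threshold are all assumptions.

# Generic one-class sketch: score a sample by how poorly a bona-fide-only model reconstructs it.
import torch
import torch.nn as nn

class StubBonaFideModel(nn.Module):
    """Stand-in for a generative model trained exclusively on bona fide face images."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Flatten(), nn.LazyLinear(128))
        self.decode = nn.LazyLinear(3 * 112 * 112)
    def forward(self, x):
        return self.decode(self.encode(x)).view_as(x)

def morph_score(img, model):
    """Higher reconstruction error -> more out-of-distribution -> more likely a morph."""
    with torch.no_grad():
        recon = model(img)
    return torch.mean((img - recon) ** 2).item()

model = StubBonaFideModel()
probe = torch.rand(1, 3, 112, 112)   # placeholder probe image
threshold = 0.05                     # assumed decision threshold
print("morph suspected" if morph_score(probe, model) > threshold else "bona fide")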