2025 |
Tomašević, Darian; Boutros, Fadi; Lin, Chenhao; Damer, Naser; Štruc, Vitomir; Peer, Peter
ID-Booth: Identity-consistent Face Generation with Diffusion Models
Proceedings Article
In: IEEE International Conference on Automatic Face and Gesture Recognition 2025, pp. 1-10, 2025.
Tags: data synthesis, diffusion, face, face images, face recognition, generative AI, generative models, synthetic data
Abstract: Recent advances in generative modeling have enabled the generation of high-quality synthetic data that is applicable in a variety of domains, including face recognition. Here, state-of-the-art generative models typically rely on conditioning and fine-tuning of powerful pretrained diffusion models to facilitate the synthesis of realistic images of a desired identity. Yet, these models often do not consider the identity of subjects during training, leading to poor consistency between generated and intended identities. In contrast, methods that employ identity-based training objectives tend to overfit on various aspects of the identity and, in turn, lower the diversity of images that can be generated. To address these issues, we present in this paper a novel generative diffusion-based framework, called ID-Booth. ID-Booth consists of a denoising network responsible for data generation, a variational auto-encoder for mapping images to and from a lower-dimensional latent space, and a text encoder that allows for prompt-based control over the generation procedure. The framework utilizes a novel triplet identity training objective and enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models. Experiments with a state-of-the-art latent diffusion model and diverse prompts reveal that our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity. In turn, the produced data allows for effective augmentation of small-scale datasets and training of better-performing recognition models in a privacy-preserving manner. The source code for the ID-Booth framework is publicly available at https://github.com/dariant/ID-Booth. |
2023 |
Pernuš, Martin; Štruc, Vitomir; Dobrišek, Simon
MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization
Journal Article
In: IEEE Transactions on Image Processing, 2023, ISSN: 1941-0042.
Tags: CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN
Abstract: Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control over the editing procedure and alter multiple (entangled) attributes simultaneously when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel editing approach, called MaskFaceGAN, that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially-selective treatment of local image regions. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the FRGC, SiblingsDB-HQf, and XM2VTS datasets and in comparison with several state-of-the-art techniques from the literature. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolutions (1024×1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is publicly available from: https://github.com/MartinPernus/MaskFaceGAN. |
Boutros, Fadi; Štruc, Vitomir; Fierrez, Julian; Damer, Naser
Synthetic data for face recognition: Current state and future prospects
Journal Article
In: Image and Vision Computing, no. 104688, 2023.
Tags: biometrics, CNN, diffusion, face recognition, generative models, survey, synthetic data
Abstract: Over the past years, deep learning capabilities and the availability of large-scale training datasets have advanced rapidly, leading to breakthroughs in face recognition accuracy. However, these technologies are expected to face a major challenge in the coming years due to legal and ethical concerns about using authentic biometric data in AI model training and evaluation, along with the increasing use of data-hungry state-of-the-art deep learning models. With the recent advances in deep generative models and their success in generating realistic and high-resolution synthetic image data, privacy-friendly synthetic data has recently been proposed as an alternative to privacy-sensitive authentic data to overcome the challenges of using authentic data in face recognition development. This work aims at providing a clear and structured picture of the use-case taxonomy of synthetic face data in face recognition, along with the recent emerging advances of face recognition models developed on the basis of synthetic data. We also discuss the challenges facing the use of synthetic data in face recognition development and several future prospects of synthetic data in the domain of face recognition. |
2022 |
Fele, Benjamin; Lampe, Ajda; Peer, Peter; Štruc, Vitomir
C-VTON: Context-Driven Image-Based Virtual Try-On Network
Proceedings Article
In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1–10, 2022.
Tags: computer vision, deepbeauty, fashion, generative models, image editing, try-on, virtual try-on
Abstract: Image-based virtual try-on techniques have shown great promise for enhancing the user experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state of the art. |