2025
|
Pernus, Martin; Fookes, Clinton; Struc, Vitomir; Dobrisek, Simon. FICE: Text-conditioned fashion-image editing with guided GAN inversion. Journal Article. In: Pattern Recognition, vol. 158, no. 111022, pp. 1-18, 2025.
@article{PR_FICE_2024,
title = {FICE: Text-conditioned fashion-image editing with guided GAN inversion},
author = {Martin Pernus and Clinton Fookes and Vitomir Struc and Simon Dobrisek},
url = {https://www.sciencedirect.com/science/article/pii/S0031320324007738
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/09/FICE_main_paper.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/09/FICE_supplementary.pdf},
doi = {10.1016/j.patcog.2024.111022},
year = {2025},
date = {2025-02-01},
urldate = {2025-02-01},
journal = {Pattern Recognition},
volume = {158},
number = {111022},
pages = {1-18},
abstract = {Fashion-image editing is a challenging computer-vision task where the goal is to incorporate selected apparel into a given input image. Most existing techniques, known as Virtual Try-On methods, deal with this task by first selecting an example image of the desired apparel and then transferring the clothing onto the target person. Conversely, in this paper, we consider editing fashion images with text descriptions. Such an approach has several advantages over example-based virtual try-on techniques: (i) it does not require an image of the target fashion item, and (ii) it allows the expression of a wide variety of visual concepts through the use of natural language. Existing image-editing methods that work with language inputs are heavily constrained by their requirement for training sets with rich attribute annotations or they are only able to handle simple text descriptions. We address these constraints by proposing a novel text-conditioned editing model called FICE (Fashion Image CLIP Editing) that is capable of handling a wide variety of diverse text descriptions to guide the editing procedure. Specifically, with FICE, we extend the common GAN-inversion process by including semantic, pose-related, and image-level constraints when generating images. We leverage the capabilities of the CLIP model to enforce the text-provided semantics, due to its impressive image–text association capabilities. We furthermore propose a latent-code regularization technique that provides the means to better control the fidelity of the synthesized images. We validate the FICE through rigorous experiments on a combination of VITON images and Fashion-Gen text descriptions and in comparison with several state-of-the-art, text-conditioned, image-editing approaches. Experimental results demonstrate that the FICE generates very realistic fashion images and leads to better editing than existing, competing approaches. The source code is publicly available from:
https://github.com/MartinPernus/FICE},
keywords = {computer vision for fashion, GAN inversion, generative adversarial networks, generative AI, image editing, text conditioning},
pubstate = {published},
tppubtype = {article}
}
|
Vitek, Matej; Štruc, Vitomir; Peer, Peter. GazeNet: A lightweight multitask sclera feature extractor. Journal Article. In: Alexandria Engineering Journal, vol. 112, pp. 661-671, 2025.
@article{Vitek2024_Gaze,
title = {GazeNet: A lightweight multitask sclera feature extractor},
author = {Matej Vitek and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/1-s2.0-S1110016824014273-main.pdf
https://www.sciencedirect.com/science/article/pii/S1110016824014273},
doi = {10.1016/j.aej.2024.11.011},
year = {2025},
date = {2025-01-05},
journal = {Alexandria Engineering Journal},
volume = {112},
pages = {661-671},
abstract = {The sclera is a recently emergent biometric modality with many desirable characteristics. However, most literature solutions for sclera-based recognition rely on sequences of complex deep networks with significant computational overhead. In this paper, we propose a lightweight multitask-based sclera feature extractor. The proposed GazeNet network has a computational complexity below 1 GFLOP, making it appropriate for less capable devices like smartphones and head-mounted displays. Our experiments show that GazeNet (which is based on the SqueezeNet architecture) outperforms both the base SqueezeNet model as well as the more computationally intensive ScleraNET model from the literature. Thus, we demonstrate that our proposed gaze-direction multitask learning procedure, along with careful lightweight architecture selection, leads to computationally efficient networks with high recognition performance.},
keywords = {biometrics, CNN, deep learning, lightweight models, sclera},
pubstate = {published},
tppubtype = {article}
}
|
2024
|
Gan, Chenquan; Xiao, Junhao; Zhu, Qingyi; Jain, Deepak Kumar; Struc, Vitomir. Transfer-Learning Enabled Micro-Expression Recognition Using Dense Connections and Mixed Attention. Journal Article. In: Knowledge-Based Systems, vol. 305, iss. December 2024, no. 112640, 2024.
@article{KBS_2024,
title = {Transfer-Learning Enabled Micro-Expression Recognition Using Dense Connections and Mixed Attention},
author = {Chenquan Gan and Junhao Xiao and Qingyi Zhu and Deepak Kumar Jain and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/Transfer_Learning_Enabled_compressed.pdf},
doi = {10.1016/j.knosys.2024.112640},
year = {2024},
date = {2024-12-01},
urldate = {2025-02-01},
journal = {Knowledge-Based Systems},
volume = {305},
number = {112640},
issue = {December 2024},
abstract = {Micro-expression recognition (MER) is a challenging computer vision problem, where the limited amount of available training data and insufficient intensity of the facial expressions are among the main issues adversely affecting the performance of existing recognition models. To address these challenges, this paper explores a transfer–learning enabled MER model using a densely connected feature extraction module with mixed attention. Unlike previous works that utilize transfer learning to facilitate MER and extract local facial expression information, our model relies on pretraining with three diverse macro-expression datasets and, as a result, can: (i) overcome the problem of insufficient sample size and limited training data availability, (ii) leverage (related) domain-specific information from multiple datasets with diverse characteristics, and (iii) improve the model adaptability to complex scenes. Furthermore, to enhance the intensity of the micro expressions and improve the discriminability of the extracted features, the Euler video magnification (EVM) method is adopted in the preprocessing stage and then used jointly with a densely connected feature extraction module and a mixed attention mechanism to derive expressive feature representations for the classification procedure. The proposed feature extraction mechanism not only guarantees the integrity of the extracted features but also efficiently captures local texture cues by aggregating the most salient information from the generated feature maps, which is key for the MER task. The experimental results on multiple datasets demonstrate the robustness and effectiveness of our model compared to the state-of-the-art.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Dragar, Luka; Rot, Peter; Peer, Peter; Štruc, Vitomir; Batagelj, Borut. W-TDL: Window-Based Temporal Deepfake Localization. Proceedings Article. In: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), Proceedings of the 32nd ACM International Conference on Multimedia (MM’24), ACM, 2024.
@inproceedings{MRAC2024,
title = {W-TDL: Window-Based Temporal Deepfake Localization},
author = {Luka Dragar and Peter Rot and Peter Peer and Vitomir Štruc and Borut Batagelj},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/09/ACM_1M_DeepFakes.pdf},
year = {2024},
date = {2024-11-01},
booktitle = {Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), Proceedings of the 32nd ACM International Conference on Multimedia (MM’24)},
publisher = {ACM},
abstract = {The quality of synthetic data has advanced to such a degree of realism that distinguishing it from genuine data samples is increasingly challenging. Deepfake content, including images, videos, and audio, is often used maliciously, necessitating effective detection methods. While numerous competitions have propelled the development of deepfake detectors, a significant gap remains in accurately pinpointing the temporal boundaries of manipulations. Addressing this, we propose an approach for temporal deepfake localization (TDL) utilizing a window-based method for audio (W-TDL) and a complementary visual frame-based model. Our contributions include an effective method for detecting and localizing fake video and audio segments and addressing unbalanced training labels in spoofed audio datasets. Our approach leverages the EVA visual transformer for frame-level analysis and a modified TDL method for audio, achieving competitive results in the 1M-DeepFakes Detection Challenge. Comprehensive experiments on the AV-Deepfake1M dataset demonstrate the effectiveness of our method, providing an effective solution to detect and localize deepfake manipulations.},
keywords = {CNN, deepfake DAD, deepfakes, deep learning, detection, localization},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Boutros, Fadi; Štruc, Vitomir; Damer, Naser. AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition. Proceedings Article. In: Proceedings of the European Conference on Computer Vision (ECCV 2024), pp. 1-20, 2024.
@inproceedings{FadiECCV2024,
title = {AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition},
author = {Fadi Boutros and Vitomir Štruc and Naser Damer},
url = {https://arxiv.org/pdf/2407.01332},
year = {2024},
date = {2024-09-30},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV 2024)},
pages = {1-20},
abstract = {Knowledge distillation (KD) aims at improving the performance of a compact student model by distilling the knowledge from a high-performing teacher model. In this paper, we present an adaptive KD approach, namely AdaDistill, for deep face recognition. The proposed AdaDistill embeds the KD concept into the softmax loss by training the student using a margin penalty softmax loss with distilled class centers from the teacher. Being aware of the relatively low capacity of the compact student model, we propose to distill less complex knowledge at an early stage of training and more complex knowledge at a later stage of training. This relative adjustment of the distilled knowledge is controlled by the progression of the learning capability of the student over the training iterations without the need to tune any hyper-parameters. Extensive experiments and ablation studies show that AdaDistill can enhance the discriminative learning capability of the student and demonstrate superiority over various state-of-the-art competitors on several challenging benchmarks, such as IJB-B, IJB-C, and ICCV2021-MFR.},
keywords = {adaptive distillation, biometrics, CNN, deep learning, face, face recognition, knowledge distillation},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Ocvirk, Krištof; Brodarič, Marko; Peer, Peter; Struc, Vitomir; Batagelj, Borut. Primerjava metod za zaznavanje napadov ponovnega zajema. Proceedings Article. In: Proceedings of ERK, pp. 1-4, Portorož, Slovenia, 2024.
@inproceedings{EK_Ocvirk2024,
title = {Primerjava metod za zaznavanje napadov ponovnega zajema},
author = {Krištof Ocvirk and Marko Brodarič and Peter Peer and Vitomir Struc and Borut Batagelj},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/10/ocvirkprimerjava_metod.pdf},
year = {2024},
date = {2024-09-26},
urldate = {2024-09-26},
booktitle = {Proceedings of ERK},
pages = {1-4},
address = {Portorož, Slovenia},
abstract = {The increasing prevalence of digital identity verification has amplified the demand for robust personal document authentication systems. To obscure traces of forgery, forgers often photograph the documents after reprinting or directly capture them from a screen display. This paper is a work report for the First Competition on Presentation Attack Detection on ID Cards, held at the International Joint Conference on Biometrics 2024 (IJCB PAD-ID Card 2024). The competition aims to explore the efficacy of deep neural networks in detecting recapture attacks. The Document Liveness Challenge Dataset (DLC-2021) was utilized to train models. Several models were adapted for this task, including ViT, Xception, TRes-Net, and EVA. Among these, the Xception model achieved the best performance, showing a significantly low error rate in both attack presentation classification error and bona fide presentation classification error.},
keywords = {attacks, biometrics, CNN, deep learning, identity cards, pad},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Manojlovska, Anastasija; Štruc, Vitomir; Grm, Klemen. Interpretacija mehanizmov obraznih biometričnih modelov s kontrastnim multimodalnim učenjem. Proceedings Article. In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.
@inproceedings{Anastasija_ERK24,
title = {Interpretacija mehanizmov obraznih biometričnih modelov s kontrastnim multimodalnim učenjem},
author = {Anastasija Manojlovska and Vitomir Štruc and Klemen Grm},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/10/ERK2024_Copy.pdf},
year = {2024},
date = {2024-09-26},
booktitle = {Proceedings of ERK 2024},
pages = {1-4},
address = {Portorož, Slovenia},
abstract = {Explainable artificial intelligence (XAI) increases the transparency of artificial-intelligence systems. This study uses OpenAI's CLIP (Contrastive Language-Image Pretraining) model to recognize facial attributes in the VGGFace2 dataset, using attribute annotations from the MAADFace dataset. By aligning images with natural-language descriptions, we recognize attributes such as age, gender, and hairstyle, and generate natural-language explanations. We also explore integrating pretrained face recognition models and adding classification layers to improve attribute classification. The pretrained CLIP model performed best at recognizing the attributes Male and Black, achieving AUC values of 0.9891 and 0.9829, respectively.},
keywords = {CNN, deep learning, face recognition, xai},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Brodarič, Marko; Peer, Peter; Struc, Vitomir. Towards Improving Backbones for Deepfake Detection. Proceedings Article. In: Proceedings of ERK 2024, pp. 1-4, 2024.
@inproceedings{ERK_2024_Deepfakes,
title = {Towards Improving Backbones for Deepfake Detection},
author = {Marko Brodarič and Peter Peer and Vitomir Struc},
year = {2024},
date = {2024-09-25},
booktitle = {Proceedings of ERK 2024},
pages = {1-4},
keywords = {CNN, deep learning, deepfake detection, deepfakes, media forensics, transformer},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Sikošek, Lovro; Brodarič, Marko; Peer, Peter; Struc, Vitomir; Batagelj, Borut. Detection of Presentation Attacks with 3D Masks Using Deep Learning. Proceedings Article. In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.
@inproceedings{ERK_PAD24,
title = {Detection of Presentation Attacks with 3D Masks Using Deep Learning},
author = {Lovro Sikošek and Marko Brodarič and Peter Peer and Vitomir Struc and Borut Batagelj},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/10/sikosekdetekcija_prezentacijskih.pdf},
year = {2024},
date = {2024-09-25},
booktitle = {Proceedings of ERK 2024},
pages = {1-4},
address = {Portorož, Slovenia},
abstract = {This paper describes a cutting-edge approach to Presentation Attack Detection (PAD) of 3D mask attacks using deep learning. We utilize a ResNeXt convolutional neural network, pre-trained on the ImageNet dataset and fine-tuned on the 3D Mask Attack Database (3DMAD). We also evaluate the model on a smaller, more general validation set containing different types of presentation attacks captured with various types of sensors. Experimental data show that our model achieves high accuracy in distinguishing between genuine faces and mask attacks within the 3DMAD database. However, evaluation on a more general testing set reveals challenges in generalizing to new types of attacks and datasets, suggesting the need for further research to enhance model robustness.},
keywords = {biometrics, CNN, deep learning, face PAD, face recognition, pad},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Alessio, Leon; Brodarič, Marko; Peer, Peter; Struc, Vitomir; Batagelj, Borut. Prepoznava zamenjave obraza na slikah osebnih dokumentov. Proceedings Article. In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.
@inproceedings{SWAP_ERK_24,
title = {Prepoznava zamenjave obraza na slikah osebnih dokumentov},
author = {Leon Alessio and Marko Brodarič and Peter Peer and Vitomir Struc and Borut Batagelj},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/10/alessioprepoznava_zamenjave.pdf},
year = {2024},
date = {2024-09-25},
booktitle = {Proceedings of ERK 2024},
pages = {1-4},
address = {Portorož, Slovenia},
abstract = {In recent years, a need for remote user authentication has emerged. Many authentication techniques are based on verifying an image of identity documents (ID). This approach mitigates the need for physical presence from both parties, making the authentication process quicker and more effective. However, it also presents challenges, such as data security and the risk of identity fraud. Attackers use many techniques to fool authentication algorithms. This paper focuses on detecting face substitution, a common and straightforward fraud technique where the perpetrator replaces the face image on the ID. Due to its simplicity, almost anyone can utilize this technique extensively. Unlike digitally altered images, these modifications are manually detectable but pose challenges for computer algorithms. To face the challenge of detecting such an attack, we extended a dataset containing original images of identity cards of 9 countries with altered images, where the original face was substituted with another face from the dataset. We developed a method to detect such tampering by identifying unusual straight lines that indicate an overlay on the ID. We then evaluated the method on our dataset. While the method showed limited success, it underscores the complexity of this problem and provides a benchmark for future research.},
keywords = {biometrics, deep learning, deep models, face PAD, face recognition, pad},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Plesh, Richard; Križaj, Janez; Bahmani, Keivan; Banavar, Mahesh; Struc, Vitomir; Schuckers, Stephanie. Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models. Proceedings Article. In: International Joint Conference on Biometrics (IJCB 2024), pp. 1-10, 2024.
@inproceedings{Krizaj,
title = {Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models},
author = {Richard Plesh and Janez Križaj and Keivan Bahmani and Mahesh Banavar and Vitomir Struc and Stephanie Schuckers},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107-supp.pdf},
year = {2024},
date = {2024-09-15},
booktitle = {International Joint Conference on Biometrics (IJCB 2024)},
pages = {1-10},
abstract = {Modern face recognition (FR) models, particularly their convolutional neural network based implementations, often raise concerns regarding privacy and ethics due to their “black-box” nature. To enhance the explainability of FR models and the interpretability of their embedding space, we introduce in this paper three novel techniques for discovering semantically meaningful feature directions (or axes). The first technique uses a dedicated facial-region blending procedure together with principal component analysis to discover embedding-space directions that correspond to spatially isolated semantic face areas, providing a new perspective on facial feature interpretation. The other two proposed techniques exploit attribute labels to discern feature directions that correspond to intra-identity variations, such as pose, illumination angle, and expression, but do so either through a cluster analysis or a dedicated regression procedure. To validate the capabilities of the developed techniques, we utilize a powerful template decoder that inverts the image embedding back into the pixel space. Using the decoder, we visualize linear movements along the discovered directions, enabling a clearer understanding of the internal representations within face recognition models. The source code will be made publicly available.},
keywords = {biometrics, CNN, deep learning, face recognition, feature space understanding, xai},
pubstate = {published},
tppubtype = {inproceedings}
}
|
DeAndres-Tame, Ivan; Tolosana, Ruben; Melzi, Pietro; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhong, Zhizhou; Huang, Yuge; Mi, Yuxi; Ding, Shouhong; Zhou, Shuigeng; He, Shuai; Fu, Lingzhi; Cong, Heng; Zhang, Rongyu; Xiao, Zhihong; Smirnov, Evgeny; Pimenov, Anton; Grigorev, Aleksei; Timoshenko, Denis; Asfaw, Kaleb Mesfin; Low, Cheng Yaw; Liu, Hao; Wang, Chuyi; Zuo, Qing; He, Zhixiang; Shahreza, Hatef Otroshi; George, Anjith; Unnervik, Alexander; Rahimi, Parsa; Marcel, Sébastien; Neto, Pedro C; Huber, Marco; Kolf, Jan Niklas; Damer, Naser; Boutros, Fadi; Cardoso, Jaime S; Sequeira, Ana F; Atzori, Andrea; Fenu, Gianni; Marras, Mirko; Štruc, Vitomir; Yu, Jiang; Li, Zhangjie; Li, Jichun; Zhao, Weisong; Lei, Zhen; Zhu, Xiangyu; Zhang, Xiao-Yu; Biesseck, Bernardo; Vidal, Pedro; Coelho, Luiz; Granada, Roger; Menotti, David Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data Proceedings Article In: Proceedings of CVPR Workshops (CVPRW 2024), pp. 1-11, 2024. @inproceedings{CVPR_synth2024,
title = {Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data},
author = {Ivan DeAndres-Tame and Ruben Tolosana and Pietro Melzi and Ruben Vera-Rodriguez and Minchul Kim and Christian Rathgeb and Xiaoming Liu and Aythami Morales and Julian Fierrez and Javier Ortega-Garcia and Zhizhou Zhong and Yuge Huang and Yuxi Mi and Shouhong Ding and Shuigeng Zhou and Shuai He and Lingzhi Fu and Heng Cong and Rongyu Zhang and Zhihong Xiao and Evgeny Smirnov and Anton Pimenov and Aleksei Grigorev and Denis Timoshenko and Kaleb Mesfin Asfaw and Cheng Yaw Low and Hao Liu and Chuyi Wang and Qing Zuo and Zhixiang He and Hatef Otroshi Shahreza and Anjith George and Alexander Unnervik and Parsa Rahimi and Sébastien Marcel and Pedro C Neto and Marco Huber and Jan Niklas Kolf and Naser Damer and Fadi Boutros and Jaime S Cardoso and Ana F Sequeira and Andrea Atzori and Gianni Fenu and Mirko Marras and Vitomir Štruc and Jiang Yu and Zhangjie Li and Jichun Li and Weisong Zhao and Zhen Lei and Xiangyu Zhu and Xiao-Yu Zhang and Bernardo Biesseck and Pedro Vidal and Luiz Coelho and Roger Granada and David Menotti},
url = {https://openaccess.thecvf.com/content/CVPR2024W/FRCSyn/papers/Deandres-Tame_Second_Edition_FRCSyn_Challenge_at_CVPR_2024_Face_Recognition_Challenge_CVPRW_2024_paper.pdf},
year = {2024},
date = {2024-06-17},
urldate = {2024-06-17},
booktitle = {Proceedings of CVPR Workshops (CVPRW 2024)},
pages = {1-11},
abstract = {Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated by several factors, such as the lack of real data and intraclass variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which synthetic data from DCFace and GANDiffFace methods was only allowed to train face recognition systems, in this 2nd edition we propose new subtasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking, contribute significantly to the application of synthetic data to face recognition.},
keywords = {competition, face, face recognition, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Rot, Peter; Terhorst, Philipp; Peer, Peter; Štruc, Vitomir ASPECD: Adaptable Soft-Biometric Privacy-Enhancement Using Centroid Decoding for Face Verification Proceedings Article In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-9, 2024. @inproceedings{Rot_FG2024,
title = {ASPECD: Adaptable Soft-Biometric Privacy-Enhancement Using Centroid Decoding for Face Verification},
author = {Peter Rot and Philipp Terhorst and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/03/PeterRot_FG2024.pdf},
year = {2024},
date = {2024-05-28},
booktitle = {Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG)},
pages = {1-9},
abstract = {State-of-the-art face recognition models commonly extract information-rich biometric templates from the input images that are then used for comparison purposes and identity inference. While these templates encode identity information in a highly discriminative manner, they typically also capture other potentially sensitive facial attributes, such as age, gender or ethnicity. To address this issue, Soft-Biometric Privacy-Enhancing Techniques (SB-PETs) were proposed in the literature that aim to suppress such attribute information, and, in turn, alleviate the privacy risks associated with the extracted biometric templates. While various SB-PETs have been presented so far, existing approaches do not provide dedicated mechanisms to determine which soft-biometrics to exclude and which to retain. In this paper, we address this gap and introduce ASPECD, a modular framework designed to selectively suppress binary and categorical soft-biometrics based on users' privacy preferences. ASPECD consists of multiple sequentially connected components, each dedicated to the privacy enhancement of an individual soft-biometric attribute. The proposed framework suppresses attribute information using a Moment-based Disentanglement process coupled with a centroid decoding procedure, ensuring that the privacy-enhanced templates are directly comparable to the templates in the original embedding space, regardless of the soft-biometric modality being suppressed.
To validate the performance of ASPECD, we conduct experiments on a large-scale face dataset and with five state-of-the-art face recognition models, demonstrating the effectiveness of the proposed approach in suppressing single and multiple soft-biometric attributes. Our approach achieves a competitive privacy-utility trade-off compared to the state-of-the-art methods in scenarios that involve enhancing privacy w.r.t. gender and ethnicity attributes. Source code will be made publicly available.},
keywords = {deepfake, deepfakes, face, face analysis, face deidentification, face image processing, face images, face synthesis, face verification, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy, soft biometrics},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Lampe, Ajda; Stopar, Julija; Jain, Deepak Kumar; Omachi, Shinichiro; Peer, Peter; Struc, Vitomir DiCTI: Diffusion-based Clothing Designer via Text-guided Input Proceedings Article In: Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024), pp. 1-9, 2024. @inproceedings{Ajda_Dicti,
title = {DiCTI: Diffusion-based Clothing Designer via Text-guided Input},
author = {Ajda Lampe and Julija Stopar and Deepak Kumar Jain and Shinichiro Omachi and Peter Peer and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/06/Dicti_FG2024_compressed.pdf},
year = {2024},
date = {2024-05-27},
booktitle = {Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024)},
pages = {1-9},
abstract = {Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and customers seeking to order new designs. To address this gap, we introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a straightforward yet highly effective approach that allows designers to quickly visualize fashion-related ideas using text inputs only.
Given an image of a person and a description of the desired garments as input, DiCTI automatically generates multiple high-resolution, photorealistic images that capture the expressed semantics.
By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs that viably follow the provided text descriptions, while being able to process very diverse and challenging inputs, captured in completely unconstrained settings. We evaluate DiCTI in comprehensive experiments on two different datasets (VITON-HD and Fashionpedia) and in comparison to the state-of-the-art (SoTA). The results of our experiments show that DiCTI convincingly outperforms the SoTA competitor in generating higher quality images with more elaborate garments and superior text prompt adherence, both according to standard quantitative evaluation measures and human ratings, generated as part of a user study. The source code of DiCTI will be made publicly available.},
keywords = {clothing design, deepbeauty, denoising diffusion probabilistic models, diffusion, diffusion models, fashion, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Tomašević, Darian; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir Generating bimodal privacy-preserving data for face recognition Journal Article In: Engineering Applications of Artificial Intelligence, vol. 133, iss. E, pp. 1-25, 2024. @article{Darian2024,
title = {Generating bimodal privacy-preserving data for face recognition},
author = {Darian Tomašević and Fadi Boutros and Naser Damer and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/05/PapersDarian.pdf},
doi = {10.1016/j.engappai.2024.108495},
year = {2024},
date = {2024-05-01},
journal = {Engineering Applications of Artificial Intelligence},
volume = {133},
issue = {E},
pages = {1-25},
abstract = {The performance of state-of-the-art face recognition systems depends crucially on the availability of large-scale training datasets. However, increasing privacy concerns nowadays accompany the collection and distribution of biometric data, which has already resulted in the retraction of valuable face recognition datasets. The use of synthetic data represents a potential solution; however, the generation of privacy-preserving facial images useful for training recognition models is still an open problem. Generative methods also remain bound to the visible spectrum, despite the benefits that multispectral data can provide. To address these issues, we present a novel identity-conditioned generative framework capable of producing large-scale recognition datasets of visible and near-infrared privacy-preserving face images. The framework relies on a novel identity-conditioned dual-branch style-based generative adversarial network to enable the synthesis of aligned high-quality samples of identities determined by features of a pretrained recognition model. In addition, the framework incorporates a novel filter to prevent samples of privacy-breaching identities from reaching the generated datasets and improve both identity separability and intra-identity diversity. Extensive experiments on six publicly available datasets reveal that our framework achieves competitive synthesis capabilities while preserving the privacy of real-world subjects. The synthesized datasets also facilitate training more powerful recognition models than datasets generated by competing methods or even small-scale real-world datasets. Employing both visible and near-infrared data for training also results in higher recognition accuracy on real-world visible spectrum benchmarks. Therefore, training with multispectral data could potentially improve existing recognition systems that utilize only the visible spectrum, without the need for additional sensors.},
keywords = {CNN, face, face generation, face images, face recognition, generative AI, StyleGAN2, synthetic data},
pubstate = {published},
tppubtype = {article}
}
|
Tomašević, Darian; Peer, Peter; Štruc, Vitomir BiFaceGAN: Bimodal Face Image Synthesis Book Section In: Bourlai, T. (Ed.): Face Recognition Across the Imaging Spectrum, pp. 273–311, Springer, Singapore, 2024, ISBN: 978-981-97-2058-3. @incollection{Darian2024Book,
title = {BiFaceGAN: Bimodal Face Image Synthesis},
author = {Darian Tomašević and Peter Peer and Vitomir Štruc},
editor = {T. Bourlai},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/BiFaceGAN.pdf},
doi = {10.1007/978-981-97-2059-0_11},
isbn = {978-981-97-2058-3},
year = {2024},
date = {2024-05-01},
urldate = {2024-05-01},
booktitle = {Face Recognition Across the Imaging Spectrum},
pages = {273–311},
publisher = {Springer, Singapore},
abstract = {Modern face recognition and segmentation systems, like all deep learning approaches, rely on large-scale annotated datasets to achieve competitive performance. However, gathering biometric data often raises privacy concerns and presents a labor-intensive and time-consuming task. Researchers are currently also exploring the use of multispectral data to improve existing solutions, limited to the visible spectrum. Unfortunately, the collection of suitable data is even more difficult, especially if aligned images are required. To address the outlined issues, we present a novel synthesis framework, named BiFaceGAN, capable of producing privacy-preserving large-scale synthetic datasets of photorealistic face images, in the visible and the near-infrared spectrum, along with corresponding ground-truth pixel-level annotations. The proposed framework leverages an innovative Dual-Branch Style-based generative adversarial network (DB-StyleGAN2) to generate per-pixel-aligned bimodal images, followed by an ArcFace Privacy Filter (APF) that ensures the removal of privacy-breaching images. Furthermore, we also implement a Semantic Mask Generator (SMG) that produces reference ground-truth segmentation masks of the synthetic data, based on the latent representations inside the synthesis model and only a handful of manually labeled examples. We evaluate the quality of generated images and annotations through a series of experiments and analyze the benefits of generating bimodal data with a single network. We also show that privacy-preserving data filtering does not notably degrade the image quality of produced datasets. Finally, we demonstrate that the generated data can be employed to train highly successful deep segmentation models, which can generalize well to other real-world datasets.},
keywords = {CNN, deep learning, face synthesis, generative AI, stylegan},
pubstate = {published},
tppubtype = {incollection}
}
|
Babnik, Žiga; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir AI-KD: Towards Alignment Invariant Face Image Quality Assessment Using Knowledge Distillation Proceedings Article In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2024. @inproceedings{Babnik_IWBF2024,
title = {AI-KD: Towards Alignment Invariant Face Image Quality Assessment Using Knowledge Distillation},
author = {Žiga Babnik and Fadi Boutros and Naser Damer and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/03/iwbf2024_fiq.pdf},
year = {2024},
date = {2024-04-10},
urldate = {2024-04-10},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
pages = {1-6},
abstract = {Face Image Quality Assessment (FIQA) techniques have seen steady improvements over recent years, but their performance still deteriorates if the input face samples are not properly aligned. This alignment sensitivity comes from the fact that most FIQA techniques are trained or designed using a specific face alignment procedure. If the alignment technique changes, the performance of most existing FIQA techniques quickly becomes suboptimal. To address this problem, we present in this paper a novel knowledge distillation approach, termed AI-KD, that can extend any existing FIQA technique, improving its robustness to alignment variations and, in turn, performance with different alignment procedures. To validate the proposed distillation approach, we conduct comprehensive experiments on 6 face datasets with 4 recent face recognition models and in comparison to 7 state-of-the-art FIQA techniques. Our results show that AI-KD consistently improves performance of the initial FIQA techniques not only with misaligned samples, but also with properly aligned facial images. Furthermore, it leads to a new state-of-the-art, when used with a competitive initial FIQA approach. The code for AI-KD is made publicly available from: https://github.com/LSIbabnikz/AI-KD.},
keywords = {ai, CNN, deep learning, face, face image quality assessment, face image quality estimation, face images, face recognition, face verification},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Rot, Peter; Križaj, Janez; Peer, Peter; Štruc, Vitomir Enhancing Gender Privacy with Photo-realistic Fusion of Disentangled Spatial Segments Proceedings Article In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, 2024. @inproceedings{RotICASSP24,
title = {Enhancing Gender Privacy with Photo-realistic Fusion of Disentangled Spatial Segments},
author = {Peter Rot and Janez Križaj and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/ICASSP_2024___Gender_privacy.pdf},
year = {2024},
date = {2024-04-02},
urldate = {2024-04-02},
booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages = {1-5},
keywords = {deep learning, face, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Babnik, Žiga; Peer, Peter; Štruc, Vitomir eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models Journal Article In: IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), pp. 1-16, 2024, ISSN: 2637-6407. @article{BabnikTBIOM2024,
title = {eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models},
author = {Žiga Babnik and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/03/TBIOM___DifFIQAv2.pdf
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10468647&tag=1},
doi = {10.1109/TBIOM.2024.3376236},
issn = {2637-6407},
year = {2024},
date = {2024-03-07},
urldate = {2024-03-07},
journal = {IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM)},
pages = {1-16},
abstract = {State-of-the-art Face Recognition (FR) models perform well in constrained scenarios, but frequently fail in difficult real-world scenarios, when no quality guarantees can be made for face samples. For this reason, Face Image Quality Assessment (FIQA) techniques are often used by FR systems, to provide quality estimates of captured face samples. The quality estimate provided by FIQA techniques can be used by the FR system to reject low-quality samples, in turn improving the performance of the system and reducing the number of critical false-match errors. However, despite steady improvements, ensuring a good trade-off between the performance and computational complexity of FIQA methods across diverse face samples remains challenging. In this paper, we present DifFIQA, a powerful unsupervised approach for quality assessment based on the popular denoising diffusion probabilistic models (DDPMs) and the extended (eDifFIQA) approach. The main idea of the base DifFIQA approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because of the iterative nature of DDPMs, the base DifFIQA approach is extremely computationally expensive. Using eDifFIQA we are able to improve on both the performance and computational complexity of the base DifFIQA approach, by employing label-optimized knowledge distillation. In this process, quality information inferred by DifFIQA is distilled into a quality-regression model. During the distillation process, we use an additional source of quality information hidden in the relative position of the embedding to further improve the predictive capabilities of the underlying regression model.
By choosing different feature extraction backbone models as the basis for the quality-regression eDifFIQA model, we are able to control the trade-off between the predictive capabilities and computational complexity of the final model. We evaluate three eDifFIQA variants of varying sizes in comprehensive experiments on 7 diverse datasets containing static images and a separate video-based dataset, with 4 target CNN-based FR models and 2 target Transformer-based FR models and against 10 state-of-the-art FIQA techniques, as well as against the initial DifFIQA baseline and a simple regression-based predictor DifFIQA(R), distilled from DifFIQA without any additional optimization. The results show that the proposed label-optimized knowledge distillation improves on the performance and computational complexity of the base DifFIQA approach, and is able to achieve state-of-the-art performance in several distinct experimental scenarios. Furthermore, we also show that the distilled model can be used directly for face recognition and leads to highly competitive results.},
keywords = {biometrics, CNN, deep learning, DifFIQA, diffusion, face, face image quality assessment, face recognition, FIQA},
pubstate = {published},
tppubtype = {article}
}
Fang, Meiling; Yang, Wufei; Kuijper, Arjan; Štruc, Vitomir; Damer, Naser Fairness in Face Presentation Attack Detection Journal Article In: Pattern Recognition, vol. 147, iss. 110002, pp. 1-14, 2024. @article{PR_Fairness2024,
title = {Fairness in Face Presentation Attack Detection},
author = {Meiling Fang and Wufei Yang and Arjan Kuijper and Vitomir Štruc and Naser Damer},
url = {https://www.sciencedirect.com/science/article/pii/S0031320323007008?dgcid=coauthor},
year = {2024},
date = {2024-03-01},
urldate = {2024-03-01},
journal = {Pattern Recognition},
volume = {147},
issue = {110002},
pages = {1-14},
abstract = {Face recognition (FR) algorithms have been proven to exhibit discriminatory behaviors against certain demographic and non-demographic groups, raising ethical and legal concerns regarding their deployment in real-world scenarios. Despite the growing number of fairness studies in FR, the fairness of face presentation attack detection (PAD) has been overlooked, mainly due to the lack of appropriately annotated data. To avoid and mitigate the potential negative impact of such behavior, it is essential to assess the fairness in face PAD and develop fair PAD models. To enable fairness analysis in face PAD, we present a Combined Attribute Annotated PAD Dataset (CAAD-PAD), offering seven human-annotated attribute labels. Then, we comprehensively analyze the fairness of PAD and its relation to the nature of the training data and the Operational Decision Threshold Assignment (ODTA) through a set of face PAD solutions. Additionally, we propose a novel metric, the Accuracy Balanced Fairness (ABF), that jointly represents both the PAD fairness and the absolute PAD performance. The experimental results pointed out that female and faces with occluding features (e.g. eyeglasses, beard, etc.) are relatively less protected than male and non-occlusion groups by all PAD solutions. To alleviate this observed unfairness, we propose a plug-and-play data augmentation method, FairSWAP, to disrupt the identity/semantic information and encourage models to mine the attack clues. The extensive experimental results indicate that FairSWAP leads to better-performing and fairer face PADs in 10 out of 12 investigated cases.},
keywords = {biometrics, computer vision, face analysis, face PAD, face recognition, fairness, pad, presentation attack detection},
pubstate = {published},
tppubtype = {article}
}
Ivanovska, Marija; Štruc, Vitomir Y-GAN: Learning Dual Data Representations for Anomaly Detection in Images Journal Article In: Expert Systems with Applications (ESWA), vol. 248, no. 123410, pp. 1-7, 2024. @article{ESWA2024,
title = {Y-GAN: Learning Dual Data Representations for Anomaly Detection in Images},
author = {Marija Ivanovska and Vitomir Štruc},
url = {https://www.sciencedirect.com/science/article/pii/S0957417424002756
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/02/YGAN_Marija.pdf},
doi = {https://doi.org/10.1016/j.eswa.2024.123410},
year = {2024},
date = {2024-03-01},
urldate = {2024-03-01},
journal = {Expert Systems with Applications (ESWA)},
volume = {248},
number = {123410},
pages = {1-7},
abstract = {We propose a novel reconstruction-based model for anomaly detection in image data, called 'Y-GAN'. The model consists of a Y-shaped auto-encoder and represents images in two separate latent spaces. The first captures meaningful image semantics, which are key for representing (normal) training data, whereas the second encodes low-level residual image characteristics. To ensure the dual representations encode mutually exclusive information, a disentanglement procedure is designed around a latent (proxy) classifier. Additionally, a novel representation-consistency mechanism is proposed to prevent information leakage between the latent spaces. The model is trained in a one-class learning setting using only normal training data. Due to the separation of semantically-relevant and residual information, Y-GAN is able to derive informative data representations that allow for efficacious anomaly detection across a diverse set of anomaly detection tasks. The model is evaluated in comprehensive experiments with several recent anomaly detection models using four popular image datasets, i.e., MNIST, FMNIST, CIFAR10, and PlantVillage. Experimental results show that Y-GAN outperforms all tested models by a considerable margin and yields state-of-the-art results. The source code for the model is made publicly available at https://github.com/MIvanovska/Y-GAN. },
keywords = {anomaly detection, CNN, deep learning, one-class learning, y-gan},
pubstate = {published},
tppubtype = {article}
}
Gan, Chenquan; Zheng, Jiahao; Zhu, Qingyi; Jain, Deepak Kumar; Štruc, Vitomir A graph neural network with context filtering and feature correction for conversational emotion recognition Journal Article In: Information Sciences, vol. 658, no. 120017, pp. 1-21, 2024. @article{InformSciences2024,
title = {A graph neural network with context filtering and feature correction for conversational emotion recognition},
author = {Chenquan Gan and Jiahao Zheng and Qingyi Zhu and Deepak Kumar Jain and Vitomir Štruc},
url = {https://www.sciencedirect.com/science/article/pii/S002002552301602X?via%3Dihub
https://lmi.fe.uni-lj.si/wp-content/uploads/2023/12/InformationSciences.pdf},
doi = {https://doi.org/10.1016/j.ins.2023.120017},
year = {2024},
date = {2024-02-01},
journal = {Information Sciences},
volume = {658},
number = {120017},
pages = {1-21},
abstract = {Conversational emotion recognition represents an important machine-learning problem with a wide variety of deployment possibilities. The key challenge in this area is how to properly capture the key conversational aspects that facilitate reliable emotion recognition, including utterance semantics, temporal order, informative contextual cues, speaker interactions as well as other relevant factors. In this paper, we present a novel Graph Neural Network approach for conversational emotion recognition at the utterance level. Our method addresses the outlined challenges and represents conversations in the form of graph structures that naturally encode temporal order, speaker dependencies, and even long-distance context. To efficiently capture the semantic content of the conversations, we leverage the zero-shot feature-extraction capabilities of pre-trained large-scale language models and then integrate two key contributions into the graph neural network to ensure competitive recognition results. The first is a novel context filter that establishes meaningful utterance dependencies for the graph construction procedure and removes low-relevance and uninformative utterances from being used as a source of contextual information for the recognition task. The second contribution is a feature-correction procedure that adjusts the information content in the generated feature representations through a gating mechanism to improve their discriminative power and reduce emotion-prediction errors. We conduct extensive experiments on four commonly used conversational datasets, i.e., IEMOCAP, MELD, Dailydialog, and EmoryNLP, to demonstrate the capabilities of the developed graph neural network with context filtering and error-correction capabilities. The results of the experiments point to highly promising performance, especially when compared to state-of-the-art competitors from the literature.},
keywords = {context filtering, conversations, dialogue, emotion recognition, graph neural network, sentiment analysis},
pubstate = {published},
tppubtype = {article}
}
Brodarič, Marko; Peer, Peter; Štruc, Vitomir Cross-Dataset Deepfake Detection: Evaluating the Generalization Capabilities of Modern DeepFake Detectors Proceedings Article In: Proceedings of the 27th Computer Vision Winter Workshop (CVWW), pp. 1-10, 2024. @inproceedings{MarkoCVWW,
title = {Cross-Dataset Deepfake Detection: Evaluating the Generalization Capabilities of Modern DeepFake Detectors},
author = {Marko Brodarič and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/01/MarkoCVWW24_compressed.pdf},
year = {2024},
date = {2024-01-31},
booktitle = {Proceedings of the 27th Computer Vision Winter Workshop (CVWW)},
pages = {1-10},
abstract = {Due to the recent advances in generative deep learning, numerous techniques have been proposed in the literature that allow for the creation of so-called deepfakes, i.e., forged facial images commonly used for malicious purposes. These developments have triggered a need for effective deepfake detectors, capable of identifying forged and manipulated imagery as robustly as possible. While a considerable number of detection techniques has been proposed over the years, generalization across a wide spectrum of deepfake-generation techniques still remains an open problem. In this paper, we study a representative set of deepfake generation methods and analyze their performance in a cross-dataset setting with the goal of better understanding the reasons behind the observed generalization performance. To this end, we conduct a comprehensive analysis on the FaceForensics++ dataset and adopt Gradient-weighted Class Activation Mappings (Grad-CAM) to provide insights into the behavior of the evaluated detectors. Since a new class of deepfake generation techniques based on diffusion models recently appeared in the literature, we introduce a new subset of the FaceForensics++ dataset with diffusion-based deepfakes and include it in our analysis. The results of our experiments show that most detectors overfit to the specific image artifacts induced by a given deepfake-generation model and mostly focus on local image areas where such artifacts can be expected. Conversely, good generalization appears to be correlated with class activations that cover a broad spatial area and hence capture different image artifacts that appear in various parts of the facial region.},
keywords = {data integrity, deepfake, deepfake detection, deepfakes, diffusion, face, faceforensics++, media forensics},
pubstate = {published},
tppubtype = {inproceedings}
}
Križaj, Janez; Plesh, Richard O.; Banavar, Mahesh; Schuckers, Stephanie; Štruc, Vitomir Deep Face Decoder: Towards understanding the embedding space of convolutional networks through visual reconstruction of deep face templates Journal Article In: Engineering Applications of Artificial Intelligence, vol. 132, iss. 107941, pp. 1-20, 2024. @article{KrizajEAAI2024,
title = {Deep Face Decoder: Towards understanding the embedding space of convolutional networks through visual reconstruction of deep face templates},
author = {Janez Križaj and Richard O. Plesh and Mahesh Banavar and Stephanie Schuckers and Vitomir Štruc},
url = {https://www.sciencedirect.com/science/article/abs/pii/S095219762400099X
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/02/DFD_Overleaf.pdf},
doi = {https://doi.org/10.1016/j.engappai.2024.107941},
year = {2024},
date = {2024-01-30},
urldate = {2024-01-30},
journal = {Engineering Applications of Artificial Intelligence},
volume = {132},
issue = {107941},
pages = {1-20},
abstract = {Advances in deep learning and convolutional neural networks (ConvNets) have driven remarkable face recognition (FR) progress recently. However, the black-box nature of modern ConvNet-based face recognition models makes it challenging to interpret their decision-making process, to understand the reasoning behind specific success and failure cases, or to predict their responses to unseen data characteristics. It is, therefore, critical to design mechanisms that explain the inner workings of contemporary FR models and offer insight into their behavior. To address this challenge, we present in this paper a novel template-inversion approach capable of reconstructing high-fidelity face images from the embeddings (templates, feature-space representations) produced by modern FR techniques. Our approach is based on a novel Deep Face Decoder (DFD) trained in a regression setting to visualize the information encoded in the embedding space with the goal of fostering explainability. We utilize the developed DFD model in comprehensive experiments on multiple unconstrained face datasets, namely Visual Geometry Group Face dataset 2 (VGGFace2), Labeled Faces in the Wild (LFW), and Celebrity Faces Attributes Dataset High Quality (CelebA-HQ). Our analysis focuses on the embedding spaces of two distinct face recognition models with backbones based on the Visual Geometry Group 16-layer model (VGG-16) and the 50-layer Residual Network (ResNet-50). The results reveal how information is encoded in the two considered models and how perturbations in image appearance due to rotations, translations, scaling, occlusion, or adversarial attacks, are propagated into the embedding space. Our study offers researchers a deeper comprehension of the underlying mechanisms of ConvNet-based FR models, ultimately promoting advancements in model design and explainability.},
keywords = {CNN, embedding space, face, face images, face recognition, face synthesis, template reconstruction, xai},
pubstate = {published},
tppubtype = {article}
}
Ivanovska, Marija; Štruc, Vitomir On the Vulnerability of Deepfake Detectors to Attacks Generated by Denoising Diffusion Models Proceedings Article In: Proceedings of WACV Workshops, pp. 1051-1060, 2024. @inproceedings{MarijaWACV24,
title = {On the Vulnerability of Deepfake Detectors to Attacks Generated by Denoising Diffusion Models},
author = {Marija Ivanovska and Vitomir Štruc},
url = {https://openaccess.thecvf.com/content/WACV2024W/MAP-A/papers/Ivanovska_On_the_Vulnerability_of_Deepfake_Detectors_to_Attacks_Generated_by_WACVW_2024_paper.pdf},
year = {2024},
date = {2024-01-08},
urldate = {2024-01-08},
booktitle = {Proceedings of WACV Workshops},
pages = {1051-1060},
abstract = {The detection of malicious deepfakes is a constantly evolving problem that requires continuous monitoring of detectors to ensure they can detect image manipulations generated by the latest emerging models. In this paper, we investigate the vulnerability of single-image deepfake detectors to black-box attacks created by the newest generation of generative methods, namely Denoising Diffusion Models (DDMs). Our experiments are run on FaceForensics++, a widely used deepfake benchmark consisting of manipulated images generated with various techniques for face identity swapping and face reenactment. Attacks are crafted through guided reconstruction of existing deepfakes with a proposed DDM approach for face restoration. Our findings indicate that employing just a single denoising diffusion step in the reconstruction process of a deepfake can significantly reduce the likelihood of detection, all without introducing any perceptible image modifications. While training detectors using attack examples demonstrated some effectiveness, it was observed that discriminators trained on fully diffusion-based deepfakes exhibited limited generalizability when presented with our attacks.},
keywords = {deep learning, deepfake, deepfake detection, diffusion models, face, media forensics},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
|
Pernuš, Martin; Štruc, Vitomir; Dobrišek, Simon MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization Journal Article In: IEEE Transactions on Image Processing, 2023, ISSN: 1941-0042. @article{MaskFaceGAN,
title = {MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization},
author = {Martin Pernuš and Vitomir Štruc and Simon Dobrišek},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10299582
https://lmi.fe.uni-lj.si/wp-content/uploads/2023/02/MaskFaceGAN_compressed.pdf
https://arxiv.org/pdf/2103.11135.pdf},
doi = {10.1109/TIP.2023.3326675},
issn = {1941-0042},
year = {2023},
date = {2023-10-27},
urldate = {2023-01-02},
journal = {IEEE Transactions on Image Processing},
abstract = {Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control over the editing procedure and alter multiple (entangled) attributes simultaneously, when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel editing approach, called MaskFaceGAN, that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially selective treatment of local image regions. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the FRGC, SiblingsDB-HQf, and XM2VTS datasets and in comparison with several state-of-the-art techniques from the literature. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolutions (1024×1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is publicly available from: https://github.com/MartinPernus/MaskFaceGAN.},
keywords = {CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN},
pubstate = {published},
tppubtype = {article}
}
Larue, Nicolas; Vu, Ngoc-Son; Štruc, Vitomir; Peer, Peter; Christophides, Vassilis SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes Proceedings Article In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 21011 - 21021, IEEE 2023. @inproceedings{NicolasCCV,
title = {SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes},
author = {Nicolas Larue and Ngoc-Son Vu and Vitomir Štruc and Peter Peer and Vassilis Christophides},
url = {https://openaccess.thecvf.com/content/ICCV2023/papers/Larue_SeeABLE_Soft_Discrepancies_and_Bounded_Contrastive_Learning_for_Exposing_Deepfakes_ICCV_2023_paper.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/01/SeeABLE_compressed.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2024/01/SeeABLE_supplementary_compressed.pdf},
year = {2023},
date = {2023-10-01},
urldate = {2023-10-01},
booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
pages = {21011-21021},
organization = {IEEE},
abstract = {Modern deepfake detectors have achieved encouraging results when training and test images are drawn from the same data collection. However, when these detectors are applied to images produced with unknown deepfake-generation techniques, considerable performance degradations are commonly observed. In this paper, we propose a novel deepfake detector, called SeeABLE, that formalizes the detection problem as a (one-class) out-of-distribution detection task and generalizes better to unseen deepfakes. Specifically, SeeABLE first generates local image perturbations (referred to as soft discrepancies) and then pushes the perturbed faces towards predefined prototypes using a novel regression-based bounded contrastive loss. To strengthen the generalization performance of SeeABLE to unknown deepfake types, we generate a rich set of soft discrepancies and train the detector: (i) to localize which part of the face was modified, and (ii) to identify the alteration type. To demonstrate the capabilities of SeeABLE, we perform rigorous experiments on several widely-used deepfake datasets and show that our model convincingly outperforms competing state-of-the-art detectors, while exhibiting highly encouraging generalization capabilities. The source code for SeeABLE is available from: https://github.com/anonymous-author-sub/seeable.},
keywords = {CNN, deepfake detection, deepfakes, face, media forensics, one-class learning, representation learning},
pubstate = {published},
tppubtype = {inproceedings}
}
Rot, Peter; Grm, Klemen; Peer, Peter; Štruc, Vitomir PrivacyProber: Assessment and Detection of Soft–Biometric Privacy–Enhancing Techniques Journal Article In: IEEE Transactions on Dependable and Secure Computing, pp. 1-18, 2023, ISBN: 1545-5971. @article{PrivacProberRot,
title = {PrivacyProber: Assessment and Detection of Soft–Biometric Privacy–Enhancing Techniques},
author = {Peter Rot and Klemen Grm and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10264192},
doi = {10.1109/TDSC.2023.3319500},
isbn = {1545-5971},
year = {2023},
date = {2023-09-23},
journal = {IEEE Transactions on Dependable and Secure Computing},
pages = {1-18},
abstract = {Soft–biometric privacy–enhancing techniques represent machine learning methods that aim to: (i) mitigate privacy concerns associated with face recognition technology by suppressing selected soft–biometric attributes in facial images (e.g., gender, age, ethnicity) and (ii) make unsolicited extraction of sensitive personal information infeasible. Because such techniques are increasingly used in real–world applications, it is imperative to understand to what extent the privacy enhancement can be inverted and how much attribute information can be recovered from privacy–enhanced images. While these aspects are critical, they have not been investigated in the literature so far. In this paper, we, therefore, study the robustness of several state–of–the–art soft–biometric privacy–enhancing techniques to attribute recovery attempts. We propose PrivacyProber, a high–level framework for restoring soft–biometric information from privacy–enhanced facial images, and apply it for attribute recovery in comprehensive experiments on three public face datasets, i.e., LFW, MUCT and Adience. Our experiments show that the proposed framework is able to restore a considerable amount of suppressed information, regardless of the privacy–enhancing technique used (e.g., adversarial perturbations, conditional synthesis, etc.), but also that there are significant differences between the considered privacy models. These results point to the need for novel mechanisms that can improve the robustness of existing privacy–enhancing techniques and secure them against potential adversaries trying to restore suppressed information. Additionally, we demonstrate that PrivacyProber can also be used to detect privacy–enhancement in facial images (under black–box assumptions) with high accuracy. 
Specifically, we show that a detection procedure can be developed around the proposed framework that is learning free and, therefore, generalizes well across different data characteristics and privacy–enhancing techniques.},
keywords = {biometrics, face, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy},
pubstate = {published},
tppubtype = {article}
}
Babnik, Žiga; Peer, Peter; Štruc, Vitomir DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models Proceedings Article In: IEEE International Joint Conference on Biometrics, pp. 1-10, IEEE, Ljubljana, Slovenia, 2023. @inproceedings{Diffiqa_2023,
title = {DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models},
author = {Žiga Babnik and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121-supp.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics},
pages = {1-10},
publisher = {IEEE},
address = {Ljubljana, Slovenia},
abstract = {Modern face recognition (FR) models excel in constrained
scenarios, but often suffer from decreased performance
when deployed in unconstrained (real-world) environments
due to uncertainties surrounding the quality
of the captured facial data. Face image quality assessment
(FIQA) techniques aim to mitigate these performance
degradations by providing FR models with sample-quality
predictions that can be used to reject low-quality samples
and reduce false match errors. However, despite steady improvements,
ensuring reliable quality estimates across facial
images with diverse characteristics remains challenging.
In this paper, we present a powerful new FIQA approach,
named DifFIQA, which relies on denoising diffusion
probabilistic models (DDPM) and ensures highly competitive
results. The main idea behind the approach is to utilize
the forward and backward processes of DDPMs to perturb
facial images and quantify the impact of these perturbations
on the corresponding image embeddings for quality
prediction. Because the diffusion-based perturbations are
computationally expensive, we also distill the knowledge
encoded in DifFIQA into a regression-based quality predictor,
called DifFIQA(R), that balances performance and
execution time. We evaluate both models in comprehensive
experiments on 7 diverse datasets, with 4 target FR models
and against 10 state-of-the-art FIQA techniques with
highly encouraging results. The source code is available
from: https://github.com/LSIbabnikz/DifFIQA.},
keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face image quality assessment, face recognition, FIQA, quality},
pubstate = {published},
tppubtype = {inproceedings}
}
Peng, Bo; Sun, Xianyun; Wang, Caiyong; Wang, Wei; Dong, Jing; Sun, Zhenan; Zhang, Rongyu; Cong, Heng; Fu, Lingzhi; Wang, Hao; Zhang, Yusheng; Zhang, HanYuan; Zhang, Xin; Liu, Boyuan; Ling, Hefei; Dragar, Luka; Batagelj, Borut; Peer, Peter; Struc, Vitomir; Zhou, Xinghui; Liu, Kunlin; Feng, Weitao; Zhang, Weiming; Wang, Haitao; Diao, Wenxiu DFGC-VRA: DeepFake Game Competition on Visual Realism Assessment Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-9, Ljubljana, Slovenia, 2023. @inproceedings{Deepfake_comp2023,
title = {DFGC-VRA: DeepFake Game Competition on Visual Realism Assessment},
author = {Bo Peng and Xianyun Sun and Caiyong Wang and Wei Wang and Jing Dong and Zhenan Sun and Rongyu Zhang and Heng Cong and Lingzhi Fu and Hao Wang and Yusheng Zhang and HanYuan Zhang and Xin Zhang and Boyuan Liu and Hefei Ling and Luka Dragar and Borut Batagelj and Peter Peer and Vitomir Struc and Xinghui Zhou and Kunlin Liu and Weitao Feng and Weiming Zhang and Haitao Wang and Wenxiu Diao},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-225.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-9},
address = {Ljubljana, Slovenia},
abstract = {This paper presents the summary report on the DeepFake
Game Competition on Visual Realism Assessment (DFGC-VRA).
Deep-learning based face-swap videos, also known
as deepfakes, are becoming more and more realistic and
deceiving. The malicious usage of these face-swap videos
has caused wide concerns. There is an ongoing deepfake
game between its creators and detectors, with the human in
the loop. The research community has been focusing on
the automatic detection of these fake videos, but the assessment
of their visual realism, as perceived by human
eyes, is still an unexplored dimension. Visual realism assessment,
or VRA, is essential for assessing the potential
impact that may be brought by a specific face-swap video,
and it is also useful as a quality metric to compare different
face-swap methods. This is the third edition of DFGC
competitions, which focuses on the new visual realism assessment
topic, different from previous ones that compete
creators versus detectors. With this competition, we conduct
a comprehensive study of the SOTA performance on
the new task. We also release our MindSpore codes to further
facilitate research in this field
(https://github.com/bomb2peng/DFGC-VRA-benckmark).},
note = {Jing Dong (jdong@nlpr.ia.ac.cn) is the corresponding author.},
keywords = {competition IJCB, deepfake detection, deepfakes, face, realism assessment},
pubstate = {published},
tppubtype = {inproceedings}
}
Kolf, Jan Niklas; Boutros, Fadi; Elliesen, Jurek; Theuerkauf, Markus; Damer, Naser; Alansari, Mohamad Y; Hay, Oussama Abdul; Alansari, Sara Yousif; Javed, Sajid; Werghi, Naoufel; Grm, Klemen; Struc, Vitomir; Alonso-Fernandez, Fernando; Hernandez-Diaz, Kevin; Bigun, Josef; George, Anjith; Ecabert, Christophe; Shahreza, Hatef Otroshi; Kotwal, Ketan; Marcel, Sébastien; Medvedev, Iurii; Bo, Jin; Nunes, Diogo; Hassanpour, Ahmad; Khatiwada, Pankaj; Toor, Aafan Ahmad; Yang, Bian EFaR 2023: Efficient Face Recognition Competition Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-12, Ljubljana, Slovenia, 2023. @inproceedings{EFAR2023_2023,
title = {EFaR 2023: Efficient Face Recognition Competition},
author = {Jan Niklas Kolf and Fadi Boutros and Jurek Elliesen and Markus Theuerkauf and Naser Damer and Mohamad Y Alansari and Oussama Abdul Hay and Sara Yousif Alansari and Sajid Javed and Naoufel Werghi and Klemen Grm and Vitomir Struc and Fernando Alonso-Fernandez and Kevin Hernandez-Diaz and Josef Bigun and Anjith George and Christophe Ecabert and Hatef Otroshi Shahreza and Ketan Kotwal and Sébastien Marcel and Iurii Medvedev and Jin Bo and Diogo Nunes and Ahmad Hassanpour and Pankaj Khatiwada and Aafan Ahmad Toor and Bian Yang},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-231.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-12},
address = {Ljubljana, Slovenia},
abstract = {This paper presents the summary of the Efficient Face
Recognition Competition (EFaR) held at the 2023 International
Joint Conference on Biometrics (IJCB 2023). The
competition received 17 submissions from 6 different teams.
To drive further development of efficient face recognition
models, the submitted solutions are ranked based on a
weighted score of the achieved verification accuracies on a
diverse set of benchmarks, as well as the deployability given
by the number of floating-point operations and model size.
The evaluation of submissions is extended to bias, cross-quality,
and large-scale recognition benchmarks. Overall,
the paper gives an overview of the achieved performance
values of the submitted solutions as well as a diverse set of
baselines. The submitted solutions use small, efficient network
architectures to reduce the computational cost; some
solutions apply model quantization. An outlook on possible
techniques that are underrepresented in current solutions is
given as well.},
keywords = {biometrics, deep learning, face, face recognition, lightweight models},
pubstate = {published},
tppubtype = {inproceedings}
}
Das, Abhijit; Atreya, Saurabh K; Mukherjee, Aritra; Vitek, Matej; Li, Haiqing; Wang, Caiyong; Guangzhe, Zhao; Boutros, Fadi; Siebke, Patrick; Kolf, Jan Niklas; Damer, Naser; Sun, Ye; Hexin, Lu; Aobo, Fab; Sheng, You; Nathan, Sabari; Ramamoorthy, Suganya; S, Rampriya R; G, Geetanjali; Sihag, Prinaka; Nigam, Aditya; Peer, Peter; Pal, Umapada; Struc, Vitomir Sclera Segmentation and Joint Recognition Benchmarking Competition: SSRBC 2023 Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-10, Ljubljana, Slovenia, 2023. @inproceedings{SSBRC2023,
title = {Sclera Segmentation and Joint Recognition Benchmarking Competition: SSRBC 2023},
author = {Abhijit Das and Saurabh K Atreya and Aritra Mukherjee and Matej Vitek and Haiqing Li and Caiyong Wang and Zhao Guangzhe and Fadi Boutros and Patrick Siebke and Jan Niklas Kolf and Naser Damer and Ye Sun and Lu Hexin and Fab Aobo and You Sheng and Sabari Nathan and Suganya Ramamoorthy and Rampriya R S and Geetanjali G and Prinaka Sihag and Aditya Nigam and Peter Peer and Umapada Pal and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-233.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-10},
address = {Ljubljana, Slovenia},
abstract = {This paper presents the summary of the Sclera Segmentation
and Joint Recognition Benchmarking Competition (SSRBC
2023) held in conjunction with IEEE International
Joint Conference on Biometrics (IJCB 2023). Different from
the previous editions of the competition, SSRBC 2023 not
only explored the performance of the latest and most advanced
sclera segmentation models, but also studied the impact
of segmentation quality on recognition performance.
Five groups took part in SSRBC 2023 and submitted a total
of six segmentation models and one recognition technique
for scoring. The submitted solutions included a wide
variety of conceptually diverse deep-learning models and
were rigorously tested on three publicly available datasets,
i.e., MASD, SBVPI and MOBIUS. Most of the segmentation
models achieved encouraging segmentation and recognition
performance. Most importantly, we observed that better
segmentation results always translate into better verification
performance.},
keywords = {biometrics, competition IJCB, computer vision, deep learning, sclera, sclera segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
Emersic, Ziga; Ohki, Tetsushi; Akasaka, Muku; Arakawa, Takahiko; Maeda, Soshi; Okano, Masora; Sato, Yuya; George, Anjith; Marcel, Sébastien; Ganapathi, Iyyakutti Iyappan; Ali, Syed Sadaf; Javed, Sajid; Werghi, Naoufel; Işık, Selin Gök; Sarıtaş, Erdi; Ekenel, Hazim Kemal; Hudovernik, Valter; Kolf, Jan Niklas; Boutros, Fadi; Damer, Naser; Sharma, Geetanjali; Kamboj, Aman; Nigam, Aditya; Jain, Deepak Kumar; Cámara, Guillermo; Peer, Peter; Struc, Vitomir The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-10, Ljubljana, Slovenia, 2023. @inproceedings{UERC2023,
title = {The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias},
author = {Ziga Emersic and Tetsushi Ohki and Muku Akasaka and Takahiko Arakawa and Soshi Maeda and Masora Okano and Yuya Sato and Anjith George and Sébastien Marcel and Iyyakutti Iyappan Ganapathi and Syed Sadaf Ali and Sajid Javed and Naoufel Werghi and Selin Gök Işık and Erdi Sarıtaş and Hazim Kemal Ekenel and Valter Hudovernik and Jan Niklas Kolf and Fadi Boutros and Naser Damer and Geetanjali Sharma and Aman Kamboj and Aditya Nigam and Deepak Kumar Jain and Guillermo Cámara and Peter Peer and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-234.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-10},
address = {Ljubljana, Slovenia},
abstract = {The paper provides a summary of the 2023 Unconstrained
Ear Recognition Challenge (UERC), a benchmarking
effort focused on ear recognition from images acquired
in uncontrolled environments. The objective of the challenge
was to evaluate the effectiveness of current ear recognition
techniques on a challenging ear dataset while analyzing
the techniques from two distinct aspects, i.e., verification
performance and bias with respect to specific demographic
factors, i.e., gender and ethnicity. Seven research
groups participated in the challenge and submitted
seven distinct recognition approaches that ranged from
descriptor-based methods and deep-learning models to ensemble
techniques that relied on multiple data representations
to maximize performance and minimize bias. A comprehensive
investigation into the performance of the submitted
models is presented, as well as an in-depth analysis of
bias and associated performance differentials due to differences
in gender and ethnicity. The results of the challenge
suggest that a wide variety of models (e.g., transformers,
convolutional neural networks, ensemble models) is capable
of achieving competitive recognition results, but also
that all of the models still exhibit considerable performance
differentials with respect to both gender and ethnicity. To
promote further development of unbiased and effective ear
recognition models, the starter kit of UERC 2023 together
with the baseline model, and training and test data is made
available from: http://ears.fri.uni-lj.si/.},
keywords = {biometrics, competition, computer vision, deep learning, ear, ear biometrics, UERC 2023},
pubstate = {published},
tppubtype = {inproceedings}
}
Ivanovska, Marija; Štruc, Vitomir; Perš, Janez TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models Proceedings Article In: 18th International Conference on Machine Vision and Applications (MVA 2023), pp. 1-6, 2023. @inproceedings{MarijaTomato2023,
title = {TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models},
author = {Marija Ivanovska and Vitomir Štruc and Janez Perš},
url = {https://arxiv.org/pdf/2307.01064.pdf
https://ieeexplore.ieee.org/document/10215774},
doi = {10.23919/MVA57639.2023.10215774},
year = {2023},
date = {2023-07-23},
urldate = {2023-07-23},
booktitle = {18th International Conference on Machine Vision and Applications (MVA 2023)},
pages = {1-6},
abstract = {Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms, in particular, are commonly used for fruit segmentation, enabling in-depth analysis of the harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large and challenging dataset of greenhouse tomatoes. The dataset comprises high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and Tomatopia are available at https://github.com/MIvanovska/TomatoDIFF.},
keywords = {agriculture, dataset, deep learning, diffusion, plant segmentation, plant monitoring, robotics, segmentation, tomato dataset},
pubstate = {published},
tppubtype = {inproceedings}
}
Vitek, Matej; Bizjak, Matic; Peer, Peter; Štruc, Vitomir IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics Journal Article In: Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 8, pp. 1-21, 2023. @article{VitekSaud2023,
title = {IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics},
author = {Matej Vitek and Matic Bizjak and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/07/PublishedVersion.pdf},
doi = {10.1016/j.jksuci.2023.101630},
year = {2023},
date = {2023-07-10},
journal = {Journal of King Saud University - Computer and Information Sciences},
volume = {35},
number = {8},
pages = {1-21},
abstract = {The sclera has recently been gaining attention as a biometric modality due to its various desirable characteristics. A key step in any type of ocular biometric recognition, including sclera recognition, is the segmentation of the relevant part(s) of the eye. However, the high computational complexity of the (deep) segmentation models used in this task can limit their applicability on resource-constrained devices such as smartphones or head-mounted displays. As these devices are a common desired target for such biometric systems, lightweight solutions for ocular segmentation are critically needed. To address this issue, this paper introduces IPAD (Iterative Pruning with Activation Deviation), a novel method for developing lightweight convolutional networks, that is based on model pruning. IPAD uses a novel filter-activation-based criterion (ADC) to determine low-importance filters and employs an iterative model pruning procedure to derive the final lightweight model. To evaluate the proposed pruning procedure, we conduct extensive experiments with two diverse segmentation models, over four publicly available datasets (SBVPI, SLD, SMD and MOBIUS), in four distinct problem configurations and in comparison to state-of-the-art methods from the literature. The results of the experiments show that the proposed filter-importance criterion outperforms the standard L1 and L2 approaches from the literature. 
Furthermore, the results also suggest that: 1) the pruned models are able to retain (or even improve on) the performance of the unpruned originals, as long as they are not over-pruned, with RITnet and U-Net at 50% of their original FLOPs reaching up to 4% and 7% higher IoU values than their unpruned versions, respectively, 2) smaller models require more careful pruning, as the pruning process can hurt the model’s generalization capabilities, and 3) the novel criterion most convincingly outperforms the classic approaches when sufficient training data is available, implying that the abundance of data leads to more robust activation-based importance computation.},
keywords = {biometrics, CNN, deep learning, model compression, pruning, sclera, sclera segmentation},
pubstate = {published},
tppubtype = {article}
}
Plesh, Richard; Peer, Peter; Štruc, Vitomir GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling Proceedings Article In: Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR), 2023. @inproceedings{PleshCVPR2023,
title = {GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling},
author = {Richard Plesh and Peter Peer and Vitomir Štruc},
url = {https://arxiv.org/pdf/2210.14145.pdf
https://openaccess.thecvf.com/content/CVPR2023/html/Plesh_GlassesGAN_Eyewear_Personalization_Using_Synthetic_Appearance_Discovery_and_Targeted_Subspace_CVPR_2023_paper.html},
year = {2023},
date = {2023-06-18},
urldate = {2023-06-18},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {We present GlassesGAN, a novel image editing framework for custom design of glasses, that sets a new standard in terms of image quality, edit realism, and continuous multi-style edit capability. To facilitate the editing process with GlassesGAN, we propose a Targeted Subspace Modelling (TSM) procedure that, based on a novel mechanism for (synthetic) appearance discovery in the latent space of a pre-trained GAN generator, constructs an eyeglasses-specific (latent) subspace that the editing framework can utilize. Additionally, we also introduce an appearance-constrained subspace initialization (SI) technique that centers the latent representation of the given input image in the well-defined part of the constructed subspace to improve the reliability of the learned edits. We test GlassesGAN on two (diverse) high-resolution datasets (CelebA-HQ and SiblingsDB-HQf) and compare it to three state-of-the-art competitors, i.e., InterfaceGAN, GANSpace, and MaskGAN. The reported results show that GlassesGAN convincingly outperforms all competing techniques, while offering additional functionality (e.g., fine-grained multi-style editing) not available with any of the competitors. The source code will be made freely available.},
keywords = {eyewear, eyewear personalization, face editing, GAN inversion, latent space editing, StyleGAN2, synthetic appearance discovery, targeted subspace modeling, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
Pernuš, Martin; Bhatnagar, Mansi; Samad, Badr; Singh, Divyanshu; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms Journal Article In: IEEE Access, pp. 1-22, 2023, ISSN: 2169-3536. @article{AccessMartin2023,
title = {ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms},
author = {Martin Pernuš and Mansi Bhatnagar and Badr Samad and Divyanshu Singh and Peter Peer and Vitomir Štruc and Simon Dobrišek},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10126110},
doi = {10.1109/ACCESS.2023.3276877},
issn = {2169-3536},
year = {2023},
date = {2023-05-17},
journal = {IEEE Access},
pages = {1-22},
abstract = {Kinship face synthesis is an increasingly popular topic within the computer vision community, particularly the task of predicting the child appearance using parental images. Previous work has been limited in terms of model capacity and inadequate training data, which is comprised of low-resolution and tightly cropped images, leading to lower synthesis quality. In this paper, we propose ChildNet, a method for kinship face synthesis that leverages the facial image generation capabilities of a state-of-the-art Generative Adversarial Network (GAN), and resolves the aforementioned problems. ChildNet is designed within the GAN latent space and is able to predict a child appearance that bears high resemblance to real parents’ children. To ensure fine-grained control, we propose an age and gender manipulation module that allows precise manipulation of the child synthesis result. ChildNet is capable of generating multiple child images per parent pair input, while providing a way to control the image generation variability. Additionally, we introduce a mechanism to control the dominant parent image. Finally, to facilitate the task of kinship face synthesis, we introduce a new kinship dataset, called Next of Kin. This dataset contains 3690 high-resolution face images with a diverse range of ethnicities and ages. We evaluate ChildNet in comprehensive experiments against three competing kinship face synthesis models, using two kinship datasets. The experiments demonstrate the superior performance of ChildNet in terms of identity similarity, while exhibiting high perceptual image quality. The source code for the model is publicly available at: https://github.com/MartinPernus/ChildNet.},
keywords = {artificial intelligence, CNN, deep learning, face generation, face synthesis, GAN, GAN inversion, kinship, kinship synthesis, StyleGAN2},
pubstate = {published},
tppubtype = {article}
}
Boutros, Fadi; Štruc, Vitomir; Fierrez, Julian; Damer, Naser Synthetic data for face recognition: Current state and future prospects Journal Article In: Image and Vision Computing, no. 104688, 2023. @article{FadiIVCSynthetic,
title = {Synthetic data for face recognition: Current state and future prospects},
author = {Fadi Boutros and Vitomir Štruc and Julian Fierrez and Naser Damer},
url = {https://www.sciencedirect.com/science/article/pii/S0262885623000628},
doi = {10.1016/j.imavis.2023.104688},
year = {2023},
date = {2023-05-15},
urldate = {2023-05-15},
journal = {Image and Vision Computing},
number = {104688},
abstract = {Over the past years, deep learning capabilities and the availability of large-scale training datasets advanced rapidly, leading to breakthroughs in face recognition accuracy. However, these technologies are foreseen to face a major challenge in the next years due to the legal and ethical concerns about using authentic biometric data in AI model training and evaluation along with increasingly utilizing data-hungry state-of-the-art deep learning models. With the recent advances in deep generative models and their success in generating realistic and high-resolution synthetic image data, privacy-friendly synthetic data has been recently proposed as an alternative to privacy-sensitive authentic data to overcome the challenges of using authentic data in face recognition development. This work aims at providing a clear and structured picture of the use-cases taxonomy of synthetic face data in face recognition along with the recent emerging advances of face recognition models developed on the bases of synthetic data. We also discuss the challenges facing the use of synthetic data in face recognition development and several future prospects of synthetic data in the domain of face recognition.},
keywords = {biometrics, CNN, diffusion, face recognition, generative models, survey, synthetic data},
pubstate = {published},
tppubtype = {article}
}
Grabner, Miha; Wang, Yi; Wen, Qingsong; Blažič, Boštjan; Štruc, Vitomir A global modeling framework for load forecasting in distribution networks Journal Article In: IEEE Transactions on Smart Grid, 2023, ISSN: 1949-3061. @article{Grabner_TSG,
title = {A global modeling framework for load forecasting in distribution networks},
author = {Miha Grabner and Yi Wang and Qingsong Wen and Boštjan Blažič and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10092804},
doi = {10.1109/TSG.2023.3264525},
issn = {1949-3061},
year = {2023},
date = {2023-04-05},
journal = {IEEE Transactions on Smart Grid},
abstract = {With the increasing numbers of smart meter installations, scalable and efficient load forecasting techniques are critically needed to ensure sustainable situation awareness within the distribution networks. Distribution networks include a large number of different loads at various aggregation levels, such as individual consumers, low-voltage feeders, and transformer stations. It is impractical to develop individual (or so-called local) forecasting models for each load separately. Additionally, such local models also (i) (largely) ignore the strong dependencies between different loads that might be present due to their spatial proximity and the characteristics of the distribution network, (ii) require historical data for each load to be able to make forecasts, and (iii) are incapable of adjusting to changes in the load behavior without retraining. To address these issues, we propose a global modeling framework for load forecasting in distribution networks that, unlike its local competitors, relies on a single global model to generate forecasts for a large number of loads. The global nature of the framework significantly reduces the computational burden typically required when training multiple local forecasting models, efficiently exploits the cross-series information shared among different loads, and facilitates forecasts even when historical data for a load is missing or the behavior of a load evolves over time. To further improve on the performance of the proposed framework, an unsupervised localization mechanism and optimal ensemble construction strategy are also proposed to localize/personalize the global forecasting model to different load characteristics. Our experimental results show that the proposed framework outperforms naive benchmarks by more than 25% (in terms of Mean Absolute Error) on a real-world dataset while exhibiting highly desirable characteristics when compared to the local models that are predominantly used in the literature. 
All source code and data are made publicly available to enable reproducibility: https://github.com/mihagrabner/GlobalModelingFramework},
keywords = {deep learning, global modeling, load forecasting, prediction, smart grid, time series analysis, time series forecasting},
pubstate = {published},
tppubtype = {article}
}
Meden, Blaž; Gonzalez-Hernandez, Manfred; Peer, Peter; Štruc, Vitomir Face deidentification with controllable privacy protection Journal Article In: Image and Vision Computing, vol. 134, no. 104678, pp. 1-19, 2023. @article{MedenDeID2023,
title = {Face deidentification with controllable privacy protection},
author = {Blaž Meden and Manfred Gonzalez-Hernandez and Peter Peer and Vitomir Štruc},
url = {https://reader.elsevier.com/reader/sd/pii/S0262885623000525?token=BC1E21411C50118E666720B002A89C9EB3DB4CFEEB5EB18D7BD7B0613085030A96621C8364583BFE7BAE025BE3646096&originRegion=eu-west-1&originCreation=20230516115322},
doi = {10.1016/j.imavis.2023.104678},
year = {2023},
date = {2023-04-01},
journal = {Image and Vision Computing},
volume = {134},
number = {104678},
pages = {1-19},
abstract = {Privacy protection has become a crucial concern in today’s digital age. Particularly sensitive here are facial images, which typically not only reveal a person’s identity, but also other sensitive personal information. To address this problem, various face deidentification techniques have been presented in the literature. These techniques try to remove or obscure personal information from facial images while still preserving their usefulness for further analysis. While a considerable amount of work has been proposed on face deidentification, most state-of-the-art solutions still suffer from various drawbacks, as they (a) deidentify only a narrow facial area, leaving potentially important contextual information unprotected, (b) modify facial images to such a degree that image naturalness and facial diversity suffer in the deidentified images, (c) offer no flexibility in the level of privacy protection ensured, leading to suboptimal deployment in various applications, and (d) often offer an unsatisfactory tradeoff between the ability to obscure identity information, quality and naturalness of the deidentified images, and sufficient utility preservation. In this paper, we address these shortcomings with a novel controllable face deidentification technique that balances image quality, identity protection, and data utility for further analysis. The proposed approach utilizes a powerful generative model (StyleGAN2), multiple auxiliary classification models, and carefully designed constraints to guide the deidentification process. The approach is validated across four diverse datasets (CelebA-HQ, RaFD, XM2VTS, AffectNet) and in comparison to 7 state-of-the-art competitors. The results of the experiments demonstrate that the proposed solution leads to: (a) a considerable level of identity protection, (b) valuable preservation of data utility, (c) sufficient diversity among the deidentified faces, and (d) encouraging overall performance.},
keywords = {CNN, deep learning, deidentification, face recognition, GAN, GAN inversion, privacy, privacy protection, StyleGAN2},
pubstate = {published},
tppubtype = {article}
}
Ivanovska, Marija; Štruc, Vitomir Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models Proceedings Article In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2023. @inproceedings{IWBF2023_Marija,
title = {Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models},
author = {Marija Ivanovska and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/03/IWBF2023_Morphing.pdf},
year = {2023},
date = {2023-02-28},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
pages = {1-6},
abstract = {Morphed face images have recently become a growing concern for existing face verification systems, as they are relatively easy to generate and can be used to impersonate someone's identity for various malicious purposes. Efficient Morphing Attack Detection (MAD) that generalizes well across different morphing techniques is, therefore, of paramount importance. Existing MAD techniques predominantly rely on discriminative models that learn from examples of bona fide and morphed images and, as a result, often exhibit sub-optimal generalization performance when confronted with unknown types of morphing attacks. To address this problem, we propose a novel, diffusion-based MAD method in this paper that learns only from the characteristics of bona fide images. Various forms of morphing attacks are then detected by our model as out-of-distribution samples. We perform rigorous experiments over four different datasets (CASIA-WebFace, FRLL-Morphs, FERET-Morphs and FRGC-Morphs) and compare the proposed solution to both discriminatively-trained and one-class MAD models. The experimental results show that our MAD model achieves highly competitive results on all considered datasets.},
keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face morphing attack, morphing attack, morphing attack detection},
pubstate = {published},
tppubtype = {inproceedings}
}
Babnik, Žiga; Damer, Naser; Štruc, Vitomir Optimization-Based Improvement of Face Image Quality Assessment Techniques Proceedings Article In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), 2023. @inproceedings{iwbf2023babnik,
title = {Optimization-Based Improvement of Face Image Quality Assessment Techniques},
author = {Žiga Babnik and Naser Damer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/03/IWBF_23___paper-1.pdf},
year = {2023},
date = {2023-02-28},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
abstract = {Contemporary face recognition~(FR) models achieve near-ideal recognition performance in constrained settings, yet do not fully translate the performance to unconstrained (real-world) scenarios. To help improve the performance and stability of FR systems in such unconstrained settings, face image quality assessment (FIQA) techniques try to infer sample-quality information from the input face images that can aid with the recognition process. While existing FIQA techniques are able to efficiently capture the differences between high and low quality images, they typically cannot fully distinguish between images of similar quality, leading to lower performance in many scenarios. To address this issue, we present in this paper a supervised quality-label optimization approach, aimed at improving the performance of existing FIQA techniques. The developed optimization procedure infuses additional information (computed with a selected FR model) into the initial quality scores generated with a given FIQA technique to produce better estimates of the ``actual'' image quality. We evaluate the proposed approach in comprehensive experiments with six state-of-the-art FIQA approaches (CR-FIQA, FaceQAN, SER-FIQ, PCNet, MagFace, SER-FIQ) on five commonly used benchmarks (LFW, CFP-FP, CPLFW, CALFW, XQLFW) using three targeted FR models (ArcFace, ElasticFace, CurricularFace) with highly encouraging results. },
keywords = {distillation, face, face image quality assessment, face image quality estimation, face images, optimization, quality, transfer learning},
pubstate = {published},
tppubtype = {inproceedings}
}
Vitek, Matej; Das, Abhijit; Lucio, Diego Rafael; Zanlorensi Jr., Luiz Antonio; Menotti, David; Khiarak, Jalil Nourmohammadi; Shahpar, Mohsen Akbari; Asgari-Chenaghlu, Meysam; Jaryani, Farhang; Tapia, Juan E.; Valenzuela, Andres; Wang, Caiyong; Wang, Yunlong; He, Zhaofeng; Sun, Zhenan; Boutros, Fadi; Damer, Naser; Grebe, Jonas Henry; Kuijper, Arjan; Raja, Kiran; Gupta, Gourav; Zampoukis, Georgios; Tsochatzidis, Lazaros; Pratikakis, Ioannis; Kumar, S. V. Aruna; Harish, B. S.; Pal, Umapada; Peer, Peter; Štruc, Vitomir Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach Journal Article In: IEEE Transactions on Information Forensics and Security, vol. 18, pp. 190-205, 2023, ISSN: 1556-6013. @article{TIFS_Sclera2022,
title = {Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach},
author = {Matej Vitek and Abhijit Das and Diego Rafael Lucio and Zanlorensi, Jr., Luiz Antonio and David Menotti and Jalil Nourmohammadi Khiarak and Mohsen Akbari Shahpar and Meysam Asgari-Chenaghlu and Farhang Jaryani and Juan E. Tapia and Andres Valenzuela and Caiyong Wang and Yunlong Wang and Zhaofeng He and Zhenan Sun and Fadi Boutros and Naser Damer and Jonas Henry Grebe and Arjan Kuijper and Kiran Raja and Gourav Gupta and Georgios Zampoukis and Lazaros Tsochatzidis and Ioannis Pratikakis and S. V. Aruna Kumar and B. S. Harish and Umapada Pal and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9926136},
doi = {10.1109/TIFS.2022.3216468},
issn = {1556-6013},
year = {2023},
date = {2023-01-18},
urldate = {2022-10-18},
journal = {IEEE Transactions on Information Forensics and Security},
volume = {18},
pages = {190-205},
abstract = {Bias and fairness of biometric algorithms have been key topics of research in recent years, mainly due to the societal, legal and ethical implications of potentially unfair decisions made by automated decision-making models. A considerable amount of work has been done on this topic across different biometric modalities, aiming at better understanding the main sources of algorithmic bias or devising mitigation measures. In this work, we contribute to these efforts and present the first study investigating bias and fairness of sclera segmentation models. Although sclera segmentation techniques represent a key component of sclera-based biometric systems with a considerable impact on the overall recognition performance, the presence of different types of biases in sclera segmentation methods is still underexplored. To address this limitation, we describe the results of a group evaluation effort (involving seven research groups), organized to explore the performance of recent sclera segmentation models within a common experimental framework and study performance differences (and bias), originating from various demographic as well as environmental factors. Using five diverse datasets, we analyze seven independently developed sclera segmentation models in different experimental configurations. The results of our experiments suggest that there are significant differences in the overall segmentation performance across the seven models and that among the considered factors, ethnicity appears to be the biggest cause of bias. Additionally, we observe that training with representative and balanced data does not necessarily lead to less biased results. Finally, we find that in general there appears to be a negative correlation between the amount of bias observed (due to eye color, ethnicity and acquisition device) and the overall segmentation performance, suggesting that advances in the field of semantic segmentation may also help with mitigating bias.},
keywords = {bias, biometrics, fairness, group evaluation, ocular, sclera, sclera segmentation, segmentation},
pubstate = {published},
tppubtype = {article}
}
Grm, Klemen; Ozata, Berk; Struc, Vitomir; Ekenel, Hazim K. Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition Proceedings Article In: WACV workshops, pp. 120-129, 2023. @inproceedings{WACVW2023,
title = {Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition},
author = {Klemen Grm and Berk Ozata and Vitomir Struc and Hazim K. Ekenel},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/01/Meet_in_the_middle.pdf
https://arxiv.org/abs/2211.15225
https://openaccess.thecvf.com/content/WACV2023W/RWS/papers/Grm_Meet-in-the-Middle_Multi-Scale_Upsampling_and_Matching_for_Cross-Resolution_Face_Recognition_WACVW_2023_paper.pdf
},
year = {2023},
date = {2023-01-06},
booktitle = {WACV workshops},
pages = {120-129},
abstract = {In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between disparate sources like this is a classical surveillance face identification scenario, which continues to be a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including from low quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our proposed method is able to outperform even existing methods fine-tuned to the SCFace dataset.},
keywords = {deep learning, face, face recognition, multi-scale matching, smart surveillance, surveillance, surveillance technology},
pubstate = {published},
tppubtype = {inproceedings}
}
Eyiokur, Fevziye Irem; Kantarci, Alperen; Erakin, Mustafa Ekrem; Damer, Naser; Ofli, Ferda; Imran, Muhammad; Križaj, Janez; Salah, Albert Ali; Waibel, Alexander; Štruc, Vitomir; Ekenel, Hazim K. A Survey on Computer Vision based Human Analysis in the COVID-19 Era Journal Article In: Image and Vision Computing, vol. 130, no. 104610, pp. 1-19, 2023. @article{IVC2023,
title = {A Survey on Computer Vision based Human Analysis in the COVID-19 Era},
author = {Fevziye Irem Eyiokur and Alperen Kantarci and Mustafa Ekrem Erakin and Naser Damer and Ferda Ofli and Muhammad Imran and Janez Križaj and Albert Ali Salah and Alexander Waibel and Vitomir Štruc and Hazim K. Ekenel},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/01/FG4COVID19_PAPER_compressed.pdf
https://authors.elsevier.com/a/1gKOyxnVK7RBS},
doi = {10.1016/j.imavis.2022.104610},
year = {2023},
date = {2023-01-01},
journal = {Image and Vision Computing},
volume = {130},
number = {104610},
pages = {1-19},
abstract = {The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research directions is given at the end of the survey. This work is intended to have a broad appeal and be useful not only for computer vision researchers but also the general public.},
keywords = {COVID-19, face, face alignment, face analysis, face image processing, face image quality assessment, face landmarking, face recognition, face verification, human analysis, masked face analysis},
pubstate = {published},
tppubtype = {article}
}
Hrovatič, Anja; Peer, Peter; Štruc, Vitomir; Emeršič, Žiga Efficient ear alignment using a two-stack hourglass network Journal Article In: IET Biometrics, pp. 1-14, 2023, ISSN: 2047-4938. @article{UhljiIETZiga,
title = {Efficient ear alignment using a two-stack hourglass network},
author = {Anja Hrovatič and Peter Peer and Vitomir Štruc and Žiga Emeršič},
url = {https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2.12109},
doi = {10.1049/bme2.12109},
issn = {2047-4938},
year = {2023},
date = {2023-01-01},
journal = {IET Biometrics},
pages = {1-14},
abstract = {Ear images have been shown to be a reliable modality for biometric recognition with desirable characteristics, such as high universality, distinctiveness, measurability and permanence. While a considerable amount of research has been directed towards ear recognition techniques, the problem of ear alignment is still under-explored in the open literature. Nonetheless, accurate alignment of ear images, especially in unconstrained acquisition scenarios, where the ear appearance is expected to vary widely due to pose and viewpoint variations, is critical for the performance of all downstream tasks, including ear recognition. Here, the authors address this problem and present a framework for ear alignment that relies on a two-step procedure: (i) automatic landmark detection and (ii) fiducial point alignment. For the first (landmark detection) step, the authors implement and train a Two-Stack Hourglass model (2-SHGNet) capable of accurately predicting 55 landmarks on diverse ear images captured in uncontrolled conditions. For the second (alignment) step, the authors use the Random Sample Consensus (RANSAC) algorithm to align the estimated landmark/fiducial points with a pre-defined ear shape (i.e. a collection of average ear landmark positions). The authors evaluate the proposed framework in comprehensive experiments on the AWEx and ITWE datasets and show that the 2-SHGNet model leads to more accurate landmark predictions than competing state-of-the-art models from the literature. Furthermore, the authors also demonstrate that the alignment step significantly improves recognition accuracy with ear images from unconstrained environments compared to unaligned imagery.},
keywords = {biometrics, CNN, deep learning, ear, ear alignment, ear recognition},
pubstate = {published},
tppubtype = {article}
}
2022
Gan, Chenquan; Yang, Yucheng; Zhu, Qingyi; Jain, Deepak Kumar; Struc, Vitomir DHF-Net: A hierarchical feature interactive fusion network for dialogue emotion recognition Journal Article In: Expert Systems with Applications, vol. 210, 2022. @article{TextEmotionESWA,
title = {DHF-Net: A hierarchical feature interactive fusion network for dialogue emotion recognition},
author = {Chenquan Gan and Yucheng Yang and Qingyi Zhu and Deepak Kumar Jain and Vitomir Struc},
url = {https://www.sciencedirect.com/science/article/pii/S0957417422016025?via%3Dihub},
doi = {10.1016/j.eswa.2022.118525},
year = {2022},
date = {2022-12-30},
urldate = {2022-08-01},
journal = {Expert Systems with Applications},
volume = {210},
abstract = {To balance the trade-off between contextual information and fine-grained information in identifying specific emotions during a dialogue and combine the interaction of hierarchical feature related information, this paper proposes a hierarchical feature interactive fusion network (named DHF-Net), which not only can retain the integrity of the context sequence information but also can extract more fine-grained information. To obtain deep semantic information, DHF-Net processes the task of recognizing dialogue emotion and dialogue act/intent separately, and then learns the cross-impact of two tasks through collaborative attention. Also, a bidirectional gated recurrent unit (Bi-GRU) connected hybrid convolutional neural network (CNN) group method is designed, by which the sequence information is smoothly sent to the multi-level local information layers for feature extraction. Experimental results show that, on two open session datasets, the performance of DHF-Net is improved by 1.8% and 1.2%, respectively.},
keywords = {attention, CNN, deep learning, dialogue, emotion recognition, fusion, fusion network, nlp, semantics, text, text processing},
pubstate = {published},
tppubtype = {article}
}
Tomašević, Darian; Peer, Peter; Štruc, Vitomir BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images Proceedings Article In: IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022), pp. 1-10, 2022. @inproceedings{TomasevicIJCBBiOcular,
title = {BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images},
author = {Darian Tomašević and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/12/BiModal_StyleGAN.pdf
https://arxiv.org/pdf/2205.01536.pdf},
year = {2022},
date = {2022-10-20},
urldate = {2022-10-20},
booktitle = {IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022)},
pages = {1-10},
abstract = {Current state-of-the-art segmentation techniques for ocular images are critically dependent on large-scale annotated datasets, which are labor-intensive to gather and often raise privacy concerns. In this paper, we present a novel framework, called BiOcularGAN, capable of generating synthetic large-scale datasets of photorealistic (visible light and near-infrared) ocular images, together with corresponding segmentation labels to address these issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2 (DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic Mask Generator (SMG) component that produces semantic annotations by exploiting latent features of the DB-StyleGAN2 model. We evaluate BiOcularGAN through extensive experiments across five diverse ocular datasets and analyze the effects of bimodal data generation on image quality and the produced annotations. Our experimental results show that BiOcularGAN is able to produce high-quality matching bimodal images and annotations (with minimal manual intervention) that can be used to train highly competitive (deep) segmentation models (in a privacy-aware manner) that perform well across multiple real-world datasets. The source code for the BiOcularGAN framework is publicly available at: https://github.com/dariant/BiOcularGAN.},
keywords = {biometrics, CNN, data synthesis, deep learning, ocular, segmentation, StyleGAN, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
Huber, Marco; Boutros, Fadi; Luu, Anh Thi; Raja, Kiran; Ramachandra, Raghavendra; Damer, Naser; Neto, Pedro C.; Goncalves, Tiago; Sequeira, Ana F.; Cardoso, Jaime S.; Tremoco, João; Lourenco, Miguel; Serra, Sergio; Cermeno, Eduardo; Ivanovska, Marija; Batagelj, Borut; Kronovšek, Andrej; Peer, Peter; Štruc, Vitomir SYN-MAD 2022: Competition on Face Morphing Attack Detection based on Privacy-aware Synthetic Training Data Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB), pp. 1-10, 2022, ISBN: 978-1-6654-6394-2. @inproceedings{IvanovskaSYNMAD,
title = {SYN-MAD 2022: Competition on Face Morphing Attack Detection based on Privacy-aware Synthetic Training Data},
author = {Marco Huber and Fadi Boutros and Anh Thi Luu and Kiran Raja and Raghavendra Ramachandra and Naser Damer and Pedro C. Neto and Tiago Goncalves and Ana F. Sequeira and Jaime S. Cardoso and João Tremoco and Miguel Lourenco and Sergio Serra and Eduardo Cermeno and Marija Ivanovska and Borut Batagelj and Andrej Kronovšek and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/iel7/10007927/10007928/10007950.pdf?casa_token=k7CV1Vs4DUsAAAAA:xMvzvPAyLBoPv1PqtJQTmZQ9S3TJOlExgcxOeuZPNEuVFKVuIfofx30CgN-jnhVB8_5o_Ne3nJLB},
doi = {10.1109/IJCB54206.2022.10007950},
isbn = {978-1-6654-6394-2},
year = {2022},
date = {2022-09-01},
urldate = {2022-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB)},
pages = {1-10},
keywords = {data synthesis, deep learning, face, face PAD, pad, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
Ivanovska, Marija; Kronovšek, Andrej; Peer, Peter; Štruc, Vitomir; Batagelj, Borut Face Morphing Attack Detection Using Privacy-Aware Training Data Proceedings Article In: Proceedings of ERK 2022, pp. 1-4, 2022. @inproceedings{MarijaMorphing,
title = {Face Morphing Attack Detection Using Privacy-Aware Training Data},
author = {Marija Ivanovska and Andrej Kronovšek and Peter Peer and Vitomir Štruc and Borut Batagelj},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/08/2022_ERK__Face_Morphing_Attack_Detecton_Using_Privacy_Aware_Training_Data.pdf},
year = {2022},
date = {2022-08-01},
urldate = {2022-08-01},
booktitle = {Proceedings of ERK 2022},
pages = {1-4},
abstract = {Images of morphed faces pose a serious threat to face recognition--based security systems, as they can be used to illegally verify the identity of multiple people with a single morphed image. Modern detection algorithms learn to identify such morphing attacks using authentic images of real individuals. This approach raises various privacy concerns and limits the amount of publicly available training data. In this paper, we explore the efficacy of detection algorithms that are trained only on faces of non--existing people and their respective morphs. To this end, two dedicated algorithms are trained with synthetic data and then evaluated on three real-world datasets, i.e.: FRLL-Morphs, FERET-Morphs and FRGC-Morphs. Our results show that synthetic facial images can be successfully employed for the training process of the detection algorithms and generalize well to real-world scenarios.},
keywords = {competition, face, face morphing, face morphing attack, face morphing detection, private data, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}