Publications – Laboratory for Machine Intelligence

Babnik, Žiga; Peer, Peter; Štruc, Vitomir

UVFace: Utility Driven Video-based Face Recognition Journal Article

In: ICT Express, pp. 1–6, 2026.

Tags: biometrics, CNN, deep learning, face image quality assessment, face images, face recognition, video based recognition

Kolf, Jan Niklas; Ozgur, Guray; Atzori, Andrea; Babnik, Žiga; Štruc, Vitomir; Damer, Naser; Boutros, Fadi

PreFIQs: Face Image Quality Is What Survives Pruning Proceedings Article

In: Proceedings of CVPR Workshops 2026 - CVPR Biometrics Workshop, pp. 1–11, 2026.

Tags: biometrics, deep learning, face image quality assessment, face quality, face recognition, FIQA

@inproceedings{PreFIQCVPRW,

title = {PreFIQs: Face Image Quality Is What Survives Pruning},

author = {Jan Niklas Kolf and Guray Ozgur and Andrea Atzori and Žiga Babnik and Vitomir Štruc and Naser Damer and Fadi Boutros},

url = {https://openaccess.thecvf.com/content/CVPR2026W/BIOM2026/papers/Kolf_PreFIQs_Face_Image_Quality_Is_What_Survives_Pruning_CVPRW_2026_paper.pdf},

year  = {2026},

date = {2026-06-06},

booktitle = {Proceedings of CVPR Workshops 2026 - CVPR Biometrics Workshop},

pages = {1--11},

abstract = {Face Image Quality Assessment (FIQA) evaluates the utility of a face image for automated face recognition (FR) systems. In this work, we propose PreFIQs, an unsupervised and training-free FIQA framework grounded in the Pruning Identified Exemplar (PIE) hypothesis. We hypothesize that low-utility face images rely disproportionately on fragile network parameters, resulting in larger geometric displacement of their embeddings under model sparsification. Accordingly, PreFIQs quantifies image utility as the Euclidean distance between L2-normalized embeddings extracted from a pre-trained FR model and its pruned counterpart. We provide a first-order theoretical justification via a Jacobian-vector product analysis, demonstrating that this empirical drift serves as a computationally efficient approximation of the exact geometric sensitivity of the latent embedding manifold. Extensive experiments across eight benchmarks and four FR models demonstrate that PreFIQs achieves competitive or superior performance compared to state-of-the-art FIQA methods, including establishing new state-of-the-art results on several benchmarks, without any training or supervision. These results validate parameter sparsification as a principled and practically efficient signal for face image utility, and demonstrate that quality is, in essence, what survives pruning. Code available at https://github.com/jankolf/PreFIQs.},

keywords = {biometrics, deep learning, face image quality assessment, face quality, face recognition, FIQA},

pubstate = {published},

tppubtype = {inproceedings}

}
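
The score described in the PreFIQs abstract (Euclidean distance between L2-normalized embeddings from a pre-trained model and its pruned counterpart) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `l2_normalize` and `embedding_drift` are hypothetical, the embeddings are toy arrays rather than outputs of a real face recognition model, and how the drift is mapped to a final quality score is left open here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit L2 norm (eps guards against zero vectors)."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def embedding_drift(emb_dense, emb_pruned):
    """Per-image Euclidean distance between normalized dense and pruned embeddings.

    Following the abstract's hypothesis, a larger drift would indicate
    a lower-utility image.
    """
    a = l2_normalize(emb_dense)
    b = l2_normalize(emb_pruned)
    return np.linalg.norm(a - b, axis=-1)


# Toy embeddings (assumed values): row 0 only changes scale, row 1 rotates by 90 degrees.
dense = np.array([[3.0, 4.0], [1.0, 0.0]])
pruned = np.array([[6.0, 8.0], [0.0, 1.0]])
drift = embedding_drift(dense, pruned)
print(drift)
```

Because both embeddings are normalized first, a pure rescaling of an embedding yields zero drift, while a change in direction yields a positive value bounded by 2.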


Larue, Nicolas; Štruc, Vitomir; Peer, Peter; Vu, Ngoc-Son

Learning the Manifold of Authenticity: Hybrid-Curvature Representation Learning for Generalizable Deepfake Detection Journal Article

In: IEEE Access, pp. 1–14, 2026, ISSN: 2169-3536.

Tags: deep learning, deepfake, deepfake DAD, deepfake detection, hyperbolic learning, media forensics

@article{AccessHyperbolic,

title = {Learning the Manifold of Authenticity: Hybrid-Curvature Representation Learning for Generalizable Deepfake Detection},

author = {Nicolas Larue and Vitomir Štruc and Peter Peer and Ngoc-Son Vu},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11557307},

doi = {10.1109/ACCESS.2026.3702429},

issn = {2169-3536},

year  = {2026},

date = {2026-06-06},

journal = {IEEE Access},

pages = {1--14},

abstract = {The practical utility of deepfake detectors is crippled by a crisis of generalization: models that perform well on known manipulation techniques consistently fail when faced with unseen forgeries. We argue this failure stems from a fundamental geometric mismatch. Existing methods implicitly assume that the manifold of authentic faces can be modeled in a space of uniform curvature, typically Euclidean, which inadequately captures the complex, multi-scale structure of facial features. This paper validates the hypothesis that authentic faces lie on a manifold whose geometry is inherently hybrid, requiring both angular compactness (a spherical property) and hierarchical organization (a hyperbolic property). To resolve this geometric mismatch, we introduce a novel detector, CTrue, that learns a unified, hybrid-curvature representation of facial authenticity. Trained exclusively on real faces via self-supervised learning, our method simultaneously projects facial embeddings onto two complementary manifolds: a hypersphere to enforce compactness and a hyperbolic space to model the natural feature hierarchy. A single set of mathematically optimal prototypes acts as a “geometric bridge”, unifying the learning objectives in both spaces. At inference, a composite score measures an embedding’s deviation from this learned manifold. On challenging cross-dataset and cross-manipulation benchmarks, our method achieves competitive generalization under a strictly pristine-only training setting, showing that hybrid-curvature representations provide an effective and data-efficient alternative for deepfake detection.},

keywords = {deep learning, deepfake, deepfake DAD, deepfake detection, hyperbolic learning, media forensics},

pubstate = {published},

tppubtype = {article}

}


Sarıtas, Erdi; Onaran, Eren; Štruc, Vitomir; Ekenel, Hazım Kemal

Employing Vision-Language Models for Face Image Quality Assessment Proceedings Article

In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 1–9, 2026.

Tags: biometrics, face recognition, FIQA, llm, vlm

@inproceedings{Erdi_fg2026,

title = {Employing Vision-Language Models for Face Image Quality Assessment},

author = {Erdi Sarıtas and Eren Onaran and Vitomir Štruc and Hazım Kemal Ekenel},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2026/04/VLM_FIQA_FG26__camera_ready__compressed.pdf},

year  = {2026},

date = {2026-05-25},

urldate = {2026-05-25},

booktitle = {Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG)},

pages = {1--9},

abstract = {Face Image Quality Assessment (FIQA) is a crucial control step in biometric pipelines. It ensures only reliable samples are processed to maintain system accuracy. State-of-the-art FIQA methods achieve high utility but typically operate as “black boxes.” They produce scalar scores without human-interpretable justifications. This lack of transparency limits their effectiveness in human-in-the-loop scenarios, such as automated border control, where actionable feedback is essential. In this paper, we investigate the potential of off-the-shelf Vision-Language Models (VLMs) to bridge this gap by performing FIQA in a zero-shot setting. We present a comprehensive evaluation framework for assessing VLM performance. This involves benchmarking traditional FIQA methods through error-versus-reject curves. Additionally, using a diverse set of datasets, ranging from surveillance-oriented to synthetically generated, we analyze their interpretability, consistency, and robustness to prompt changes. Our results show that biometric utility performance depends significantly on architecture, not merely on parameter count. Most VLMs’ outputs align with those of traditional methods. We also find that VLM ranking performance and the generated scores may vary across prompts. Our synthetic ablation study shows that while increasing the parameter count can improve internal consistency, it yields worse degradation-detection performance than smaller models. These findings suggest that zero-shot FIQA score estimation using VLMs is promising and could effectively complement conventional FIQA pipelines as an interpretability module. The code is available at github.com/ThEnded32/VLM4FIQA.git.},

keywords = {biometrics, face recognition, FIQA, llm, vlm},

pubstate = {published},

tppubtype = {inproceedings}

}


Sabadin, Jernej; Tomašević, Darian; Meden, Blaž; Peer, Peter; Štruc, Vitomir

IDSync: Improving Diffusion Models Through Identity Classification Proceedings Article

In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 1–10, 2026.

Tags: biometrics, data synthesis, face generation, face synthesis, generative AI, generative models

@inproceedings{JernejFG2026,

title = {IDSync: Improving Diffusion Models Through Identity Classification},

author = {Jernej Sabadin and Darian Tomašević and Blaž Meden and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2026/04/IDSync_FG_2026_compressed.pdf},

year  = {2026},

date = {2026-05-25},

urldate = {2026-05-25},

booktitle = {Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG)},

pages = {1--10},

abstract = {Effective training of face recognition models requires large-scale datasets of facial identities, yet collecting suitable data is time-consuming and raises privacy concerns. Existing deep generative models offer a promising alternative through the synthesis of high-quality images but often fail to fully preserve identity information. In this work, we propose IDSync, a novel generative diffusion-based framework designed to produce synthetic face images with more consistent identities that are better suited for training recognition models. To this end, IDSync employs a denoising network in the latent space of a frozen variational autoencoder, with identity guidance introduced via a text encoder that interprets identity embeddings from a pretrained recognition model. During training, the framework leverages a pretrained auxiliary identity classifier to define an additional cross-entropy loss, which is backpropagated to improve identity consistency. We evaluate the generated images using inter- and intra-class cosine similarity of identity features along with a variety of statistical measures between synthetic and real distributions focused on fidelity and diversity. To assess utility, we train face recognition models on the synthetic images and measure accuracy on standard verification benchmarks. Experimental results show that recognition models trained on IDSync-generated data achieve higher verification accuracies on real-world benchmarks than models trained on synthetic data produced by competing generative models. The IDSync source code is publicly available at https://github.com/JSabadin/IDSync.},

keywords = {biometrics, data synthesis, face generation, face synthesis, generative AI, generative models},

pubstate = {published},

tppubtype = {inproceedings}

}
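
The IDSync abstract mentions evaluating generated images with inter- and intra-class cosine similarity of identity features. A minimal sketch of such a measure follows; it is an illustration under assumptions, not the authors' evaluation code. The function name `class_cosine_similarity` is hypothetical, the features are toy vectors assumed to be non-zero, and every identity is assumed to have at least two samples.

```python
import numpy as np


def class_cosine_similarity(features, labels):
    """Return (mean intra-class, mean inter-class) cosine similarity.

    Intra-class pairs share a label (self-pairs excluded);
    inter-class pairs have different labels.
    """
    f = np.asarray(features, dtype=np.float64)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)  # unit-length rows
    y = np.asarray(labels)
    sim = f @ f.T                                     # pairwise cosine similarity
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)            # mask out self-similarity
    intra = sim[same & off_diag].mean()
    inter = sim[~same].mean()
    return intra, inter


# Toy identity features (assumed values): two identities with orthogonal directions.
feats = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
labels = np.array([0, 0, 1, 1])
intra, inter = class_cosine_similarity(feats, labels)
print(intra, inter)
```

Higher intra-class similarity together with lower inter-class similarity would suggest more consistent and better separated identities in the feature space.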


Babnik, Žiga; Boutros, Fadi; Damer, Naser; Jain, Deepak Kumar; Peer, Peter; Štruc, Vitomir

FunFace: Feature Utility and Norm Estimation for Face Recognition Proceedings Article

In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, pp. 1–10, 2026.

Tags: face image processing, face image quality assessment, face image quality estimation, face images, face recognition

@inproceedings{FG2026_FunFace,

title = {FunFace: Feature Utility and Norm Estimation for Face Recognition},

author = {Žiga Babnik and Fadi Boutros and Naser Damer and Deepak Kumar Jain and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2026/04/qFR_paper.pdf},

year  = {2026},

date = {2026-05-24},

urldate = {2026-05-24},

booktitle = {Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition},

pages = {1--10},

abstract = {Face Recognition (FR) is used in a variety of application domains, from entertainment and banking to security and surveillance. Such applications rely on the FR model to be robust and perform well in a variety of settings. To achieve this, state-of-the-art FR models typically use expressive adaptive margin loss functions, which tie the feature norm to concepts related to sample quality, such as recognizability and perceptual image quality. Recently, through the development of Face Image Quality Assessment (FIQA) techniques, biometric utility has become the preferred measure of face-image quality and has been shown to be a better predictor of the usefulness of samples for face recognition compared to more human-centric aspects, such as resolution, blur, and lighting, tied to general image quality. While image quality expressed through feature norms exhibits a certain level of correlation with biometric utility, it does not fully encapsulate all aspects of utility. To address this point, we propose a new adaptive margin loss, FunFace (Face Recognition Through Utility and Norm Estimation), which incorporates biometric utility, estimated by the Certainty Ratio, into the adaptive margin, taking inspiration from AdaFace. We show that FunFace (when used to train a face recognition model) achieves results competitive with other state-of-the-art FR models on benchmarks containing high-quality samples, while surpassing them on low-quality benchmarks.},

keywords = {face image processing, face image quality assessment, face image quality estimation, face images, face recognition},

pubstate = {published},

tppubtype = {inproceedings}

}


Brodarič, Marko; Ivanovska, Marija; Jain, Deepak Kumar; Peer, Peter; Štruc, Vitomir

HCSI-Net: Hierarchical Cross-Stream Interaction for Generalizable Deepfake Detection Proceedings Article

In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, 2026.

Tags: CNN, deepfake, deepfake detection, deepfakes, transformer

Lajić, Romanela; Peer, Peter; Štruc, Vitomir; Han, Dong Seog; Meden, Blaž; Emeršič, Žiga

FACES: Facial Analysis with Compressed Efficient Systems Journal Article

In: ICT Express, 2026.

Tags: biometrics, distillation, face recognition, knowledge distillation

Tomašević, Darian; Peer, Peter; Štruc, Vitomir; Miočić, Matej

Diff-FIT: Generating Facial Composites with Diffusion Models Journal Article

In: IEEE Access, 2026, ISSN: 2169-3536.

Tags: diffusion, face, face composite, face editing, face recognition

@article{Acces2026,

title = {Diff-FIT: Generating Facial Composites with Diffusion Models},

author = {Darian Tomašević and Peter Peer and Vitomir Štruc and Matej Miočić},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11426945},

doi = {10.1109/ACCESS.2026.3672229},

issn = {2169-3536},

year  = {2026},

date = {2026-03-09},

journal = {IEEE Access},

abstract = {Facial composites, or police sketches, are essential tools in law enforcement for reconstructing the appearance of suspects from eyewitness descriptions. Traditionally, forensic artists manually produce these composites through intensive and time-consuming collaboration with witnesses. To improve the efficiency of this process, automated approaches have leveraged advancements in deep learning and generative modeling. However, the application of recent diffusion models has remained unexplored, despite their unparalleled text-guided synthesis capabilities. To this end, we present Diff-FIT (Diffusion Facial Identification Technique), a novel multi-pipeline framework for generating photorealistic facial composites in only a few steps with pretrained diffusion models. Diff-FIT enables rapid generation of initial composites from textual descriptions, followed by intuitive sequential edits including global image-to-image translation, local text-based inpainting, and drag-based geometric transformations. Through experiments across multiple latent diffusion models and sampling parameters we determine the configuration that best balances image quality, diversity, image-text alignment, and identity consistency. In a user study involving biometric experts and non-experts, Diff-FIT achieves comparable real-world utility to state-of-the-art systems in both subjective evaluations and identification rates with generated facial composites, while enabling greater variation and flexibility through description-based generation and diverse editing pipelines for adding distinct facial features. The source code for the Diff-FIT framework is publicly available at: https://github.com/matemato/Diff-FIT.},

keywords = {diffusion, face, face composite, face editing, face recognition},

pubstate = {published},

tppubtype = {article}

}


Zhu, Haini; Jain, Deepak Kumar; Zhao, Xudong; Li, Muyu; Štruc, Vitomir; Tyagi, Sumarga Kumar Sah

StructFormer: Structure-Consistent Face De-Identification under Strong Privacy Constraints Proceedings Article

In: WACV-W 2026, pp. 1–11, 2026.

Tags: deep learning, deidentification, face analysis

Gan, Chenquan; Zhou, Daitao; Zhu, Qingyi; Wang, Xibin; Jain, Deepak Kumar; Štruc, Vitomir

Improving Emotion Recognition from Ambiguous Speech via Spatio-Temporal Spectrum Analysis and Real-Time Soft-Label Correction Journal Article

In: IEEE Transactions on Affective Computing, pp. 1–16, 2026.

Tags: deep learning, emotion recognition, speech, speech processing

@article{TAC_2026,

title = {Improving Emotion Recognition from Ambiguous Speech via Spatio-Temporal Spectrum Analysis and Real-Time Soft-Label Correction},

author = {Chenquan Gan and Daitao Zhou and Qingyi Zhu and Xibin Wang and Deepak Kumar Jain and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/12/Manuscript_clean.pdf},

year  = {2026},

date = {2026-03-01},

urldate = {2026-03-01},

journal = {IEEE Transactions on Affective Computing},

pages = {1--16},

abstract = {Speech represents a fundamental medium for conveying human emotions and, as a result, speech-based emotion recognition (SER) systems have become pivotal in advancing human-computer interaction (HCI) across a range of applications. While significant progress has been made in speech emotion recognition over recent years, existing solutions still face several key challenges, in that they: (i) rely excessively on subjectively annotated (discrete) labels during training, (ii) often overlook the label ambiguity of speech samples that express more than one class of emotions, and (iii) underutilize unlabeled or ambiguous speech, for which typically a label distribution (or so-called soft labels) is available. To address these issues, we propose in this paper a novel SER model that explicitly handles ambiguous speech samples and overcomes the shortcomings outlined above. Central to our approach is a novel real-time soft-label correction strategy designed to refine the annotations assigned to ambiguous speech. The proposed model leverages both (explicitly) labeled and ambiguous samples and applies the dynamic soft-label correction strategy alongside an enhanced inter-class difference loss function to iteratively optimize the label distributions during training. We theoretically demonstrate that our method is capable of approximating the true emotional distribution of speech even in the presence of label noise, suggesting that utilizing ambiguous speech samples without explicit emotion labels still contributes toward more effective emotion recognition. Furthermore, we integrate the representational power of convolutional neural networks (CNNs) with the contextual modeling capabilities of Wav2Vec 2.0 to enable a comprehensive extraction of spatio-temporal speech features. Experimental results on the IEMOCAP multi-label dataset confirm the effectiveness of our approach, achieving state-of-the-art performance with significant improvements in weighted accuracy (WA) and unweighted accuracy (UA) over competing methods.},

keywords = {deep learning, emotion recognition, speech, speech processing},

pubstate = {published},

tppubtype = {article}

}


Ivanovska, Marija; Todorov, Leon; Peer, Peter; Štruc, Vitomir

SelfMAD++: Self-Supervised Foundation Model with Local Feature Enhancement for Generalized Morphing Attack Detection Journal Article

In: Information Fusion, vol. 127, Part C, no. 103921, pp. 1–16, 2026.

Tags: anomaly detection, biometrics, CLIP, computer vision, face morphing detection, face recognition, foundation models

@article{InfoFUS_Marija,

title = {SelfMAD++: Self-Supervised Foundation Model with Local Feature Enhancement for Generalized Morphing Attack Detection},

author = {Marija Ivanovska and Leon Todorov and Peter Peer and Vitomir Štruc},

url = {https://www.sciencedirect.com/science/article/pii/S1566253525009832},

doi = {10.1016/j.inffus.2025.103921},

year  = {2026},

date = {2026-03-01},

journal = {Information Fusion},

volume = {127, Part C},

number = {103921},

pages = {1--16},

abstract = {Face morphing attacks pose a growing threat to biometric systems, exacerbated by the rapid emergence of powerful generative techniques that enable realistic and seamless facial image manipulations. To address this challenge, we introduce SelfMAD++, a robust and generalized single-image morphing attack detection (S-MAD) framework. Unlike our previous work SelfMAD, which introduced a data augmentation technique to train off-the-shelf classifiers for attack detection, SelfMAD++ advances this paradigm by integrating the artifact-driven augmentation with foundation models and fine-grained spatial reasoning. At its core, SelfMAD++ builds on CLIP, a vision-language foundation model, adapted via Low-Rank Adaptation (LoRA) to align image representations with task-specific text prompts. To enhance sensitivity to spatially subtle and fine-grained artifacts, we integrate a parallel multi-scale convolutional branch specialized in dense, multi-scale feature extraction. This branch is guided by an auxiliary segmentation module, which acts as a regularizer by disentangling bona fide facial regions from potentially manipulated ones. The dual-branch features are adaptively fused through a gated attention mechanism, capturing both semantic context and fine-grained spatial cues indicative of morphing. SelfMAD++ is trained end-to-end using a multi-objective loss that balances semantic alignment, segmentation consistency, and classification accuracy. Extensive experiments across nine standard benchmark datasets demonstrate that SelfMAD++ achieves state-of-the-art performance, with an average Equal Error Rate (EER) of 3.91%, outperforming both supervised and unsupervised MAD methods by large margins. Notably, SelfMAD++ excels on modern, high-quality morphs generated by GAN- and diffusion-based morphing methods, demonstrating its robustness and strong generalization capability. SelfMAD++ code and supplementary resources are publicly available at: https://github.com/LeonTodorov/SelfMADpp.},

keywords = {anomaly detection, biometrics, CLIP, computer vision, face morphing detection, face recognition, foundation models},

pubstate = {published},

tppubtype = {article}

}


Marić, Nikola; Ivanovska, Marija; Štruc, Vitomir

Exploring Multimodal Large Language Models for Morphing Attack Detection Proceedings Article

In: Proceedings of the 29th Computer Vision Winter Workshop, pp. 1–10, 2026.

Tags: biometrics, face analysis, face morphing, face morphing attack, face morphing detection, media forensics

Mishra, Gargi; Bajpai, Supriya; Saini, Dharmender; Jain, Rachna; Jain, Deepak Kumar; Štruc, Vitomir

EmoVisioNet: A hybrid network unifying lightweight CNN and attention-based vision model for facial emotion detection Journal Article

In: Neurocomputing, vol. 665, no. 132224, pp. 1–12, 2026.

Tags: CNN, deep learning, facial expression recognition, lightweight models

@article{EmoVison2025,

title = {EmoVisioNet: A hybrid network unifying lightweight CNN and attention-based vision model for facial emotion detection},

author = {Gargi Mishra and Supriya Bajpai and Dharmender Saini and Rachna Jain and Deepak Kumar Jain and Vitomir Štruc},

doi = {10.1016/j.neucom.2025.132224},

year  = {2026},

date = {2026-02-07},

urldate = {2025-11-28},

journal = {Neurocomputing},

volume = {665},

number = {132224},

pages = {1--12},

abstract = {Facial emotion detection has witnessed a surge in demand across numerous applications, including human-computer interaction, healthcare, and security. Accurate expression recognition is crucial for improving human-computer interactions and understanding human behavior. Existing facial emotion detection models face challenges in achieving both high accuracy and real-time processing due to complex architectures. Our goal is to create an efficient yet accurate solution that can work on resource-constrained devices. To address the challenge of accurately recognizing emotions from facial expressions, we propose a novel hybrid approach that combines the strengths of pretrained lightweight Convolutional Neural Networks (CNNs) and attention-based vision models. The pretrained lightweight CNN serves as a feature extractor, efficiently capturing facial features, while the attention model refines the feature representation to focus on crucial regions of the face associated with different expressions. This enables our model to achieve state-of-the-art (SOTA) accuracy with reduced computational requirements. The proposed model, EmoVisioNet, achieves superior performance across multiple datasets, attaining 99.97% accuracy on CK+, 96.23% on RAF-DB, 93.88% on FER2013, and 96.91% on FERPlus. The obtained results surpass the current state of the art in this field, demonstrating EmoVisioNet’s superior performance in facial expression recognition.},

keywords = {CNN, deep learning, facial expression recognition, lightweight models},

pubstate = {published},

tppubtype = {article}

}


Gan, Chenquan; Chen, Hongming; Qian, Yi; Tian, Liang; Zhu, Qingyi; Jain, Deepak Kumar; Štruc, Vitomir

Analyzing the influence of users, devices, and search engines on viral spread in the social internet of things Journal Article

In: Internet of Things, vol. 35, no. 101842, pp. 1–18, 2026.

Tags: internet of things, social internet of things, viral spread

@article{IOT2026,

title = {Analyzing the influence of users, devices, and search engines on viral spread in the social internet of things},

author = {Chenquan Gan and Hongming Chen and Yi Qian and Liang Tian and Qingyi Zhu and Deepak Kumar Jain and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/12/IOT_Paper.pdf},

doi = {10.1016/j.iot.2025.101842},

year  = {2026},

date = {2026-01-15},

journal = {Internet of Things},

volume = {35},

number = {101842},

pages = {1--18},

abstract = {The Social Internet of Things (SIoT) seamlessly integrates the Internet of Things (IoT) with social networks, intensifying the interconnections among objects, humans, and their interactions. While SIoT facilitates rapid information access and sharing through search engines, it also increases the risk of computer virus propagation. It is, therefore, critical to understand how viruses propagate in SIoT networks and which factors contribute the most to viral spread. While such understanding is of paramount importance, comprehensive studies on this topic are still limited in the literature. To address this gap, we study in this paper the long-term behavior of viral spread in SIoT, examining the roles of users, devices, and search engines. Specifically, we propose a novel dynamical virus propagation model that accounts for key factors, such as user awareness, device security levels, search engines, and external storage media. In comparison to competing solutions, the proposed model offers a unique perspective on viral spread in SIoT by focusing on multiple influential factors and their interactions, while also considering the inherent characteristics of the SIoT framework. A comprehensive theoretical analysis of the model is conducted to identify patterns and the key aspects of virus propagation in SIoT. To further validate the findings, a virus propagation algorithm is also designed, and multiple simulations are conducted on two real network datasets (Facebook and P2P), demonstrating the validity of the theoretical findings.},

keywords = {internet of things, social internet of things, viral spread},

pubstate = {published},

tppubtype = {article}

}

Rot, Peter; Jutreša, Robert; Peer, Peter; Štruc, Vitomir; Scheirer, Walter; Grm, Klemen

FaceMINT: A library for gaining insights into biometric face recognition via mechanistic interpretability Journal Article

In: Image and Vision Computing, no. 105804, pp. 1-23, 2025.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, face recognition, interpretability, MIXBAI, xai

@article{Rot_IVC2025,

title = {FaceMINT: A library for gaining insights into biometric face recognition via mechanistic interpretability},

author = {Peter Rot and Robert Jutreša and Peter Peer and Vitomir Štruc and Walter Scheirer and Klemen Grm},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/11/IVC_2025__FaceMINT.pdf},

doi = {https://doi.org/10.1016/j.imavis.2025.105804},

year  = {2025},

date = {2025-11-10},

journal = {Image and Vision Computing},

number = {105804},

pages = {1-23},

abstract = {Deep-learning models, including those used in biometric recognition, have achieved remarkable performance on benchmark datasets as well as real-world recognition tasks. However, a major drawback of these models is their lack of transparency in decision-making. Mechanistic interpretability has emerged as a promising research field intended to help us gain insights into such models, but its application to biometric data remains limited. In this work, we bridge this gap by introducing the FaceMINT library, a publicly available Python library (built on top of PyTorch) that enables biometric researchers to inspect their models through mechanistic interpretability. It provides a plug-and-play solution that allows researchers to seamlessly switch between the analyzed biometric models, evaluate state-of-the-art sparse autoencoders, select from various image parametrizations, and fine-tune hyperparameters. Using the large-scale Glint360K dataset, we demonstrate the usability of FaceMINT by applying its functionality to two state-of-the-art (deep-learning) face recognition models: AdaFace, based on Convolutional Neural Networks (CNN), and SwinFace, based on transformers. The proposed library implements various sparse auto-encoders (SAEs), including vanilla SAE, Gated SAE, JumpReLU SAE, and TopK SAE, which have achieved state-of-the-art results in the mechanistic interpretability of large language models. Our study highlights the promise of mechanistic interpretability in the biometric field, providing new avenues for researchers to explore model transparency and refine biometric recognition systems. The library is publicly available at www.gitlab.com/peterrot/facemint.},

keywords = {biometrics, CNN, deep learning, face recognition, interpretability, MIXBAI, xai},

pubstate = {published},

tppubtype = {article}

}

Ivanovska, Marija; Kreft, Jakob; Štruc, Vitomir; Perš, Janez

Privacy-by-design AIoT Vision for Intelligent Urban Environments Journal Article

In: Journal of Systems Architecture, 2025.

Abstract | Links | BibTeX | Tags: ai, AIoT, computer vision, intelligent environment, privacy

Gan, Chenquan; Zhou, Daitao; Wang, Kexin; Zhu, Qingyi; Jain, Deepak Kumar; Štruc, Vitomir

Optimizing ambiguous speech emotion recognition through spatial–temporal parallel network with label correction strategy Journal Article

In: Computer Vision and Image Understanding, vol. 260, no. 104483, pp. 1–14, 2025.

Abstract | Links | BibTeX | Tags: deep learning, emotion recognition, speech, speech processing, speech technologies

@article{CVIU_2025b,

title = {Optimizing ambiguous speech emotion recognition through spatial–temporal parallel network with label correction strategy},

author = {Chenquan Gan and Daitao Zhou and Kexin Wang and Qingyi Zhu and Deepak Kumar Jain and Vitomir Štruc},

url = {https://www.sciencedirect.com/science/article/pii/S1077314225002061?dgcid=coauthor

https://lmi.fe.uni-lj.si/wp-content/uploads/2025/09/CVIU.pdf},

doi = {https://doi.org/10.1016/j.cviu.2025.104483},

year  = {2025},

date = {2025-10-01},

urldate = {2025-10-01},

journal = {Computer Vision and Image Understanding},

volume = {260},

number = {104483},

pages = {1--14},

abstract = {Speech emotion recognition is of great significance for improving the human–computer interaction experience. However, traditional methods based on hard labels have difficulty dealing with the ambiguity of emotional expression. Existing studies alleviate this problem by redefining labels, but still rely on the subjective emotional expression of annotators and fail to consider the truly ambiguous speech samples without dominant labels fully. To solve the problems of insufficient expression of emotional labels and ignoring ambiguous undominantly labeled speech samples, we propose a label correction strategy that uses a model with exact sample knowledge to modify inappropriate labels for ambiguous speech samples, integrating model training with emotion cognition, and considering the ambiguity without dominant label samples. It is implemented on a spatial–temporal parallel network, which adopts a temporal pyramid pooling (TPP) to process the variable-length features of speech to improve the recognition efficiency of speech emotion. Through experiments, it has been shown that ambiguous speech after label correction has a more promoting effect on the recognition performance of speech emotions.},

keywords = {deep learning, emotion recognition, speech, speech processing, speech technologies},

pubstate = {published},

tppubtype = {article}

}

Babnik, Žiga; Štruc, Vitomir

Delno nadzorovano ocenjevanje kakovosti obraznih slik Proceedings Article

In: Proceedings of ERK 2025, 2025.

Abstract | Links | BibTeX | Tags: face analysis, face image quality assessment, face images, face recognition

@inproceedings{Babnik_ERK25,

title = {Delno nadzorovano ocenjevanje kakovosti obraznih slik},

author = {Žiga Babnik and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/11/ERK25.pdf},

year  = {2025},

date = {2025-09-25},

booktitle = {Proceedings of ERK 2025},

abstract = {Important security and surveillance applications often depend on reliable predictions from the underlying face recognition (FR) models. Due to the nature of such applications FR models have to perform well in various unconstrained conditions. While state-of-the-art FR models achieve excellent results on large and varied closed set benchmarks, their performance depends heavily on the quality of the input face samples. Low-quality samples can cause critical false-match errors, lowering the trustworthiness of FR models, and furthermore lead to monetary or privacy issues. Face Image Quality Assessment (FIQA) techniques offer the FR model an estimate of the sample’s quality, allowing the system to reject samples of poor quality. Supervised state-of-the-art FIQA techniques rely on extensive training to accurately assess the sample quality. Alternatively, unsupervised techniques extract the quality directly from the input sample, achieving higher runtime complexity and worse performance. In this paper, we present a technique for quality estimation, combining desired characteristics of both supervised and unsupervised methods. Our technique is able to quickly estimate the quality using a single forward pass of the sample through the model needed also for recognition, without any prior supervised training. Comprehensive experiments on a varied set of benchmark datasets and face recognition models show that our method outperforms all existing unsupervised techniques and performs similarly to current state-of-the-art supervised techniques, while achieving excellent runtime.},

keywords = {face analysis, face image quality assessment, face images, face recognition},

pubstate = {published},

tppubtype = {inproceedings}

}

Ožbot, Miha; Škrjanc, Igor; Štruc, Vitomir

A Neuro-Fuzzy System for Interpretable Long-Term Stock Market Forecasting Proceedings Article

In: Proceedings of ERK 2025, pp. 213-216, 2025.

Abstract | Links | BibTeX | Tags: deep learning, forecasting, Fuzzformer, LSTM, stock market forecasting

Vitek, Matej; Tomašević, Darian; Das, Abhijit; Nathan, Sabari; Özbulak, Gökhan; Özbulak, Gözde Ayşe Tataroğlu; Calbimonte, Jean-Paul; Anjos, André; Bhatt, Hariohm Hemant; Premani, Dhruv Dhirendra; Chaudhari, Jay; Wang, Caiyong; Jiang, Jian; Zhang, Chi; Zhang, Qi; Ganapathi, Iyyakutti Iyappan; Ali, Syed Sadaf; Velayudan, Divya; Assefa, Maregu; Werghi, Naoufel; Daniels, Zachary A.; John, Leeon; Vyas, Ritesh; Khiarak, Jalil Nourmohammadi; Saeed, Taher Akbari; Nasehi, Mahsa; Kianfar, Ali; Pashazadeh Panahi, Mobina; Sharma, Geetanjali; Panth, Pushp Raj; Ramachandra, Raghavendra; Nigam, Aditya; Pal, Umapada; Peer, Peter; Štruc, Vitomir

Privacy-enhancing Sclera Segmentation Benchmarking Competition: SSBC 2025 Proceedings Article

In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2025), pp. 1–13, IEEE, 2025.

Links | BibTeX | Tags: biometrics, deep learning, sclera segmentation, segmentation, SSBC

Babnik, Žiga; Jain, Deepak Kumar; Peer, Peter; Štruc, Vitomir

FROQ: Observing Face Recognition Models for Efficient Quality Assessment Proceedings Article

In: Proceedings of the IEEE International Joint Conference On Biometrics (IJCB), pp. 1–10, IEEE, 2025.

Abstract | Links | BibTeX | Tags: biometrics, face image quality assessment, face recognition, FIQA

@inproceedings{BabnikIJCB25,

title = {FROQ: Observing Face Recognition Models for Efficient Quality Assessment},

author = {Žiga Babnik and Deepak Kumar Jain and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/08/IJCB_25.pdf

https://arxiv.org/pdf/2509.17689?},

year  = {2025},

date = {2025-09-08},

urldate = {2025-09-08},

booktitle = {Proceedings of the IEEE International Joint Conference On Biometrics (IJCB)},

pages = {1--10},

organization = {IEEE},

abstract = {Face Recognition (FR) plays a crucial role in many critical (high-stakes) applications, where errors in the recognition process can lead to serious consequences. Face Image Quality Assessment (FIQA) techniques enhance FR systems by providing quality estimates of face samples, enabling the systems to discard samples that are unsuitable for reliable recognition or lead to low-confidence recognition decisions. Most state-of-the-art FIQA techniques rely on extensive supervised training to achieve accurate quality estimation. In contrast, unsupervised techniques eliminate the need for additional training but tend to be slower and typically exhibit lower performance. In this paper, we introduce FROQ (Face Recognition Observer of Quality), a semi-supervised, training-free approach that leverages specific intermediate representations within a given FR model to estimate face-image quality, and combines the efficiency of supervised FIQA models with the training-free approach of unsupervised methods. A simple calibration step based on pseudo-quality labels allows FROQ to uncover specific representations, useful for quality assessment, in any modern FR model. To generate these pseudo-labels, we propose a novel unsupervised FIQA technique based on sample perturbations. Comprehensive experiments with four state-of-the-art FR models and eight benchmark datasets show that FROQ leads to highly competitive results compared to the state-of-the-art, achieving both strong performance and efficient runtime, without requiring explicit training. The code for FROQ is available from: https://github.com/LSIbabnikz/FROQ},

keywords = {biometrics, face image quality assessment, face recognition, FIQA},

pubstate = {published},

tppubtype = {inproceedings}

}

Tomašević, Darian; Špacapan, Blaž; Perušić, Ani; Pinčić, Domagoj; Meden, Blaž; Freire-Obregón, David; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter; Sušanj, Diego

SynPalms: Palm Identification with Synthetic Data Proceedings Article

In: 16th International Conference on Ubiquitous and Future Networks (ICUFN 2025), pp. 359-361, 2025.

Abstract | Links | BibTeX | Tags: biometrics, data synthesis, generative AI, palmprint recognition

Tomašević, Darian; Boutros, Fadi; Lin, Chenhao; Damer, Naser; Štruc, Vitomir; Peer, Peter

ID-Booth: Identity-consistent Face Generation with Diffusion Models Proceedings Article

In: IEEE International Conference on Automatic Face and Gesture Recognition 2025, pp. 1-10, 2025.

Abstract | Links | BibTeX | Tags: data synthesis, diffusion, face, face images, face recognition, generative AI, generative models, synthetic data

@inproceedings{DarianFG2025,

title = {ID-Booth: Identity-consistent Face Generation with Diffusion Models},

author = {Darian Tomašević and Fadi Boutros and Chenhao Lin and Naser Damer and Vitomir Štruc and Peter Peer},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/04/ID_Booth__2025_.pdf},

year  = {2025},

date = {2025-05-27},

booktitle = {IEEE International Conference on Automatic Face and Gesture Recognition 2025},

pages = {1-10},

abstract = {Recent advances in generative modeling have enabled the generation of high-quality synthetic data that is applicable in a variety of domains, including face recognition. Here, state-of-the-art generative models typically rely on conditioning and fine-tuning of powerful pretrained diffusion models to facilitate the synthesis of realistic images of a desired identity. Yet, these models often do not consider the identity of subjects during training, leading to poor consistency between generated and intended identities. In contrast, methods that employ identity-based training objectives tend to overfit on various aspects of the identity, and in turn, lower the diversity of images that can be generated. To address these issues, we present in this paper a novel generative diffusion-based framework, called ID-Booth. ID-Booth consists of a denoising network responsible for data generation, a variational auto-encoder for mapping images to and from a lower-dimensional latent space and a text encoder that allows for prompt-based control over the generation procedure. The framework utilizes a novel triplet identity training objective and enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models. Experiments with a state-of-the-art latent diffusion model and diverse prompts reveal that our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity. In turn, the produced data allows for effective augmentation of small-scale datasets and training of better-performing recognition models in a privacy-preserving manner. The source code for the ID-Booth framework is publicly available at https://github.com/dariant/ID-Booth.},

keywords = {data synthesis, diffusion, face, face images, face recognition, generative AI, generative models, synthetic data},

pubstate = {published},

tppubtype = {inproceedings}

}

Ivanovska, Marija; Todorov, Leon; Damer, Naser; Jain, Deepak Kumar; Peer, Peter; Štruc, Vitomir

SelfMAD: Enhancing Generalization and Robustness in Morphing Attack Detection via Self-Supervised Learning Proceedings Article

In: IEEE International Conference on Automatic Face and Gesture Recognition 2025, pp. 1-10, 2025.

Abstract | Links | BibTeX | Tags: biometrics, face, face morphing, face morphing attack, face morphing detection, self-supervised learning, selfMAD

@inproceedings{MarijaFG2025,

title = {SelfMAD: Enhancing Generalization and Robustness in Morphing Attack Detection via Self-Supervised Learning},

author = {Marija Ivanovska and Leon Todorov and Naser Damer and Deepak Kumar Jain and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/04/FG2025__SelfMAD.pdf

http://arxiv.org/abs/2504.05504},

year  = {2025},

date = {2025-05-27},

booktitle = {IEEE International Conference on Automatic Face and Gesture Recognition 2025},

pages = {1-10},

abstract = {With the continuous advancement of generative models, face morphing attacks have become a significant challenge for existing face verification systems due to their potential use in identity fraud and other malicious activities. Contemporary Morphing Attack Detection (MAD) approaches frequently rely on supervised, discriminative models trained on examples of bona fide and morphed images. These models typically perform well with morphs generated with techniques seen during training, but often lead to suboptimal performance when subjected to novel unseen morphing techniques. While unsupervised models have been shown to perform better in terms of generalizability, they typically result in higher error rates, as they struggle to effectively capture features of subtle artifacts. To address these shortcomings, we present SelfMAD, a novel self-supervised approach that simulates general morphing attack artifacts, allowing classifiers to learn generic and robust decision boundaries without overfitting to the specific artifacts induced by particular face morphing methods. Through extensive experiments on widely used datasets, we demonstrate that SelfMAD significantly outperforms current state-of-the-art MADs, reducing the detection error by more than 64% in terms of EER when compared to the strongest unsupervised competitor, and by more than 66%, when compared to the best performing discriminative MAD model, tested in cross-morph settings. The source code for SelfMAD is available at https://github.com/LeonTodorov/SelfMAD.},

keywords = {biometrics, face, face morphing, face morphing attack, face morphing detection, self-supervised learning, selfMAD},

pubstate = {published},

tppubtype = {inproceedings}

}

Oblak, Tim; Videnović, Jovana; Kupinić, Haris; Štruc, Vitomir; Peer, Peter; Emeršič, Žiga

Fingerprint image scale estimation for forensic identification systems Journal Article

In: International Journal of Computers Communications & Control, vol. 20, iss. 2, pp. 1–14, 2025.

Abstract | Links | BibTeX | Tags: biometrics, finger marks, fingerprint recognition, fingerprints, latent fingerprints

DeAndres-Tame, Ivan; Tolosana, Ruben; Melzi, Pietro; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; Gomez, Luis F.; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhong, Zhizhou; Huang, Yuge; Mi, Yuxi; Ding, Shouhong; Zhou, Shuigeng; He, Shuai; Fu, Lingzhi; Cong, Heng; Zhang, Rongyu; Xiao, Zhihong; Smirnov, Evgeny; Pimenov, Anton; Grigorev, Aleksei; Timoshenko, Denis; Asfaw, Kaleb Mesfin; Low, Cheng Yaw; Liu, Hao; Wang, Chuyi; Zuo, Qing; He, Zhixiang; Shahreza, Hatef Otroshi; George, Anjith; Unnervik, Alexander; Rahimi, Parsa; Marcel, Sebastien; Neto, Pedro C.; Huber, Marco; Kolf, Jan Niklas; Damer, Naser; Boutros, Fadi; Cardoso, Jaime S.; Sequeira, Ana F.; Atzori, Andrea; Fenu, Gianni; Marras, Mirko; Štruc, Vitomir; Yu, Jiang; Li, Zhangjie; Li, Jichun; Zhao, Weisong; Lei, Zhen; Zhu, Xiangyu; Zhang, Xiao-Yu; Biesseck, Bernardo; Vidal, Pedro; Coelho, Luiz; Granada, Roger; Menotti, David

Second FRCSyn-onGoing: Winning solutions and post-challenge analysis to improve face recognition with synthetic data Journal Article

In: Information Fusion, no. 103099, 2025.

Abstract | Links | BibTeX | Tags: biometrics, data synthesis, face, face recognition, face synthesis, synthetic data

@article{Synth_InfoFUS2025,

title = {Second FRCSyn-onGoing: Winning solutions and post-challenge analysis to improve face recognition with synthetic data},

author = {Ivan DeAndres-Tame and Ruben Tolosana and Pietro Melzi and Ruben Vera-Rodriguez and Minchul Kim and Christian Rathgeb and Xiaoming Liu and Luis F. Gomez and Aythami Morales and Julian Fierrez and Javier Ortega-Garcia and Zhizhou Zhong and Yuge Huang and Yuxi Mi and Shouhong Ding and Shuigeng Zhou and Shuai He and Lingzhi Fu and Heng Cong and Rongyu Zhang and Zhihong Xiao and Evgeny Smirnov and Anton Pimenov and Aleksei Grigorev and Denis Timoshenko and Kaleb Mesfin Asfaw and Cheng Yaw Low and Hao Liu and Chuyi Wang and Qing Zuo and Zhixiang He and Hatef Otroshi Shahreza and Anjith George and Alexander Unnervik and Parsa Rahimi and Sebastien Marcel and Pedro C. Neto and Marco Huber and Jan Niklas Kolf and Naser Damer and Fadi Boutros and Jaime S. Cardoso and Ana F. Sequeira and Andrea Atzori and Gianni Fenu and Mirko Marras and Vitomir Štruc and Jiang Yu and Zhangjie Li and Jichun Li and Weisong Zhao and Zhen Lei and Xiangyu Zhu and Xiao-Yu Zhang and Bernardo Biesseck and Pedro Vidal and Luiz Coelho and Roger Granada and David Menotti},

url = {https://www.sciencedirect.com/science/article/pii/S1566253525001721},

doi = {https://doi.org/10.1016/j.inffus.2025.103099},

year  = {2025},

date = {2025-03-14},

urldate = {2025-03-14},

journal = {Information Fusion},

number = {103099},

abstract = {Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to adapt to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and investigate the application of synthetic data to better train face recognition systems, we introduce the 2nd FRCSyn-on-Going challenge, based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark (i) the proposal of novel Generative AI methods and synthetic data, and (ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in the pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.},

keywords = {biometrics, data synthesis, face, face recognition, face synthesis, synthetic data},

pubstate = {published},

tppubtype = {article}

}

Batagelj, Borut; Kronovšek, Andrej; Štruc, Vitomir; Peer, Peter

Robust cross-dataset deepfake detection with multitask self-supervised learning Journal Article

In: ICT Express, pp. 1-5, 2025.

Abstract | Links | BibTeX | Tags: deepfake, deepfake DAD, deepfake detection, multi-task learning, segmentation

Caldeira, Eduarda; Ozgur, Guray; Chettaoui, Tahar; Ivanovska, Marija; Peer, Peter; Boutros, Fadi; Struc, Vitomir; Damer, Naser

MADation: Face Morphing Attack Detection with Foundation Models Proceedings Article

In: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-11, Tucson, USA, 2025.

Abstract | Links | BibTeX | Tags: face morphing, face morphing attack, face morphing detection, foundation models, morphing attack, morphing attack detection

@inproceedings{FadiWACV2025_Foundation,

title = {MADation: Face Morphing Attack Detection with Foundation Models},

author = {Eduarda Caldeira and Guray Ozgur and Tahar Chettaoui and Marija Ivanovska and Peter Peer and Fadi Boutros and Vitomir Struc and Naser Damer},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/01/MADation__Face_Morphing_Attack_Detection_with_Foundation_Models.pdf},

year  = {2025},

date = {2025-03-01},

booktitle = {Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025},

pages = {1-11},

address = {Tucson, USA},

abstract = {Despite the considerable performance improvements of face recognition algorithms in recent years, the same scientific advances responsible for this progress can also be used to create efficient ways to attack them, posing a threat to their secure deployment. Morphing attack detection (MAD) systems aim to detect a specific type of threat, morphing attacks, at an early stage, preventing them from being considered for verification in critical processes. Foundation models (FM) learn from extensive amounts of unlabelled data, achieving remarkable zero-shot generalization to unseen domains. Although this generalization capacity might be weak when dealing with domain-specific downstream tasks such as MAD, FMs can easily adapt to these settings while retaining the built-in knowledge acquired during pre-training. In this work, we recognize the potential of FMs to perform well in the MAD task when properly adapted to its specificities. To this end, we adapt FM CLIP architectures with LoRA weights while simultaneously training a classification header. The proposed framework, MADation, surpasses our alternative FM and transformer-based frameworks and constitutes the first adaptation of FMs to the MAD task. MADation presents competitive results with current MAD solutions in the literature and even surpasses them in several evaluation scenarios. To encourage reproducibility and facilitate further research in MAD, we publicly release the implementation of MADation at https://github.com/gurayozgur/MADation.},

keywords = {face morphing, face morphing attack, face morphing detection, foundation models, morphing attack, morphing attack detection},

pubstate = {published},

tppubtype = {inproceedings}

}

Soltandoost, Elahe; Plesh, Richard; Schuckers, Stephanie; Peer, Peter; Struc, Vitomir

Extracting Local Information from Global Representations for Interpretable Deepfake Detection Proceedings Article

In: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-11, Tucson, USA, 2025.

Abstract | Links | BibTeX | Tags: CNN, deepfake DAD, deepfakes, faceforensics++, media forensics, xai

@inproceedings{Elahe_WACV2025,

title = {Extracting Local Information from Global Representations for Interpretable Deepfake Detection},

author = {Elahe Soltandoost and Richard Plesh and Stephanie Schuckers and Peter Peer and Vitomir Struc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/01/ElahePaperF.pdf},

year  = {2025},

date = {2025-03-01},

booktitle = {Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025},

pages = {1-11},

address = {Tucson, USA},

abstract = {The detection of deepfakes has become increasingly challenging due to the sophistication of manipulation techniques that produce highly convincing fake videos. Traditional detection methods often lack transparency and provide limited insight into their decision-making processes. To address these challenges, we propose in this paper a Locally-Explainable Self-Blended (LESB) DeepFake detector that, in addition to the final fake-vs-real classification decision, also provides information on which local facial region (i.e., eyes, mouth or nose) contributed the most to the decision process. At the heart of the detector is a novel Local Feature Discovery (LFD) technique that can be applied to the embedding space of pretrained DeepFake detectors and allows identifying embedding space directions that encode variations in the appearance of local facial features. We demonstrate the merits of the proposed LFD technique and LESB detector in comprehensive experiments on four popular datasets, i.e., Celeb-DF, DeepFake Detection Challenge, Face Forensics in the Wild and FaceForensics++, and show that the proposed detector is not only competitive in comparison to strong baselines, but also exhibits enhanced transparency in the decision-making process by providing insights on the contribution of local face parts in the final detection decision.},

keywords = {CNN, deepfake DAD, deepfakes, faceforensics++, media forensics, xai},

pubstate = {published},

tppubtype = {inproceedings}

}

Manojlovska, Anastasija; Ramachandra, Raghavendra; Spathoulas, Georgios; Struc, Vitomir; Grm, Klemen

Interpreting Face Recognition Templates using Natural Language Descriptions Proceedings Article

In: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-10, Tucson, USA, 2025.

Abstract | Links | BibTeX | Tags: CLIP, explainability, face recognition, natural language, symbolic representations, xai

@inproceedings{Anastasija_WACV25,

title = {Interpreting Face Recognition Templates using Natural Language Descriptions},

author = {Anastasija Manojlovska and Raghavendra Ramachandra and Georgios Spathoulas and Vitomir Struc and Klemen Grm},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/01/WACV_2025_RWS_workshop_clanek.pdf},

year  = {2025},

date = {2025-03-01},

urldate = {2025-03-01},

booktitle = {Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025},

pages = {1-10},

address = {Tucson, USA},

abstract = {Explainable artificial intelligence (XAI) aims to ensure an AI system's decisions are transparent and understandable by humans, which is particularly important in potentially sensitive application scenarios in surveillance, security and law enforcement. In these and related areas, understanding the internal mechanisms governing the decision-making process of AI-based systems can increase trust and consequently user acceptance. While various methods have been developed to provide insights into the behavior of AI-based models, solutions capable of explaining different aspects of the models using Natural Language are still limited in the literature. In this paper, we therefore propose a novel approach for interpreting the information content encoded in face templates, produced by state-of-the-art (SOTA) face recognition models. Specifically, we utilize the Text Encoder from the Contrastive Language-Image Pretraining (CLIP) model and generate natural language descriptions of various face attributes present in the face templates. We implement two versions of our approach, with the off-the-shelf CLIP text-encoder and a fine-tuned version using the VGGFace2 and MAADFace datasets. Our experimental results indicate that the fine-tuned text encoder under the contrastive training paradigm increases the attribute-based explainability of face recognition templates, while both models provide valuable human-understandable insights into modern face recognition models.},

keywords = {CLIP, explainability, face recognition, natural language, symbolic representations, xai},

pubstate = {published},

tppubtype = {inproceedings}

}

Pernus, Martin; Fookes, Clinton; Struc, Vitomir; Dobrisek, Simon

FICE: Text-conditioned fashion-image editing with guided GAN inversion Journal Article

In: Pattern Recognition, vol. 158, no. 111022, pp. 1-18, 2025.

Abstract | Links | BibTeX | Tags: computer vision for fashion, GAN inversion, generative adversarial networks, generative AI, image editing, text conditioning

@article{PR_FICE_2024,

title = {FICE: Text-conditioned fashion-image editing with guided GAN inversion},

author = {Martin Pernus and Clinton Fookes and Vitomir Struc and Simon Dobrisek},

url = {https://www.sciencedirect.com/science/article/pii/S0031320324007738

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/09/FICE_main_paper.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/09/FICE_supplementary.pdf},

doi = {https://doi.org/10.1016/j.patcog.2024.111022},

year  = {2025},

date = {2025-02-01},

urldate = {2025-02-01},

journal = {Pattern Recognition},

volume = {158},

number = {111022},

pages = {1--18},

abstract = {Fashion-image editing is a challenging computer-vision task where the goal is to incorporate selected apparel into a given input image. Most existing techniques, known as Virtual Try-On methods, deal with this task by first selecting an example image of the desired apparel and then transferring the clothing onto the target person. Conversely, in this paper, we consider editing fashion images with text descriptions. Such an approach has several advantages over example-based virtual try-on techniques: (i) it does not require an image of the target fashion item, and (ii) it allows the expression of a wide variety of visual concepts through the use of natural language. Existing image-editing methods that work with language inputs are heavily constrained by their requirement for training sets with rich attribute annotations or they are only able to handle simple text descriptions. We address these constraints by proposing a novel text-conditioned editing model called FICE (Fashion Image CLIP Editing) that is capable of handling a wide variety of diverse text descriptions to guide the editing procedure. Specifically, with FICE, we extend the common GAN-inversion process by including semantic, pose-related, and image-level constraints when generating images. We leverage the capabilities of the CLIP model to enforce the text-provided semantics, due to its impressive image–text association capabilities. We furthermore propose a latent-code regularization technique that provides the means to better control the fidelity of the synthesized images. We validate the FICE through rigorous experiments on a combination of VITON images and Fashion-Gen text descriptions and in comparison with several state-of-the-art, text-conditioned, image-editing approaches. Experimental results demonstrate that the FICE generates very realistic fashion images and leads to better editing than existing, competing approaches. The source code is publicly available from: 

https://github.com/MartinPernus/FICE},

keywords = {computer vision for fashion, GAN inversion, generative adversarial networks, generative AI, image editing, text conditioning},

pubstate = {published},

tppubtype = {article}

}


Gan, Chenquan; Yang, Wei; Zhu, Qingyi; Li, Meng; Jain, Deepak Kumar; Struc, Vitomir; Huang, Da-Wen

Hybrid Rumor Debunking in Online Social Networks: A Differential Game Approach Journal Article

In: IEEE Transactions on Systems, Man and Cybernetics: Systems, 2025.

Abstract | Links | BibTeX | Tags: differential game theory, Nash equilibrium, rumor propagation, social networks

Grm, Klemen; Ozata, Berk Kemal; Kantarci, Alperen; Struc, Vitomir; Ekenel, Hazim Kemal

Degrade or super-resolve to recognize? Bridging the Domain Gap for Cross-Resolution Face Recognition Journal Article

In: IEEE Access, pp. 1-16, 2025, ISSN: 2169-3536.

Abstract | Links | BibTeX | Tags: CNN, face recognition, low quality, super-resolution

@article{GrmAccess2025,

title = {Degrade or super-resolve to recognize? Bridging the Domain Gap for Cross-Resolution Face Recognition},

author = {Klemen Grm and Berk Kemal Ozata and Alperen Kantarci and Vitomir Struc and Hazim Kemal Ekenel},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10833634

https://lmi.fe.uni-lj.si/wp-content/uploads/2025/01/Degrade_or_super-resolve_to_recognize_Bridging_the_Domain_Gap_for_Cross-Resolution_Face_Recognition_compressed-1.pdf},

doi = {10.1109/ACCESS.2025.3527236},

issn = {2169-3536},

year  = {2025},

date = {2025-01-08},

journal = {IEEE Access},

pages = {1--16},

abstract = {In this work, we address the problem of cross-resolution face recognition, where a low-resolution probe face is compared against high-resolution gallery faces. To address this challenging problem, we investigate two approaches for bridging the quality gap between low-quality probe faces and high-quality gallery faces. The first approach focuses on degrading the quality of high-resolution gallery images to bring them closer to the quality of the probe images. The second approach involves enhancing the resolution of the probe images using face hallucination. Our experiments on the SCFace and DroneSURF datasets reveal that the success of face hallucination is highly dependent on the quality of the original images, since poor image quality can severely limit the effectiveness of the hallucination technique. Therefore, the selection of the appropriate face recognition method should consider the quality of the images. Additionally, our experiments also suggest that combining gallery degradation and face hallucination in a hybrid recognition scheme provides the best overall results for cross-resolution face recognition with relatively high-quality probe images, while the degradation process on its own is the more suitable option for low-quality probe images. Our results show that the combination of standard computer vision approaches such as degradation, super-resolution, feature fusion, and score fusion can be used to substantially improve performance on the task of low resolution face recognition using off-the-shelf face recognition models without re-training on the target domain.},

keywords = {CNN, face recognition, low quality, super-resolution},

pubstate = {published},

tppubtype = {article}

}


Vitek, Matej; Štruc, Vitomir; Peer, Peter

GazeNet: A lightweight multitask sclera feature extractor Journal Article

In: Alexandria Engineering Journal, vol. 112, pp. 661-671, 2025.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, lightweight models, sclera

Gan, Chenquan; Xiao, Junhao; Zhu, Qingyi; Jain, Deepak Kumar; Struc, Vitomir

Transfer-Learning Enabled Micro-Expression Recognition Using Dense Connections and Mixed Attention Journal Article

In: Knowledge-Based Systems, vol. 305, iss. December 2024, no. 112640, 2024.

Abstract | Links | BibTeX | Tags:

@article{KBS_2024,

title = {Transfer-Learning Enabled Micro-Expression Recognition Using Dense Connections and Mixed Attention},

author = {Chenquan Gan and Junhao Xiao and Qingyi Zhu and Deepak Kumar Jain and Vitomir Struc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/Transfer_Learning_Enabled_compressed.pdf},

doi = {10.1016/j.knosys.2024.112640},

year  = {2024},

date = {2024-12-01},

urldate = {2025-02-01},

journal = {Knowledge-Based Systems},

volume = {305},

number = {112640},

issue = {December 2024},

abstract = {Micro-expression recognition (MER) is a challenging computer vision problem, where the limited amount of available training data and insufficient intensity of the facial expressions are among the main issues adversely affecting the performance of existing recognition models. To address these challenges, this paper explores a transfer-learning enabled MER model using a densely connected feature extraction module with mixed attention. Unlike previous works that utilize transfer learning to facilitate MER and extract local facial expression information, our model relies on pretraining with three diverse macro-expression datasets and, as a result, can: (i) overcome the problem of insufficient sample size and limited training data availability, (ii) leverage (related) domain-specific information from multiple datasets with diverse characteristics, and (iii) improve the model adaptability to complex scenes. Furthermore, to enhance the intensity of the micro expressions and improve the discriminability of the extracted features, the Euler video magnification (EVM) method is adopted in the preprocessing stage and then used jointly with a densely connected feature extraction module and a mixed attention mechanism to derive expressive feature representations for the classification procedure. The proposed feature extraction mechanism not only guarantees the integrity of the extracted features but also efficiently captures local texture cues by aggregating the most salient information from the generated feature maps, which is key for the MER task. The experimental results on multiple datasets demonstrate the robustness and effectiveness of our model compared to the state-of-the-art.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}


Dragar, Luka; Rot, Peter; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

W-TDL: Window-Based Temporal Deepfake Localization Proceedings Article

In: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), Proceedings of the 32nd ACM International Conference on Multimedia (MM’24), ACM, 2024.

Abstract | Links | BibTeX | Tags: CNN, deepfake DAD, deepfakes, deep learning, detection, localization

Boutros, Fadi; Štruc, Vitomir; Damer, Naser

AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition Proceedings Article

In: Proceedings of the European Conference on Computer Vision (ECCV 2024), pp. 1-20, 2024.

Abstract | Links | BibTeX | Tags: adaptive distillation, biometrics, CNN, deep learning, face, face recognition, knowledge distillation

Ocvirk, Krištof; Brodarič, Marko; Peer, Peter; Struc, Vitomir; Batagelj, Borut

Primerjava metod za zaznavanje napadov ponovnega zajema Proceedings Article

In: Proceedings of ERK, pp. 1-4, Portorož, Slovenia, 2024.

Abstract | Links | BibTeX | Tags: attacks, biometrics, CNN, deep learning, identity cards, pad

Manojlovska, Anastasija; Štruc, Vitomir; Grm, Klemen

Interpretacija mehanizmov obraznih biometričnih modelov s kontrastnim multimodalnim učenjem Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Abstract | Links | BibTeX | Tags: CNN, deep learning, face recognition, xai

Brodarič, Marko; Peer, Peter; Struc, Vitomir

Towards Improving Backbones for Deepfake Detection Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, 2024.

BibTeX | Tags: CNN, deep learning, deepfake detection, deepfakes, media forensics, transformer

Sikošek, Lovro; Brodarič, Marko; Peer, Peter; Struc, Vitomir; Batagelj, Borut

Detection of Presentation Attacks with 3D Masks Using Deep Learning Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, face PAD, face recognition, pad

Alessio, Leon; Brodarič, Marko; Peer, Peter; Struc, Vitomir; Batagelj, Borut

Prepoznava zamenjave obraza na slikah osebnih dokumentov Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Abstract | Links | BibTeX | Tags: biometrics, deep learning, deep models, face PAD, face recognition, pad

Plesh, Richard; Križaj, Janez; Bahmani, Keivan; Banavar, Mahesh; Struc, Vitomir; Schuckers, Stephanie

Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models Proceedings Article

In: International Joint Conference on Biometrics (IJCB 2024), pp. 1-10, 2024.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, face recognition, feature space understanding, xai

@inproceedings{Krizaj,

title = {Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models},

author = {Richard Plesh and Janez Križaj and Keivan Bahmani and Mahesh Banavar and Vitomir Struc and Stephanie Schuckers},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107-supp.pdf},

year  = {2024},

date = {2024-09-15},

booktitle = {International Joint Conference on Biometrics (IJCB 2024)},

pages = {1--10},

abstract = {Modern face recognition (FR) models, particularly their convolutional neural network based implementations, often raise concerns regarding privacy and ethics due to their “black-box” nature. To enhance the explainability of FR models and the interpretability of their embedding space, we introduce in this paper three novel techniques for discovering semantically meaningful feature directions (or axes). The first technique uses a dedicated facial-region blending procedure together with principal component analysis to discover embedding space directions that correspond to spatially isolated semantic face areas, providing a new perspective on facial feature interpretation. The other two proposed techniques exploit attribute labels to discern feature directions that correspond to intra-identity variations, such as pose, illumination angle, and expression, but do so either through a cluster analysis or a dedicated regression procedure. To validate the capabilities of the developed techniques, we utilize a powerful template decoder that inverts the image embedding back into the pixel space. Using the decoder, we visualize linear movements along the discovered directions, enabling a clearer understanding of the internal representations within face recognition models. The source code will be made publicly available.},

keywords = {biometrics, CNN, deep learning, face recognition, feature space understanding, xai},

pubstate = {published},

tppubtype = {inproceedings}

}


DeAndres-Tame, Ivan; Tolosana, Ruben; Melzi, Pietro; Vera-Rodriguez, Ruben; Kim, Minchul; Rathgeb, Christian; Liu, Xiaoming; Morales, Aythami; Fierrez, Julian; Ortega-Garcia, Javier; Zhong, Zhizhou; Huang, Yuge; Mi, Yuxi; Ding, Shouhong; Zhou, Shuigeng; He, Shuai; Fu, Lingzhi; Cong, Heng; Zhang, Rongyu; Xiao, Zhihong; Smirnov, Evgeny; Pimenov, Anton; Grigorev, Aleksei; Timoshenko, Denis; Asfaw, Kaleb Mesfin; Low, Cheng Yaw; Liu, Hao; Wang, Chuyi; Zuo, Qing; He, Zhixiang; Shahreza, Hatef Otroshi; George, Anjith; Unnervik, Alexander; Rahimi, Parsa; Marcel, Sébastien; Neto, Pedro C; Huber, Marco; Kolf, Jan Niklas; Damer, Naser; Boutros, Fadi; Cardoso, Jaime S; Sequeira, Ana F; Atzori, Andrea; Fenu, Gianni; Marras, Mirko; Štruc, Vitomir; Yu, Jiang; Li, Zhangjie; Li, Jichun; Zhao, Weisong; Lei, Zhen; Zhu, Xiangyu; Zhang, Xiao-Yu; Biesseck, Bernardo; Vidal, Pedro; Coelho, Luiz; Granada, Roger; Menotti, David

Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data Proceedings Article

In: Proceedings of CVPR Workshops (CVPRW 2024), pp. 1-11, 2024.

Abstract | Links | BibTeX | Tags: competition, face, face recognition, synthetic data

@inproceedings{CVPR_synth2024,

title = {Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data},

author = {Ivan DeAndres-Tame and Ruben Tolosana and Pietro Melzi and Ruben Vera-Rodriguez and Minchul Kim and Christian Rathgeb and Xiaoming Liu and Aythami Morales and Julian Fierrez and Javier Ortega-Garcia and Zhizhou Zhong and Yuge Huang and Yuxi Mi and Shouhong Ding and Shuigeng Zhou and Shuai He and Lingzhi Fu and Heng Cong and Rongyu Zhang and Zhihong Xiao and Evgeny Smirnov and Anton Pimenov and Aleksei Grigorev and Denis Timoshenko and Kaleb Mesfin Asfaw and Cheng Yaw Low and Hao Liu and Chuyi Wang and Qing Zuo and Zhixiang He and Hatef Otroshi Shahreza and Anjith George and Alexander Unnervik and Parsa Rahimi and Sébastien Marcel and Pedro C Neto and Marco Huber and Jan Niklas Kolf and Naser Damer and Fadi Boutros and Jaime S Cardoso and Ana F Sequeira and Andrea Atzori and Gianni Fenu and Mirko Marras and Vitomir Štruc and Jiang Yu and Zhangjie Li and Jichun Li and Weisong Zhao and Zhen Lei and Xiangyu Zhu and Xiao-Yu Zhang and Bernardo Biesseck and Pedro Vidal and Luiz Coelho and Roger Granada and David Menotti},

url = {https://openaccess.thecvf.com/content/CVPR2024W/FRCSyn/papers/Deandres-Tame_Second_Edition_FRCSyn_Challenge_at_CVPR_2024_Face_Recognition_Challenge_CVPRW_2024_paper.pdf},

year  = {2024},

date = {2024-06-17},

urldate = {2024-06-17},

booktitle = {Proceedings of CVPR Workshops (CVPRW 2024)},

pages = {1--11},

abstract = {Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intraclass variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which synthetic data from DCFace and GANDiffFace methods was only allowed to train face recognition systems, in this 2nd edition we propose new subtasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking contribute significantly to the application of synthetic data to face recognition.},

keywords = {competition, face, face recognition, synthetic data},

pubstate = {published},

tppubtype = {inproceedings}

}


Rot, Peter; Terhorst, Philipp; Peer, Peter; Štruc, Vitomir

ASPECD: Adaptable Soft-Biometric Privacy-Enhancement Using Centroid Decoding for Face Verification Proceedings Article

In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-9, 2024.

Abstract | Links | BibTeX | Tags: deepfake, deepfakes, face, face analysis, face deidentification, face image processing, face images, face synthesis, face verification, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy, soft biometrics

@inproceedings{Rot_FG2024,

title = {ASPECD: Adaptable Soft-Biometric Privacy-Enhancement Using Centroid Decoding for Face Verification},

author = {Peter Rot and Philipp Terhorst and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/03/PeterRot_FG2024.pdf},

year  = {2024},

date = {2024-05-28},

booktitle = {Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG)},

pages = {1--9},

abstract = {State-of-the-art face recognition models commonly extract information-rich biometric templates from the input images that are then used for comparison purposes and identity inference. While these templates encode identity information in a highly discriminative manner, they typically also capture other potentially sensitive facial attributes, such as age, gender or ethnicity. To address this issue, Soft-Biometric Privacy-Enhancing Techniques (SB-PETs) were proposed in the literature that aim to suppress such attribute information, and, in turn, alleviate the privacy risks associated with the extracted biometric templates. While various SB-PETs were presented so far, existing approaches do not provide dedicated mechanisms to determine which soft-biometrics to exclude and which to retain. In this paper, we address this gap and introduce ASPECD, a modular framework designed to selectively suppress binary and categorical soft-biometrics based on users' privacy preferences. ASPECD consists of multiple sequentially connected components, each dedicated for privacy-enhancement of an individual soft-biometric attribute. The proposed framework suppresses attribute information using a Moment-based Disentanglement process coupled with a centroid decoding procedure, ensuring that the privacy-enhanced templates are directly comparable to the templates in the original embedding space, regardless of the soft-biometric modality being suppressed.

To validate the performance of ASPECD, we conduct experiments on a large-scale face dataset and with five state-of-the-art face recognition models, demonstrating the effectiveness of the proposed approach in suppressing single and multiple soft-biometric attributes. Our approach achieves a competitive privacy-utility trade-off compared to the state-of-the-art methods in scenarios that involve enhancing privacy w.r.t. gender and ethnicity attributes. Source code will be made publicly available.},

keywords = {deepfake, deepfakes, face, face analysis, face deidentification, face image processing, face images, face synthesis, face verification, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy, soft biometrics},

pubstate = {published},

tppubtype = {inproceedings}

}


Lampe, Ajda; Stopar, Julija; Jain, Deepak Kumar; Omachi, Shinichiro; Peer, Peter; Struc, Vitomir

DiCTI: Diffusion-based Clothing Designer via Text-guided Input Proceedings Article

In: Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024), pp. 1-9, 2024.

Abstract | Links | BibTeX | Tags: clothing design, deepbeauty, denoising diffusion probabilistic models, diffusion, diffusion models, fashion, virtual try-on

@inproceedings{Ajda_Dicti,

title = {DiCTI: Diffusion-based Clothing Designer via Text-guided Input},

author = {Ajda Lampe and Julija Stopar and Deepak Kumar Jain and Shinichiro Omachi and Peter Peer and Vitomir Struc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/06/Dicti_FG2024_compressed.pdf},

year  = {2024},

date = {2024-05-27},

booktitle = {Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024)},

pages = {1--9},

abstract = {Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and customers seeking to order new designs. To address this gap, we introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a straightforward yet highly effective approach that allows designers to quickly visualize fashion-related ideas using text inputs only. 

Given an image of a person and a description of the desired garments as input, DiCTI automatically generates multiple high-resolution, photorealistic images that capture the expressed semantics.  

By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs that viably follow the provided text descriptions, while being able to process very diverse and challenging inputs, captured in completely unconstrained settings. We evaluate DiCTI in comprehensive experiments on two different datasets (VITON-HD and Fashionpedia) and in comparison to the state-of-the-art (SoTA). The results of our experiments show that DiCTI convincingly outperforms the SoTA competitor in generating higher quality images with more elaborate garments and superior text prompt adherence, both according to standard quantitative evaluation measures and human ratings, generated as part of a user study. The source code of DiCTI will be made publicly available.},

keywords = {clothing design, deepbeauty, denoising diffusion probabilistic models, diffusion, diffusion models, fashion, virtual try-on},

pubstate = {published},

tppubtype = {inproceedings}

}


Tomašević, Darian; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir

Generating bimodal privacy-preserving data for face recognition Journal Article

In: Engineering Applications of Artificial Intelligence, vol. 133, iss. E, pp. 1-25, 2024.

Abstract | Links | BibTeX | Tags: CNN, face, face generation, face images, face recognition, generative AI, StyleGAN2, synthetic data

@article{Darian2024,

title = {Generating bimodal privacy-preserving data for face recognition},

author = {Darian Tomašević and Fadi Boutros and Naser Damer and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/05/PapersDarian.pdf},

doi = {10.1016/j.engappai.2024.108495},

year  = {2024},

date = {2024-05-01},

journal = {Engineering Applications of Artificial Intelligence},

volume = {133},

issue = {E},

pages = {1--25},

abstract = {The performance of state-of-the-art face recognition systems depends crucially on the availability of large-scale training datasets. However, increasing privacy concerns nowadays accompany the collection and distribution of biometric data, which has already resulted in the retraction of valuable face recognition datasets. The use of synthetic data represents a potential solution, however, the generation of privacy-preserving facial images useful for training recognition models is still an open problem. Generative methods also remain bound to the visible spectrum, despite the benefits that multispectral data can provide. To address these issues, we present a novel identity-conditioned generative framework capable of producing large-scale recognition datasets of visible and near-infrared privacy-preserving face images. The framework relies on a novel identity-conditioned dual-branch style-based generative adversarial network to enable the synthesis of aligned high-quality samples of identities determined by features of a pretrained recognition model. In addition, the framework incorporates a novel filter to prevent samples of privacy-breaching identities from reaching the generated datasets and improve both identity separability and intra-identity diversity. Extensive experiments on six publicly available datasets reveal that our framework achieves competitive synthesis capabilities while preserving the privacy of real-world subjects. The synthesized datasets also facilitate training more powerful recognition models than datasets generated by competing methods or even small-scale real-world datasets. Employing both visible and near-infrared data for training also results in higher recognition accuracy on real-world visible spectrum benchmarks. Therefore, training with multispectral data could potentially improve existing recognition systems that utilize only the visible spectrum, without the need for additional sensors.},

keywords = {CNN, face, face generation, face images, face recognition, generative AI, StyleGAN2, synthetic data},

pubstate = {published},

tppubtype = {article}

}


Tomašević, Darian; Peer, Peter; Štruc, Vitomir

BiFaceGAN: Bimodal Face Image Synthesis Book Section

In: Bourlai, T. (Ed.): Face Recognition Across the Imaging Spectrum, pp. 273–311, Springer, Singapore, 2024, ISBN: 978-981-97-2058-3.

Abstract | Links | BibTeX | Tags: CNN, deep learning, face synthesis, generative AI, stylegan

@incollection{Darian2024Book,

title = {BiFaceGAN: Bimodal Face Image Synthesis},

author = {Darian Tomašević and Peter Peer and Vitomir Štruc},

editor = {T. Bourlai},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/BiFaceGAN.pdf},

doi = {10.1007/978-981-97-2059-0_11},

isbn = {978-981-97-2058-3},

year  = {2024},

date = {2024-05-01},

urldate = {2024-05-01},

booktitle = {Face Recognition Across the Imaging Spectrum},

pages = {273--311},

publisher = {Springer, Singapore},

abstract = {Modern face recognition and segmentation systems, such as all deep learning approaches, rely on large-scale annotated datasets to achieve competitive performance. However, gathering biometric data often raises privacy concerns and presents a labor-intensive and time-consuming task. Researchers are currently also exploring the use of multispectral data to improve existing solutions, limited to the visible spectrum. Unfortunately, the collection of suitable data is even more difficult, especially if aligned images are required. To address the outlined issues, we present a novel synthesis framework, named BiFaceGAN, capable of producing privacy-preserving large-scale synthetic datasets of photorealistic face images, in the visible and the near-infrared spectrum, along with corresponding ground-truth pixel-level annotations. The proposed framework leverages an innovative Dual-Branch Style-based generative adversarial network (DB-StyleGAN2) to generate per-pixel-aligned bimodal images, followed by an ArcFace Privacy Filter (APF) that ensures the removal of privacy-breaching images. Furthermore, we also implement a Semantic Mask Generator (SMG) that produces reference ground-truth segmentation masks of the synthetic data, based on the latent representations inside the synthesis model and only a handful of manually labeled examples. We evaluate the quality of generated images and annotations through a series of experiments and analyze the benefits of generating bimodal data with a single network. We also show that privacy-preserving data filtering does not notably degrade the image quality of produced datasets. Finally, we demonstrate that the generated data can be employed to train highly successful deep segmentation models, which can generalize well to other real-world datasets.},

keywords = {CNN, deep learning, face synthesis, generative AI, stylegan},

pubstate = {published},

tppubtype = {incollection}

}


Babnik, Žiga; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir

AI-KD: Towards Alignment Invariant Face Image Quality Assessment Using Knowledge Distillation Proceedings Article

In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2024.

Abstract | Links | BibTeX | Tags: ai, CNN, deep learning, face, face image quality assessment, face image quality estimation, face images, face recognition, face verification