2026
Ivanovska, Marija; Todorov, Leon; Peer, Peter; Štruc, Vitomir: SelfMAD++: Self-Supervised Foundation Model with Local Feature Enhancement for Generalized Morphing Attack Detection. Journal Article. In: Information Fusion, vol. 127, Part C, no. 103921, pp. 1-16, 2026.
Tags: anomaly detection, biometrics, CLIP, computer vision, face morphing detection, face recognition, foundation models
Abstract: Face morphing attacks pose a growing threat to biometric systems, exacerbated by the rapid emergence of powerful generative techniques that enable realistic and seamless facial image manipulations. To address this challenge, we introduce SelfMAD++, a robust and generalized single-image morphing attack detection (S-MAD) framework. Unlike our previous work SelfMAD, which introduced a data augmentation technique to train off-the-shelf classifiers for attack detection, SelfMAD++ advances this paradigm by integrating the artifact-driven augmentation with foundation models and fine-grained spatial reasoning. At its core, SelfMAD++ builds on CLIP, a vision-language foundation model, adapted via Low-Rank Adaptation (LoRA) to align image representations with task-specific text prompts. To enhance sensitivity to spatially subtle and fine-grained artifacts, we integrate a parallel convolutional branch specialized in dense, multi-scale feature extraction. This branch is guided by an auxiliary segmentation module, which acts as a regularizer by disentangling bona fide facial regions from potentially manipulated ones. The dual-branch features are adaptively fused through a gated attention mechanism, capturing both semantic context and fine-grained spatial cues indicative of morphing. SelfMAD++ is trained end-to-end using a multi-objective loss that balances semantic alignment, segmentation consistency, and classification accuracy. Extensive experiments across nine standard benchmark datasets demonstrate that SelfMAD++ achieves state-of-the-art performance, with an average Equal Error Rate (EER) of 3.91%, outperforming both supervised and unsupervised MAD methods by large margins. Notably, SelfMAD++ excels on modern, high-quality morphs generated by GAN- and diffusion-based morphing methods, demonstrating its robustness and strong generalization capability. SelfMAD++ code and supplementary resources are publicly available at: https://github.com/LeonTodorov/SelfMADpp
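The gated attention fusion described in the abstract, where semantic CLIP features and local multi-scale convolutional features are adaptively combined, can be illustrated with a minimal PyTorch sketch. All layer sizes, the sigmoid gate, and the names GatedFusion, proj_sem, and proj_loc below are illustrative assumptions, not the published SelfMAD++ architecture:

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    # Minimal sketch of gated attention fusion over two feature branches.
    # Dimensions and layer choices are assumptions for illustration only.
    def __init__(self, dim_sem: int = 512, dim_loc: int = 256, dim_out: int = 512):
        super().__init__()
        self.proj_sem = nn.Linear(dim_sem, dim_out)   # semantic (CLIP) branch
        self.proj_loc = nn.Linear(dim_loc, dim_out)   # local conv branch
        self.gate = nn.Sequential(
            nn.Linear(2 * dim_out, dim_out),
            nn.Sigmoid(),                             # per-channel mixing weights in [0, 1]
        )
        self.head = nn.Linear(dim_out, 1)             # bona fide vs. morph logit

    def forward(self, f_sem: torch.Tensor, f_loc: torch.Tensor) -> torch.Tensor:
        s = self.proj_sem(f_sem)
        l = self.proj_loc(f_loc)
        g = self.gate(torch.cat([s, l], dim=-1))      # gate conditioned on both branches
        fused = g * s + (1.0 - g) * l                 # gated convex combination
        return self.head(fused)

# Usage with a batch of 4 samples and hypothetical feature sizes:
f_sem = torch.randn(4, 512)   # e.g., CLIP image embeddings
f_loc = torch.randn(4, 256)   # e.g., pooled multi-scale conv features
logits = GatedFusion()(f_sem, f_loc)
print(logits.shape)           # torch.Size([4, 1])

The gate lets the model lean on semantic context for clean, high-quality morphs and on local artifact cues where pixel-level traces remain, which matches the dual-branch motivation in the abstract.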
2025
Manojlovska, Anastasija; Ramachandra, Raghavendra; Spathoulas, Georgios; Struc, Vitomir; Grm, Klemen: Interpreting Face Recognition Templates using Natural Language Descriptions. Proceedings Article. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-10, Tucson, USA, 2025.
Tags: CLIP, explainability, face recognition, natural language, symbolic representations, xai
Abstract: Explainable artificial intelligence (XAI) aims to ensure that an AI system's decisions are transparent and understandable to humans, which is particularly important in potentially sensitive application scenarios in surveillance, security, and law enforcement. In these and related areas, understanding the internal mechanisms governing the decision-making process of AI-based systems can increase trust and, consequently, user acceptance. While various methods have been developed to provide insights into the behavior of AI-based models, solutions capable of explaining different aspects of these models using natural language are still limited in the literature. In this paper, we therefore propose a novel approach for interpreting the information content encoded in face templates produced by state-of-the-art (SOTA) face recognition models. Specifically, we utilize the text encoder from the Contrastive Language-Image Pretraining (CLIP) model and generate natural language descriptions of various face attributes present in the face templates. We implement two versions of our approach, one with the off-the-shelf CLIP text encoder and one with a version fine-tuned on the VGGFace2 and MAADFace datasets. Our experimental results indicate that the text encoder fine-tuned under the contrastive training paradigm increases the attribute-based explainability of face recognition templates, while both models provide valuable human-understandable insights into modern face recognition models.
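The core mechanism, scoring a face template against natural-language attribute prompts via CLIP's text encoder, can be sketched as follows. The Hugging Face checkpoint name, the prompt wording, and the randomly initialized projection layer are assumptions for illustration only; the paper learns the alignment between template space and text-embedding space, and this sketch only shows where such a mapping would sit:

import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# Hypothetical prompts for binary face attributes.
prompts = ["a photo of a person with eyeglasses",
           "a photo of a person with a beard",
           "a photo of a smiling person"]

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_model = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    text_emb = text_model(**tokens).text_embeds      # shape (3, 512)

# A projection mapping a 512-d face template into CLIP's text-embedding
# space; randomly initialized here, learned contrastively in the paper.
project = torch.nn.Linear(512, text_emb.shape[-1])
face_template = torch.randn(1, 512)                  # e.g., a face recognition template

scores = F.cosine_similarity(project(face_template), text_emb)
for p, s in zip(prompts, scores):
    print(f"{s.item():+.3f}  {p}")

Attributes whose prompts score high cosine similarity against the projected template are the ones the template is inferred to encode, which is what makes the explanation human-readable.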