Publications – Laboratory for Machine Intelligence

Babnik, Žiga; Peer, Peter; Štruc, Vitomir

UVFace: Utility Driven Video-based Face Recognition Journal Article

In: ICT Express, pp. 1–6, 2026.

Tags: biometrics, CNN, deep learning, face image quality assessment, face images, face recognition, video based recognition

Kolf, Jan Niklas; Ozgur, Guray; Atzori, Andrea; Babnik, Žiga; Štruc, Vitomir; Damer, Naser; Boutros, Fadi

PreFIQs: Face Image Quality Is What Survives Pruning Proceedings Article

In: Proceedings of CVPR Workshops 2026 - CVPR Biometrics Workshop, pp. 1–11, 2026.

Tags: biometrics, deep learning, face image quality assessment, face quality, face recognition, FIQA

@inproceedings{PreFIQCVPRW,

title = {PreFIQs: Face Image Quality Is What Survives Pruning},

author = {Jan Niklas Kolf and Guray Ozgur and Andrea Atzori and Žiga Babnik and Vitomir Štruc and Naser Damer and Fadi Boutros},

url = {https://openaccess.thecvf.com/content/CVPR2026W/BIOM2026/papers/Kolf_PreFIQs_Face_Image_Quality_Is_What_Survives_Pruning_CVPRW_2026_paper.pdf},

year  = {2026},

date = {2026-06-06},

booktitle = {Proceedings of CVPR Workshops 2026 - CVPR Biometrics Workshop},

pages = {1--11},

abstract = {Face Image Quality Assessment (FIQA) evaluates the utility of a face image for automated face recognition (FR) systems. In this work, we propose PreFIQs, an unsupervised and training-free FIQA framework grounded in the Pruning Identified Exemplar (PIE) hypothesis. We hypothesize that low-utility face images rely disproportionately on fragile network parameters, resulting in larger geometric displacement of their embeddings under model sparsification. Accordingly, PreFIQs quantifies image utility as the Euclidean distance between L2-normalized embeddings extracted from a pre-trained FR model and its pruned counterpart. We provide a first-order theoretical justification via a Jacobian-vector product analysis, demonstrating that this empirical drift serves as a computationally efficient approximation of the exact geometric sensitivity of the latent embedding manifold. Extensive experiments across eight benchmarks and four FR models demonstrate that PreFIQs achieves competitive or superior performance compared to state-of-the-art FIQA methods, including establishing new state-of-the-art results on several benchmarks, without any training or supervision. These results validate parameter sparsification as a principled and practically efficient signal for face image utility, and demonstrate that quality is, in essence, what survives pruning. Code available at https://github.com/jankolf/PreFIQs.},

keywords = {biometrics, deep learning, face image quality assessment, face quality, face recognition, FIQA},

pubstate = {published},

tppubtype = {inproceedings}

}


Larue, Nicolas; Štruc, Vitomir; Peer, Peter; Vu, Ngoc-Son

Learning the Manifold of Authenticity: Hybrid-Curvature Representation Learning for Generalizable Deepfake Detection Journal Article

In: IEEE Access, pp. 1–14, 2026, ISSN: 2169-3536.

Tags: deep learning, deepfake, deepfake DAD, deepfake detection, hyperbolic learning, media forensics

@article{AccessHyperbolic,

title = {Learning the Manifold of Authenticity: Hybrid-Curvature Representation Learning for Generalizable Deepfake Detection},

author = {Nicolas Larue and Vitomir Štruc and Peter Peer and Ngoc-Son Vu},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11557307},

doi = {10.1109/ACCESS.2026.3702429},

issn = {2169-3536},

year  = {2026},

date = {2026-06-06},

journal = {IEEE Access},

pages = {1--14},

abstract = {The practical utility of deepfake detectors is crippled by a crisis of generalization: models that perform well on known manipulation techniques consistently fail when faced with unseen forgeries. We argue this failure stems from a fundamental geometric mismatch. Existing methods implicitly assume that the manifold of authentic faces can be modeled in a space of uniform curvature, typically Euclidean, which inadequately captures the complex, multi-scale structure of facial features. This paper validates the hypothesis that authentic faces lie on a manifold whose geometry is inherently hybrid, requiring both angular compactness (a spherical property) and hierarchical organization (a hyperbolic property). To resolve this geometric mismatch, we introduce a novel detector, CTrue, that learns a unified, hybrid-curvature representation of facial authenticity. Trained exclusively on real faces via self-supervised learning, our method simultaneously projects facial embeddings onto two complementary manifolds: a hypersphere to enforce compactness and a hyperbolic space to model the natural feature hierarchy. A single set of mathematically-optimal prototypes acts as a “geometric bridge”, unifying the learning objectives in both spaces. At inference, a composite score measures an embedding’s deviation from this learned manifold. On challenging cross-dataset and cross-manipulation benchmarks, our method achieves competitive generalization under a strictly pristine-only training setting, showing that hybrid-curvature representations provide an effective and data-efficient alternative for deepfake detection.},

keywords = {deep learning, deepfake, deepfake DAD, deepfake detection, hyperbolic learning, media forensics},

pubstate = {published},

tppubtype = {article}

}


Zhu, Haini; Jain, Deepak Kumar; Zhao, Xudong; Li, Muyu; Štruc, Vitomir; Tyagi, Sumarga Kumar Sah

StructFormer: Structure-Consistent Face De-Identification under Strong Privacy Constraints Proceedings Article

In: WACV-W 2026, pp. 1–11, 2026.

Tags: deep learning, deidentification, face analysis

Gan, Chenquan; Zhou, Daitao; Zhu, Qingyi; Wang, Xibin; Jain, Deepak Kumar; Štruc, Vitomir

Improving Emotion Recognition from Ambiguous Speech via Spatio-Temporal Spectrum Analysis and Real-Time Soft-Label Correction Journal Article

In: IEEE Transactions on Affective Computing, pp. 1-16, 2026.

Tags: deep learning, emotion recognition, speech, speech processing

@article{TAC_2026,

title = {Improving Emotion Recognition from Ambiguous Speech via Spatio-Temporal Spectrum Analysis and Real-Time Soft-Label Correction},

author = {Chenquan Gan and Daitao Zhou and Qingyi Zhu and Xibin Wang and Deepak Kumar Jain and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/12/Manuscript_clean.pdf},

year  = {2026},

date = {2026-03-01},

urldate = {2026-03-01},

journal = {IEEE Transactions on Affective Computing},

pages = {1--16},

abstract = {Speech represents a fundamental medium for conveying human emotions and, as a result, speech-based emotion recognition (SER) systems have become pivotal in advancing human-computer interaction (HCI) across a range of applications. While significant progress has been made in speech emotion recognition over recent years, existing solutions still face several key challenges, in that they: (i) rely excessively on subjectively annotated (discrete) labels during training, (ii) often overlook the label ambiguity of speech samples that express more than one class of emotions, and (iii) underutilize unlabeled or ambiguous speech, for which typically a label distribution (or so-called soft labels) is available. To address these issues, we propose in this paper a novel SER model that explicitly handles ambiguous speech samples and overcomes the shortcomings outlined above. Central to our approach is a novel real-time soft-label correction strategy designed to refine the annotations assigned to ambiguous speech. The proposed model leverages both (explicitly) labeled and ambiguous samples and applies the dynamic soft-label correction strategy alongside an enhanced inter-class difference loss function to iteratively optimize the label distributions during training. We theoretically demonstrate that our method is capable of approximating the true emotional distribution of speech even in the presence of label noise, suggesting that utilizing ambiguous speech samples without explicit emotion labels still contributes toward more effective emotion recognition. Furthermore, we integrate the representational power of convolutional neural networks (CNNs) with the contextual modeling capabilities of Wav2Vec 2.0 to enable a comprehensive extraction of spatio-temporal speech features. Experimental results on the IEMOCAP multi-label dataset confirm the effectiveness of our approach, achieving state-of-the-art performance with significant improvements in weighted accuracy (WA) and unweighted accuracy (UA) over competing methods.},

keywords = {deep learning, emotion recognition, speech, speech processing},

pubstate = {published},

tppubtype = {article}

}


Mishra, Gargi; Bajpai, Supriya; Saini, Dharmender; Jain, Rachna; Jain, Deepak Kumar; Štruc, Vitomir

EmoVisioNet: A hybrid network unifying lightweight CNN and attention-based vision model for facial emotion detection Journal Article

In: Neurocomputing, vol. 665, no. 132224, pp. 1-12, 2026.

Tags: CNN, deep learning, facial expression recognition, lightweight models

@article{EmoVison2025,

title = {EmoVisioNet: A hybrid network unifying lightweight CNN and attention-based vision model for facial emotion detection},

author = {Gargi Mishra and Supriya Bajpai and Dharmender Saini and Rachna Jain and Deepak Kumar Jain and Vitomir Štruc},

doi = {10.1016/j.neucom.2025.132224},

year  = {2026},

date = {2026-02-07},

urldate = {2025-11-28},

journal = {Neurocomputing},

volume = {665},

number = {132224},

pages = {1--12},

abstract = {Facial emotion detection has witnessed a surge in demand across numerous applications, including human-computer interaction, healthcare, and security. Accurate expression recognition is crucial for improving human-computer interactions and understanding human behavior. Existing facial emotion detection models face challenges in achieving both high accuracy and real-time processing due to complex architectures. Our goal is to create an efficient yet accurate solution that can work on resource-constrained devices. To address the challenge of accurately recognizing emotions from facial expressions, we propose a novel hybrid approach that combines the strengths of pretrained Lightweight Convolutional Neural Networks (CNNs) and Attention-based Vision Models. The pretrained Lightweight CNN serves as a feature extractor, efficiently capturing facial features, while the attention model refines the feature representation to focus on crucial regions of the face associated with different expressions. This enables our model to achieve state-of-the-art (SOTA) accuracy with reduced computational requirements. The proposed model, EmoVisioNet, achieves superior performance across multiple datasets, attaining 99.97% accuracy on CK+, 96.23% on RAF-DB, 93.88% on FER2013, and 96.91% on FERPlus. The obtained results surpass the current state-of-the-art in this field, demonstrating EmoVisioNet’s superior performance in facial expression recognition.},

keywords = {CNN, deep learning, facial expression recognition, lightweight models},

pubstate = {published},

tppubtype = {article}

}


Rot, Peter; Jutreša, Robert; Peer, Peter; Štruc, Vitomir; Scheirer, Walter; Grm, Klemen

FaceMINT: A library for gaining insights into biometric face recognition via mechanistic interpretability Journal Article

In: Image and Vision Computing, no. 105804, pp. 1-23, 2025.

Tags: biometrics, CNN, deep learning, face recognition, interpretability, MIXBAI, xai

@article{Rot_IVC2025,

title = {FaceMINT: A library for gaining insights into biometric face recognition via mechanistic interpretability},

author = {Peter Rot and Robert Jutreša and Peter Peer and Vitomir Štruc and Walter Scheirer and Klemen Grm},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/11/IVC_2025__FaceMINT.pdf},

doi = {10.1016/j.imavis.2025.105804},

year  = {2025},

date = {2025-11-10},

journal = {Image and Vision Computing},

number = {105804},

pages = {1--23},

abstract = {Deep-learning models, including those used in biometric recognition, have achieved remarkable performance on benchmark datasets as well as real-world recognition tasks. However, a major drawback of these models is their lack of transparency in decision-making. Mechanistic interpretability has emerged as a promising research field intended to help us gain insights into such models, but its application to biometric data remains limited. In this work, we bridge this gap by introducing the FaceMINT library, a publicly available Python library (built on top of PyTorch) that enables biometric researchers to inspect their models through mechanistic interpretability. It provides a plug-and-play solution that allows researchers to seamlessly switch between the analyzed biometric models, evaluate state-of-the-art sparse autoencoders, select from various image parametrizations, and fine-tune hyperparameters. Using the large-scale Glint360K dataset, we demonstrate the usability of FaceMINT by applying its functionality to two state-of-the-art (deep-learning) face recognition models: AdaFace, based on Convolutional Neural Networks (CNN), and SwinFace, based on transformers. The proposed library implements various sparse autoencoders (SAEs), including vanilla SAE, Gated SAE, JumpReLU SAE, and TopK SAE, which have achieved state-of-the-art results in the mechanistic interpretability of large language models. Our study highlights the promise of mechanistic interpretability in the biometric field, providing new avenues for researchers to explore model transparency and refine biometric recognition systems. The library is publicly available at www.gitlab.com/peterrot/facemint.},

keywords = {biometrics, CNN, deep learning, face recognition, interpretability, MIXBAI, xai},

pubstate = {published},

tppubtype = {article}

}


Gan, Chenquan; Zhou, Daitao; Wang, Kexin; Zhu, Qingyi; Jain, Deepak Kumar; Štruc, Vitomir

Optimizing ambiguous speech emotion recognition through spatial–temporal parallel network with label correction strategy Journal Article

In: Computer Vision and Image Understanding, vol. 260, no. 104483, pp. 1–14, 2025.

Tags: deep learning, emotion recognition, speech, speech processing, speech technologies

@article{CVIU_2025b,

title = {Optimizing ambiguous speech emotion recognition through spatial–temporal parallel network with label correction strategy},

author = {Chenquan Gan and Daitao Zhou and Kexin Wang and Qingyi Zhu and Deepak Kumar Jain and Vitomir Štruc},

url = {https://www.sciencedirect.com/science/article/pii/S1077314225002061?dgcid=coauthor

https://lmi.fe.uni-lj.si/wp-content/uploads/2025/09/CVIU.pdf},

doi = {10.1016/j.cviu.2025.104483},

year  = {2025},

date = {2025-10-01},

urldate = {2025-10-01},

journal = {Computer Vision and Image Understanding},

volume = {260},

number = {104483},

pages = {1--14},

abstract = {Speech emotion recognition is of great significance for improving the human–computer interaction experience. However, traditional methods based on hard labels have difficulty dealing with the ambiguity of emotional expression. Existing studies alleviate this problem by redefining labels, but still rely on the subjective emotional expression of annotators and fail to fully consider truly ambiguous speech samples without dominant labels. To address the insufficient expressiveness of emotional labels and the neglect of ambiguous speech samples without dominant labels, we propose a label correction strategy that uses a model with exact sample knowledge to modify inappropriate labels for ambiguous speech samples, integrating model training with emotion cognition and accounting for ambiguous samples without dominant labels. The strategy is implemented on a spatial–temporal parallel network, which adopts temporal pyramid pooling (TPP) to process the variable-length features of speech and improve the efficiency of speech emotion recognition. Experiments show that ambiguous speech contributes more to speech emotion recognition performance after label correction.},

keywords = {deep learning, emotion recognition, speech, speech processing, speech technologies},

pubstate = {published},

tppubtype = {article}

}


Ožbot, Miha; Škrjanc, Igor; Štruc, Vitomir

A Neuro-Fuzzy System for Interpretable Long-Term Stock Market Forecasting Proceedings Article

In: Proceedings of ERK 2025, pp. 213-216, 2025.

Tags: deep learning, forecasting, Fuzzformer, LSTM, stock market forecasting

Vitek, Matej; Tomašević, Darian; Das, Abhijit; Nathan, Sabari; Özbulak, Gökhan; Özbulak, Tataroğlu; Ayşe, Gözde; Calbimonte, Jean-Paul; Anjos, André; Bhatt, Hariohm Hemant; Premani, Dhruv Dhirendra; Chaudhari, Jay; Wang, Caiyong; Jiang, Jian; Zhang, Chi; Zhang, Qi; Ganapathi, Iyyakutti Iyappan; Ali, Syed Sadaf; Velayudan, Divya; Assefa, Maregu; Werghi, Naoufel; Daniels, Zachary A.; John, Leeon; Vyas, Ritesh; Khiarak, Jalil Nourmohammadi; Saeed, Taher Akbari; Nasehi, Mahsa; Kianfar, Ali; Pashazadeh Panahi, Mobina; Sharma, Geetanjali; Panth, Pushp Raj; Ramachandra, Raghavendra; Nigam, Aditya; Pal, Umapada; Peer, Peter; Štruc, Vitomir

Privacy-enhancing Sclera Segmentation Benchmarking Competition: SSBC 2025 Proceedings Article

In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2025), pp. 1–13, IEEE, 2025.

Tags: biometrics, deep learning, sclera segmentation, segmentation, SSBC

Vitek, Matej; Štruc, Vitomir; Peer, Peter

GazeNet: A lightweight multitask sclera feature extractor Journal Article

In: Alexandria Engineering Journal, vol. 112, pp. 661-671, 2025.

Tags: biometrics, CNN, deep learning, lightweight models, sclera

Boutros, Fadi; Štruc, Vitomir; Damer, Naser

AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition Proceedings Article

In: Proceedings of the European Conference on Computer Vision (ECCV 2024), pp. 1-20, 2024.

Tags: adaptive distillation, biometrics, CNN, deep learning, face, face recognition, knowledge distillation

Manojlovska, Anastasija; Štruc, Vitomir; Grm, Klemen

Interpretacija mehanizmov obraznih biometričnih modelov s kontrastnim multimodalnim učenjem [Interpreting the Mechanisms of Facial Biometric Models with Contrastive Multimodal Learning] Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Tags: CNN, deep learning, face recognition, xai

Ocvirk, Krištof; Brodarič, Marko; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

Primerjava metod za zaznavanje napadov ponovnega zajema [A Comparison of Methods for Detecting Recapture Attacks] Proceedings Article

In: Proceedings of ERK, pp. 1-4, Portorož, Slovenia, 2024.

Tags: attacks, biometrics, CNN, deep learning, identity cards, pad

Alessio, Leon; Brodarič, Marko; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

Prepoznava zamenjave obraza na slikah osebnih dokumentov [Detecting Face Swaps in Identity Document Images] Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Tags: biometrics, deep learning, deep models, face PAD, face recognition, pad

Sikošek, Lovro; Brodarič, Marko; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

Detection of Presentation Attacks with 3D Masks Using Deep Learning Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Tags: biometrics, CNN, deep learning, face PAD, face recognition, pad

Brodarič, Marko; Peer, Peter; Štruc, Vitomir

Towards Improving Backbones for Deepfake Detection Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, 2024.

Tags: CNN, deep learning, deepfake detection, deepfakes, media forensics, transformer

Plesh, Richard; Križaj, Janez; Bahmani, Keivan; Banavar, Mahesh; Štruc, Vitomir; Schuckers, Stephanie

Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models Proceedings Article

In: International Joint Conference on Biometrics (IJCB 2024), pp. 1-10, 2024.

Tags: biometrics, CNN, deep learning, face recognition, feature space understanding, xai

@inproceedings{Krizaj,

title = {Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models},

author = {Richard Plesh and Janez Križaj and Keivan Bahmani and Mahesh Banavar and Vitomir Štruc and Stephanie Schuckers},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107-supp.pdf},

year  = {2024},

date = {2024-09-15},

booktitle = {International Joint Conference on Biometrics (IJCB 2024)},

pages = {1--10},

abstract = {Modern face recognition (FR) models, particularly their convolutional neural network based implementations, often raise concerns regarding privacy and ethics due to their “black-box” nature. To enhance the explainability of FR models and the interpretability of their embedding space, we introduce in this paper three novel techniques for discovering semantically meaningful feature directions (or axes). The first technique uses a dedicated facial-region blending procedure together with principal component analysis to discover embedding space directions that correspond to spatially isolated semantic face areas, providing a new perspective on facial feature interpretation. The other two proposed techniques exploit attribute labels to discern feature directions that correspond to intra-identity variations, such as pose, illumination angle, and expression, but do so either through a cluster analysis or a dedicated regression procedure. To validate the capabilities of the developed techniques, we utilize a powerful template decoder that inverts the image embedding back into the pixel space. Using the decoder, we visualize linear movements along the discovered directions, enabling a clearer understanding of the internal representations within face recognition models. The source code will be made publicly available.},

keywords = {biometrics, CNN, deep learning, face recognition, feature space understanding, xai},

pubstate = {published},

tppubtype = {inproceedings}

}


Tomašević, Darian; Peer, Peter; Štruc, Vitomir

BiFaceGAN: Bimodal Face Image Synthesis Book Section

In: Bourlai, T. (Ed.): Face Recognition Across the Imaging Spectrum, pp. 273–311, Springer, Singapore, 2024, ISBN: 978-981-97-2058-3.

Tags: CNN, deep learning, face synthesis, generative AI, stlyegan

@incollection{Darian2024Book,

title = {BiFaceGAN: Bimodal Face Image Synthesis},

author = {Darian Tomašević and Peter Peer and Vitomir Štruc},

editor = {T. Bourlai},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/BiFaceGAN.pdf},

doi = {10.1007/978-981-97-2059-0_11},

isbn = {978-981-97-2058-3},

year  = {2024},

date = {2024-05-01},

urldate = {2024-05-01},

booktitle = {Face Recognition Across the Imaging Spectrum},

pages = {273--311},

publisher = {Springer, Singapore},

abstract = {Modern face recognition and segmentation systems, like all deep learning approaches, rely on large-scale annotated datasets to achieve competitive performance. However, gathering biometric data often raises privacy concerns and presents a labor-intensive and time-consuming task. Researchers are currently also exploring the use of multispectral data to improve existing solutions, limited to the visible spectrum. Unfortunately, the collection of suitable data is even more difficult, especially if aligned images are required. To address the outlined issues, we present a novel synthesis framework, named BiFaceGAN, capable of producing privacy-preserving large-scale synthetic datasets of photorealistic face images, in the visible and the near-infrared spectrum, along with corresponding ground-truth pixel-level annotations. The proposed framework leverages an innovative Dual-Branch Style-based generative adversarial network (DB-StyleGAN2) to generate per-pixel-aligned bimodal images, followed by an ArcFace Privacy Filter (APF) that ensures the removal of privacy-breaching images. Furthermore, we also implement a Semantic Mask Generator (SMG) that produces reference ground-truth segmentation masks of the synthetic data, based on the latent representations inside the synthesis model and only a handful of manually labeled examples. We evaluate the quality of generated images and annotations through a series of experiments and analyze the benefits of generating bimodal data with a single network. We also show that privacy-preserving data filtering does not notably degrade the image quality of produced datasets. Finally, we demonstrate that the generated data can be employed to train highly successful deep segmentation models, which can generalize well to other real-world datasets.},

keywords = {CNN, deep learning, face synthesis, generative AI, stlyegan},

pubstate = {published},

tppubtype = {incollection}

}


Babnik, Žiga; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir

AI-KD: Towards Alignment Invariant Face Image Quality Assessment Using Knowledge Distillation Proceedings Article

In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2024.

Tags: ai, CNN, deep learning, face, face image quality assessment, face image quality estimation, face images, face recognition, face verification

Rot, Peter; Križaj, Janez; Peer, Peter; Štruc, Vitomir

Enhancing Gender Privacy with Photo-realistic Fusion of Disentangled Spatial Segments Proceedings Article

In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, 2024.

Tags: deep learning, face, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy

Babnik, Žiga; Peer, Peter; Štruc, Vitomir

eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models Journal Article

In: IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), pp. 1-16, 2024, ISSN: 2637-6407.

Tags: biometrics, CNN, deep learning, DifFIQA, difussion, face, face image quality assesment, face recognition, FIQA

@article{BabnikTBIOM2024,

title = {eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models},

author = {Žiga Babnik and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/03/TBIOM___DifFIQAv2.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10468647&tag=1},

doi = {10.1109/TBIOM.2024.3376236},

issn = {2637-6407},

year  = {2024},

date = {2024-03-07},

urldate = {2024-03-07},

journal = {IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM)},

pages = {1--16},

abstract = {State-of-the-art Face Recognition (FR) models perform well in constrained scenarios, but frequently fail in difficult real-world scenarios, when no quality guarantees can be made for face samples. For this reason, Face Image Quality Assessment (FIQA) techniques are often used by FR systems, to provide quality estimates of captured face samples. The quality estimate provided by FIQA techniques can be used by the FR system to reject low-quality samples, in turn improving the performance of the system and reducing the number of critical false-match errors. However, despite steady improvements, ensuring a good trade-off between the performance and computational complexity of FIQA methods across diverse face samples remains challenging. In this paper, we present DifFIQA, a powerful unsupervised approach for quality assessment based on the popular denoising diffusion probabilistic models (DDPMs) and the extended (eDifFIQA) approach. The main idea of the base DifFIQA approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because of the iterative nature of DDPMs, the base DifFIQA approach is extremely computationally expensive. Using eDifFIQA we are able to improve on both the performance and computational complexity of the base DifFIQA approach, by employing label optimized knowledge distillation. In this process, quality information inferred by DifFIQA is distilled into a quality-regression model. During the distillation process, we use an additional source of quality information hidden in the relative position of the embedding to further improve the predictive capabilities of the underlying regression model. By choosing different feature extraction backbone models as the basis for the quality-regression eDifFIQA model, we are able to control the trade-off between the predictive capabilities and computational complexity of the final model. We evaluate three eDifFIQA variants of varying sizes in comprehensive experiments on 7 diverse datasets containing static images and a separate video-based dataset, with 4 target CNN-based FR models and 2 target Transformer-based FR models and against 10 state-of-the-art FIQA techniques, as well as against the initial DifFIQA baseline and a simple regression-based predictor DifFIQA(R), distilled from DifFIQA without any additional optimization. The results show that the proposed label optimized knowledge distillation improves on the performance and computational complexity of the base DifFIQA approach, and is able to achieve state-of-the-art performance in several distinct experimental scenarios. Furthermore, we also show that the distilled model can be used directly for face recognition and leads to highly competitive results.},

keywords = {biometrics, CNN, deep learning, DifFIQA, diffusion, face, face image quality assessment, face recognition, FIQA},

pubstate = {published},

tppubtype = {article}

}


Ivanovska, Marija; Štruc, Vitomir

Y-GAN: Learning Dual Data Representations for Anomaly Detection in Images Journal Article

In: Expert Systems with Applications (ESWA), vol. 248, no. 123410, pp. 1-7, 2024.

Abstract | Links | BibTeX | Tags: anomaly detection, CNN, deep learning, one-class learning, y-gan

@article{ESWA2024,

title = {Y-GAN: Learning Dual Data Representations for Anomaly Detection in Images},

author = {Marija Ivanovska and Vitomir Štruc},

url = {https://www.sciencedirect.com/science/article/pii/S0957417424002756

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/02/YGAN_Marija.pdf},

doi = {10.1016/j.eswa.2024.123410},

year  = {2024},

date = {2024-03-01},

urldate = {2024-03-01},

journal = {Expert Systems with Applications (ESWA)},

volume = {248},

number = {123410},

pages = {1-7},

abstract = {We propose a novel reconstruction-based model for anomaly detection in image data, called 'Y-GAN'. The model consists of a Y-shaped auto-encoder and represents images in two separate latent spaces. The first captures meaningful image semantics, which are key for representing (normal) training data, whereas the second encodes low-level residual image characteristics. To ensure the dual representations encode mutually exclusive information, a disentanglement procedure is designed around a latent (proxy) classifier. Additionally, a novel representation-consistency mechanism is proposed to prevent information leakage between the latent spaces. The model is trained in a one-class learning setting using only normal training data. Due to the separation of semantically-relevant and residual information, Y-GAN is able to derive informative data representations that allow for efficacious anomaly detection across a diverse set of anomaly detection tasks. The model is evaluated in comprehensive experiments with several recent anomaly detection models using four popular image datasets, i.e., MNIST, FMNIST, CIFAR10, and PlantVillage. Experimental results show that Y-GAN outperforms all tested models by a considerable margin and yields state-of-the-art results. The source code for the model is made publicly available at https://github.com/MIvanovska/Y-GAN. },

keywords = {anomaly detection, CNN, deep learning, one-class learning, y-gan},

pubstate = {published},

tppubtype = {article}

}


Ivanovska, Marija; Štruc, Vitomir

On the Vulnerability of Deepfake Detectors to Attacks Generated by Denoising Diffusion Models Proceedings Article

In: Proceedings of WACV Workshops, pp. 1051-1060, 2024.

Abstract | Links | BibTeX | Tags: deep learning, deepfake, deepfake detection, diffusion models, face, media forensics

Pernuš, Martin; Štruc, Vitomir; Dobrišek, Simon

MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization Journal Article

In: IEEE Transactions on Image Processing, 2023, ISSN: 1941-0042.

Abstract | Links | BibTeX | Tags: CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN

@article{MaskFaceGAN,

title = {MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization},

author = {Martin Pernuš and Vitomir Štruc and Simon Dobrišek},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10299582

https://lmi.fe.uni-lj.si/wp-content/uploads/2023/02/MaskFaceGAN_compressed.pdf

https://arxiv.org/pdf/2103.11135.pdf},

doi = {10.1109/TIP.2023.3326675},

issn = {1941-0042},

year  = {2023},

date = {2023-10-27},

urldate = {2023-01-02},

journal = {IEEE Transactions on Image Processing},

abstract = {Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control over the editing procedure and alter multiple (entangled) attributes simultaneously, when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel editing approach, called MaskFaceGAN, that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially-selective treatment of local image regions. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the FRGC, SiblingsDB-HQf, and XM2VTS datasets and in comparison with several state-of-the-art techniques from the literature. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolutions (1024×1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is publicly available from: https://github.com/MartinPernus/MaskFaceGAN.},

keywords = {CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN},

pubstate = {published},

tppubtype = {article}

}


Babnik, Žiga; Peer, Peter; Štruc, Vitomir

DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models Proceedings Article

In: IEEE International Joint Conference on Biometrics, pp. 1-10, IEEE, Ljubljana, Slovenia, 2023.

Abstract | Links | BibTeX | Tags: biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face image quality assessment, face recognition, FIQA, quality

@inproceedings{Diffiqa_2023,

title = {DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models},

author = {Žiga Babnik and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121-supp.pdf},

year  = {2023},

date = {2023-09-01},

booktitle = {IEEE International Joint Conference on Biometrics},

pages = {1-10},

publisher = {IEEE},

address = {Ljubljana, Slovenia},

abstract = {Modern face recognition (FR) models excel in constrained scenarios, but often suffer from decreased performance when deployed in unconstrained (real-world) environments due to uncertainties surrounding the quality of the captured facial data. Face image quality assessment (FIQA) techniques aim to mitigate these performance degradations by providing FR models with sample-quality predictions that can be used to reject low-quality samples and reduce false match errors. However, despite steady improvements, ensuring reliable quality estimates across facial images with diverse characteristics remains challenging. In this paper, we present a powerful new FIQA approach, named DifFIQA, which relies on denoising diffusion probabilistic models (DDPM) and ensures highly competitive results. The main idea behind the approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because the diffusion-based perturbations are computationally expensive, we also distill the knowledge encoded in DifFIQA into a regression-based quality predictor, called DifFIQA(R), that balances performance and execution time. We evaluate both models in comprehensive experiments on 7 diverse datasets, with 4 target FR models and against 10 state-of-the-art FIQA techniques with highly encouraging results. The source code is available from: https://github.com/LSIbabnikz/DifFIQA.},

keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face image quality assessment, face recognition, FIQA, quality},

pubstate = {published},

tppubtype = {inproceedings}

}


Kolf, Jan Niklas; Boutros, Fadi; Elliesen, Jurek; Theuerkauf, Markus; Damer, Naser; Alansari, Mohamad Y; Hay, Oussama Abdul; Alansari, Sara Yousif; Javed, Sajid; Werghi, Naoufel; Grm, Klemen; Struc, Vitomir; Alonso-Fernandez, Fernando; Hernandez-Diaz, Kevin; Bigun, Josef; George, Anjith; Ecabert, Christophe; Shahreza, Hatef Otroshi; Kotwal, Ketan; Marcel, Sébastien; Medvedev, Iurii; Bo, Jin; Nunes, Diogo; Hassanpour, Ahmad; Khatiwada, Pankaj; Toor, Aafan Ahmad; Yang, Bian

EFaR 2023: Efficient Face Recognition Competition Proceedings Article

In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-12, Ljubljana, Slovenia, 2023.

Abstract | Links | BibTeX | Tags: biometrics, deep learning, face, face recognition, lightweight models

@inproceedings{EFAR2023_2023,

title = {EFaR 2023: Efficient Face Recognition Competition},

author = {Jan Niklas Kolf and Fadi Boutros and Jurek Elliesen and Markus Theuerkauf and Naser Damer and Mohamad Y Alansari and Oussama Abdul Hay and Sara Yousif Alansari and Sajid Javed and Naoufel Werghi and Klemen Grm and Vitomir Struc and Fernando Alonso-Fernandez and Kevin Hernandez-Diaz and Josef Bigun and Anjith George and Christophe Ecabert and Hatef Otroshi Shahreza and Ketan Kotwal and Sébastien Marcel and Iurii Medvedev and Jin Bo and Diogo Nunes and Ahmad Hassanpour and Pankaj Khatiwada and Aafan Ahmad Toor and Bian Yang},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-231.pdf},

year  = {2023},

date = {2023-09-01},

booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},

pages = {1-12},

address = {Ljubljana, Slovenia},

abstract = {This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost, and some solutions apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.},

keywords = {biometrics, deep learning, face, face recognition, lightweight models},

pubstate = {published},

tppubtype = {inproceedings}

}


Das, Abhijit; Atreya, Saurabh K; Mukherjee, Aritra; Vitek, Matej; Li, Haiqing; Wang, Caiyong; Guangzhe, Zhao; Boutros, Fadi; Siebke, Patrick; Kolf, Jan Niklas; Damer, Naser; Sun, Ye; Hexin, Lu; Aobo, Fab; Sheng, You; Nathan, Sabari; Ramamoorthy, Suganya; S, Rampriya R; G, Geetanjali; Sihag, Prinaka; Nigam, Aditya; Peer, Peter; Pal, Umapada; Struc, Vitomir

Sclera Segmentation and Joint Recognition Benchmarking Competition: SSRBC 2023 Proceedings Article

In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-10, Ljubljana, Slovenia, 2023.

Abstract | Links | BibTeX | Tags: biometrics, competition IJCB, computer vision, deep learning, sclera, sclera segmentation

Emersic, Ziga; Ohki, Tetsushi; Akasaka, Muku; Arakawa, Takahiko; Maeda, Soshi; Okano, Masora; Sato, Yuya; George, Anjith; Marcel, Sébastien; Ganapathi, Iyyakutti Iyappan; Ali, Syed Sadaf; Javed, Sajid; Werghi, Naoufel; Işık, Selin Gök; Sarıtaş, Erdi; Ekenel, Hazim Kemal; Hudovernik, Valter; Kolf, Jan Niklas; Boutros, Fadi; Damer, Naser; Sharma, Geetanjali; Kamboj, Aman; Nigam, Aditya; Jain, Deepak Kumar; Cámara, Guillermo; Peer, Peter; Struc, Vitomir

The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias Proceedings Article

In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-10, Ljubljana, Slovenia, 2023.

Abstract | Links | BibTeX | Tags: biometrics, competition, computer vision, deep learning, ear, ear biometrics, UERC 2023

@inproceedings{UERC2023,

title = {The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias},

author = {Ziga Emersic and Tetsushi Ohki and Muku Akasaka and Takahiko Arakawa and Soshi Maeda and Masora Okano and Yuya Sato and Anjith George and Sébastien Marcel and Iyyakutti Iyappan Ganapathi and Syed Sadaf Ali and Sajid Javed and Naoufel Werghi and Selin Gök Işık and Erdi Sarıtaş and Hazim Kemal Ekenel and Valter Hudovernik and Jan Niklas Kolf and Fadi Boutros and Naser Damer and Geetanjali Sharma and Aman Kamboj and Aditya Nigam and Deepak Kumar Jain and Guillermo Cámara and Peter Peer and Vitomir Struc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-234.pdf},

year  = {2023},

date = {2023-09-01},

booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},

pages = {1-10},

address = {Ljubljana, Slovenia},

abstract = {The paper provides a summary of the 2023 Unconstrained Ear Recognition Challenge (UERC), a benchmarking effort focused on ear recognition from images acquired in uncontrolled environments. The objective of the challenge was to evaluate the effectiveness of current ear recognition techniques on a challenging ear dataset while analyzing the techniques from two distinct aspects, i.e., verification performance and bias with respect to specific demographic factors, i.e., gender and ethnicity. Seven research groups participated in the challenge and submitted seven distinct recognition approaches that ranged from descriptor-based methods and deep-learning models to ensemble techniques that relied on multiple data representations to maximize performance and minimize bias. A comprehensive investigation into the performance of the submitted models is presented, as well as an in-depth analysis of bias and associated performance differentials due to differences in gender and ethnicity. The results of the challenge suggest that a wide variety of models (e.g., transformers, convolutional neural networks, ensemble models) is capable of achieving competitive recognition results, but also that all of the models still exhibit considerable performance differentials with respect to both gender and ethnicity. To promote further development of unbiased and effective ear recognition models, the starter kit of UERC 2023 together with the baseline model, and training and test data is made available from: http://ears.fri.uni-lj.si/.},

keywords = {biometrics, competition, computer vision, deep learning, ear, ear biometrics, UERC 2023},

pubstate = {published},

tppubtype = {inproceedings}

}


Ivanovska, Marija; Štruc, Vitomir; Perš, Janez

TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models Proceedings Article

In: 18th International Conference on Machine Vision and Applications (MVA 2023), pp. 1-6, 2023.

Abstract | Links | BibTeX | Tags: agriculture, dataset, deep learning, diffusion, plant segmentation, plant monitoring, robotics, segmentation, tomato dataset

Vitek, Matej; Bizjak, Matic; Peer, Peter; Štruc, Vitomir

IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics Journal Article

In: Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 8, pp. 1-21, 2023.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, model compression, pruning, sclera, sclera segmentation

@article{VitekSaud2023,

title = {IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics},

author = {Matej Vitek and Matic Bizjak and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/07/PublishedVersion.pdf},

doi = {10.1016/j.jksuci.2023.101630},

year  = {2023},

date = {2023-07-10},

journal = {Journal of King Saud University - Computer and Information Sciences},

volume = {35},

number = {8},

pages = {1-21},

abstract = {The sclera has recently been gaining attention as a biometric modality due to its various desirable characteristics. A key step in any type of ocular biometric recognition, including sclera recognition, is the segmentation of the relevant part(s) of the eye. However, the high computational complexity of the (deep) segmentation models used in this task can limit their applicability on resource-constrained devices such as smartphones or head-mounted displays. As these devices are a common desired target for such biometric systems, lightweight solutions for ocular segmentation are critically needed. To address this issue, this paper introduces IPAD (Iterative Pruning with Activation Deviation), a novel method for developing lightweight convolutional networks, that is based on model pruning. IPAD uses a novel filter-activation-based criterion (ADC) to determine low-importance filters and employs an iterative model pruning procedure to derive the final lightweight model. To evaluate the proposed pruning procedure, we conduct extensive experiments with two diverse segmentation models, over four publicly available datasets (SBVPI, SLD, SMD and MOBIUS), in four distinct problem configurations and in comparison to state-of-the-art methods from the literature. The results of the experiments show that the proposed filter-importance criterion outperforms the standard L1 and L2 approaches from the literature. 
Furthermore, the results also suggest that: 1) the pruned models are able to retain (or even improve on) the performance of the unpruned originals, as long as they are not over-pruned, with RITnet and U-Net at 50% of their original FLOPs reaching up to 4% and 7% higher IoU values than their unpruned versions, respectively, 2) smaller models require more careful pruning, as the pruning process can hurt the model’s generalization capabilities, and 3) the novel criterion most convincingly outperforms the classic approaches when sufficient training data is available, implying that the abundance of data leads to more robust activation-based importance computation.},

keywords = {biometrics, CNN, deep learning, model compression, pruning, sclera, sclera segmentation},

pubstate = {published},

tppubtype = {article}

}


Pernuš, Martin; Bhatnagar, Mansi; Samad, Badr; Singh, Divyanshu; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon

ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms Journal Article

In: IEEE Access, pp. 1-22, 2023, ISSN: 2169-3536.

Abstract | Links | BibTeX | Tags: artificial intelligence, CNN, deep learning, face generation, face synthesis, GAN, GAN inversion, kinship, kinship synthesis, StyleGAN2

@article{AccessMartin2023,

title = {ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms},

author = {Martin Pernuš and Mansi Bhatnagar and Badr Samad and Divyanshu Singh and Peter Peer and Vitomir Štruc and Simon Dobrišek},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10126110},

doi = {10.1109/ACCESS.2023.3276877},

issn = {2169-3536},

year  = {2023},

date = {2023-05-17},

journal = {IEEE Access},

pages = {1-22},

abstract = {Kinship face synthesis is an increasingly popular topic within the computer vision community, particularly the task of predicting the child appearance using parental images. Previous work has been limited in terms of model capacity and inadequate training data, which is comprised of low-resolution and tightly cropped images, leading to lower synthesis quality. In this paper, we propose ChildNet, a method for kinship face synthesis that leverages the facial image generation capabilities of a state-of-the-art Generative Adversarial Network (GAN), and resolves the aforementioned problems. ChildNet is designed within the GAN latent space and is able to predict a child appearance that bears high resemblance to real parents’ children. To ensure fine-grained control, we propose an age and gender manipulation module that allows precise manipulation of the child synthesis result. ChildNet is capable of generating multiple child images per parent pair input, while providing a way to control the image generation variability. Additionally, we introduce a mechanism to control the dominant parent image. Finally, to facilitate the task of kinship face synthesis, we introduce a new kinship dataset, called Next of Kin. This dataset contains 3690 high-resolution face images with a diverse range of ethnicities and ages. We evaluate ChildNet in comprehensive experiments against three competing kinship face synthesis models, using two kinship datasets. The experiments demonstrate the superior performance of ChildNet in terms of identity similarity, while exhibiting high perceptual image quality. The source code for the model is publicly available at: https://github.com/MartinPernus/ChildNet.},

keywords = {artificial intelligence, CNN, deep learning, face generation, face synthesis, GAN, GAN inversion, kinship, kinship synthesis, StyleGAN2},

pubstate = {published},

tppubtype = {article}

}


Grabner, Miha; Wang, Yi; Wen, Qingsong; Blažič, Boštjan; Štruc, Vitomir

A global modeling framework for load forecasting in distribution networks Journal Article

In: IEEE Transactions on Smart Grid, 2023, ISSN: 1949-3061.

Abstract | Links | BibTeX | Tags: deep learning, global modeling, load forecasting, prediction, smart grid, time series analysis, time series forecasting

@article{Grabner_TSG,

title = {A global modeling framework for load forecasting in distribution networks},

author = {Miha Grabner and Yi Wang and Qingsong Wen and Boštjan Blažič and Vitomir Štruc},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10092804},

doi = {10.1109/TSG.2023.3264525},

issn = {1949-3061},

year  = {2023},

date = {2023-04-05},

journal = {IEEE Transactions on Smart Grid},

abstract = {With the increasing numbers of smart meter installations, scalable and efficient load forecasting techniques are critically needed to ensure sustainable situation awareness within the distribution networks. Distribution networks include a large number of different loads at various aggregation levels, such as individual consumers, low-voltage feeders, and transformer stations. It is impractical to develop individual (or so-called local) forecasting models for each load separately. Additionally, such local models also (i) (largely) ignore the strong dependencies between different loads that might be present due to their spatial proximity and the characteristics of the distribution network, (ii) require historical data for each load to be able to make forecasts, and (iii) are incapable of adjusting to changes in the load behavior without retraining. To address these issues, we propose a global modeling framework for load forecasting in distribution networks that, unlike its local competitors, relies on a single global model to generate forecasts for a large number of loads. The global nature of the framework significantly reduces the computational burden typically required when training multiple local forecasting models, efficiently exploits the cross-series information shared among different loads, and facilitates forecasts even when historical data for a load is missing or the behavior of a load evolves over time. To further improve on the performance of the proposed framework, an unsupervised localization mechanism and optimal ensemble construction strategy are also proposed to localize/personalize the global forecasting model to different load characteristics. Our experimental results show that the proposed framework outperforms naive benchmarks by more than 25% (in terms of Mean Absolute Error) on a real-world dataset while exhibiting highly desirable characteristics when compared to the local models that are predominantly used in the literature. All source code and data are made publicly available to enable reproducibility: https://github.com/mihagrabner/GlobalModelingFramework},

keywords = {deep learning, global modeling, load forecasting, prediction, smart grid, time series analysis, time series forecasting},

pubstate = {published},

tppubtype = {article}

}

Close

With the increasing numbers of smart meter installations, scalable and efficient load forecasting techniques are critically needed to ensure sustainable situation awareness within the distribution networks. Distribution networks include a large amount of different loads at various aggregation levels, such as individual consumers, low-voltage feeders, and transformer stations. It is impractical to develop individual (or so-called local) forecasting models for each load separately. Additionally, such local models also (i) (largely) ignore the strong dependencies between different loads that might be present due to their spatial proximity and the characteristics of the distribution network, (ii) require historical data for each load to be able to make forecasts, and (iii) are incapable of adjusting to changes in the load behavior without retraining. To address these issues, we propose a global modeling framework for load forecasting in distribution networks that, unlike its local competitors, relies on a single global model to generate forecasts for a large number of loads. The global nature of the framework, significantly reduces the computational burden typically required when training multiple local forecasting models, efficiently exploits the cross-series information shared among different loads, and facilitates forecasts even when historical data for a load is missing or the behavior of a load evolves over time. To further improve on the performance of the proposed framework, an unsupervised localization mechanism and optimal ensemble construction strategy are also proposed to localize/personalize the global forecasting model to different load characteristics. Our experimental results show that the proposed framework outperforms naive benchmarks by more than 25% (in terms of Mean Absolute Error) on real-world dataset while exhibiting highly desirable characteristics when compared to the local models that are predominantly used in the literature. 
All source code and data are made publicly available to enable reproducibility: https://github.com/mihagrabner/GlobalModelingFramework


Meden, Blaž; Gonzalez-Hernandez, Manfred; Peer, Peter; Štruc, Vitomir

Face deidentification with controllable privacy protection Journal Article

In: Image and Vision Computing, vol. 134, no. 104678, pp. 1-19, 2023.

Tags: CNN, deep learning, deidentification, face recognition, GAN, GAN inversion, privacy, privacy protection, StyleGAN2

@article{MedenDeID2023,

title = {Face deidentification with controllable privacy protection},

author = {Blaž Meden and Manfred Gonzalez-Hernandez and Peter Peer and Vitomir Štruc},

url = {https://reader.elsevier.com/reader/sd/pii/S0262885623000525?token=BC1E21411C50118E666720B002A89C9EB3DB4CFEEB5EB18D7BD7B0613085030A96621C8364583BFE7BAE025BE3646096&originRegion=eu-west-1&originCreation=20230516115322},

doi = {10.1016/j.imavis.2023.104678},

year  = {2023},

date = {2023-04-01},

journal = {Image and Vision Computing},

volume = {134},

number = {104678},

pages = {1-19},

abstract = {Privacy protection has become a crucial concern in today’s digital age. Particularly sensitive here are facial images, which typically not only reveal a person’s identity, but also other sensitive personal information. To address this problem, various face deidentification techniques have been presented in the literature. These techniques try to remove or obscure personal information from facial images while still preserving their usefulness for further analysis. While a considerable amount of work has been proposed on face deidentification, most state-of-the-art solutions still suffer from various drawbacks: they (a) deidentify only a narrow facial area, leaving potentially important contextual information unprotected, (b) modify facial images to such a degree that image naturalness and facial diversity suffer in the deidentified images, (c) offer no flexibility in the level of privacy protection ensured, leading to suboptimal deployment in various applications, and (d) often offer an unsatisfactory tradeoff between the ability to obscure identity information, quality and naturalness of the deidentified images, and sufficient utility preservation. In this paper, we address these shortcomings with a novel controllable face deidentification technique that balances image quality, identity protection, and data utility for further analysis. The proposed approach utilizes a powerful generative model (StyleGAN2), multiple auxiliary classification models, and carefully designed constraints to guide the deidentification process. The approach is validated across four diverse datasets (CelebA-HQ, RaFD, XM2VTS, AffectNet) and in comparison to 7 state-of-the-art competitors. The results of the experiments demonstrate that the proposed solution leads to: (a) a considerable level of identity protection, (b) valuable preservation of data utility, (c) sufficient diversity among the deidentified faces, and (d) encouraging overall performance.},

keywords = {CNN, deep learning, deidentification, face recognition, GAN, GAN inversion, privacy, privacy protection, StyleGAN2},

pubstate = {published},

tppubtype = {article}

}


Ivanovska, Marija; Štruc, Vitomir

Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models Proceedings Article

In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2023.

Tags: biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face morphing attack, morphing attack, morphing attack detection

Grm, Klemen; Ozata, Berk; Struc, Vitomir; Ekenel, Hazim K.

Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition Proceedings Article

In: WACV workshops, pp. 120-129, 2023.

Tags: deep learning, face, face recognition, multi-scale matching, smart surveillance, surveillance, surveillance technology

Hrovatič, Anja; Peer, Peter; Štruc, Vitomir; Emeršič, Žiga

Efficient ear alignment using a two-stack hourglass network Journal Article

In: IET Biometrics, pp. 1-14, 2023, ISSN: 2047-4938.

Tags: biometrics, CNN, deep learning, ear, ear alignment, ear recognition

@article{UhljiIETZiga,

title = {Efficient ear alignment using a two-stack hourglass network},

author = {Anja Hrovatič and Peter Peer and Vitomir Štruc and Žiga Emeršič},

url = {https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2.12109},

doi = {10.1049/bme2.12109},

issn = {2047-4938},

year  = {2023},

date = {2023-01-01},

journal = {IET Biometrics},

pages = {1-14},

abstract = {Ear images have been shown to be a reliable modality for biometric recognition with desirable characteristics, such as high universality, distinctiveness, measurability and permanence. While a considerable amount of research has been directed towards ear recognition techniques, the problem of ear alignment is still under-explored in the open literature. Nonetheless, accurate alignment of ear images, especially in unconstrained acquisition scenarios, where the ear appearance is expected to vary widely due to pose and viewpoint variations, is critical for the performance of all downstream tasks, including ear recognition. Here, the authors address this problem and present a framework for ear alignment that relies on a two-step procedure: (i) automatic landmark detection and (ii) fiducial point alignment. For the first (landmark detection) step, the authors implement and train a Two-Stack Hourglass model (2-SHGNet) capable of accurately predicting 55 landmarks on diverse ear images captured in uncontrolled conditions. For the second (alignment) step, the authors use the Random Sample Consensus (RANSAC) algorithm to align the estimated landmark/fiducial points with a pre-defined ear shape (i.e. a collection of average ear landmark positions). The authors evaluate the proposed framework in comprehensive experiments on the AWEx and ITWE datasets and show that the 2-SHGNet model leads to more accurate landmark predictions than competing state-of-the-art models from the literature. Furthermore, the authors also demonstrate that the alignment step significantly improves recognition accuracy with ear images from unconstrained environments compared to unaligned imagery.},

keywords = {biometrics, CNN, deep learning, ear, ear alignment, ear recognition},

pubstate = {published},

tppubtype = {article}

}


Gan, Chenquan; Yang, Yucheng; Zhu, Qingyi; Jain, Deepak Kumar; Struc, Vitomir

DHF-Net: A hierarchical feature interactive fusion network for dialogue emotion recognition Journal Article

In: Expert Systems with Applications, vol. 210, 2022.

Tags: attention, CNN, deep learning, dialogue, emotion recognition, fusion, fusion network, nlp, semantics, text, text processing

Tomašević, Darian; Peer, Peter; Štruc, Vitomir

BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images Proceedings Article

In: IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022), pp. 1-10, 2022.

Tags: biometrics, CNN, data synthesis, deep learning, ocular, segmentation, StyleGAN, synthetic data

Huber, Marco; Boutros, Fadi; Luu, Anh Thi; Raja, Kiran; Ramachandra, Raghavendra; Damer, Naser; Neto, Pedro C.; Goncalves, Tiago; Sequeira, Ana F.; Cardoso, Jaime S.; Tremoco, João; Lourenco, Miguel; Serra, Sergio; Cermeno, Eduardo; Ivanovska, Marija; Batagelj, Borut; Kronovšek, Andrej; Peer, Peter; Štruc, Vitomir

SYN-MAD 2022: Competition on Face Morphing Attack Detection based on Privacy-aware Synthetic Training Data Proceedings Article

In: IEEE International Joint Conference on Biometrics (IJCB), pp. 1-10, 2022, ISBN: 978-1-6654-6394-2.

Tags: data synthesis, deep learning, face, face PAD, pad, synthetic data

Šircelj, Jaka; Peer, Peter; Solina, Franc; Štruc, Vitomir

Hierarchical Superquadric Decomposition with Implicit Space Separation Proceedings Article

In: Proceedings of ERK 2022, pp. 1-4, 2022.

Tags: CNN, deep learning, depth estimation, iterative procedure, model fitting, recursive model, superquadric, superquadrics, volumetric primitive

Dvoršak, Grega; Dwivedi, Ankita; Štruc, Vitomir; Peer, Peter; Emeršič, Žiga

Kinship Verification from Ear Images: An Explorative Study with Deep Learning Models Proceedings Article

In: International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, 2022.

Tags: biometrics, CNN, deep learning, ear, ear biometrics, kinear, kinship, kinship recognition, transformer

Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter

Body Segmentation Using Multi-task Learning Proceedings Article

In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4.

Tags: body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on

@inproceedings{JulijanJugBody,

title = {Body Segmentation Using Multi-task Learning},

author = {Julijan Jug and Ajda Lampe and Vitomir Štruc and Peter Peer},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/ICAIIC_paper.pdf},

doi = {10.1109/ICAIIC54071.2022.9722662},

isbn = {978-1-6654-5818-4},

year  = {2022},

date = {2022-01-20},

urldate = {2022-01-20},

booktitle = {International Conference on Artificial Intelligence in Information and Communication (ICAIIC)},

publisher = {IEEE},

abstract = {Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.},

keywords = {body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on},

pubstate = {published},

tppubtype = {inproceedings}

}


Grm, Klemen; Štruc, Vitomir

Frequency Band Encoding for Face Super-Resolution Proceedings Article

In: Proceedings of ERK 2021, pp. 1-4, 2021.

Tags: CNN, deep learning, face, face hallucination, frequency encoding, super-resolution

Batagelj, Borut; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon

How to correctly detect face-masks for COVID-19 from visual information? Journal Article

In: Applied Sciences, vol. 11, no. 5, pp. 1-24, 2021, ISSN: 2076-3417.

Tags: computer vision, COVID-19, deep learning, detection, face, mask detection, recognition

@article{Batagelj2021,

title = {How to correctly detect face-masks for COVID-19 from visual information?},

author = {Borut Batagelj and Peter Peer and Vitomir Štruc and Simon Dobrišek},

url = {https://www.mdpi.com/2076-3417/11/5/2070/pdf},

doi = {10.3390/app11052070},

issn = {2076-3417},

year  = {2021},

date = {2021-03-01},

urldate = {2021-03-01},

journal = {Applied Sciences},

volume = {11},

number = {5},

pages = {1-24},

abstract = {The new Coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of new coronavirus cases had already exceeded 60 million and the number of deaths 1,410,378 according to information from the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal of the techniques is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform with masked-face images? (ii) Is it possible to detect a proper (regulation-compliant) placement of facial masks? and (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions we conduct a comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement. 
Finally, we design a complete pipeline for recognizing whether face-masks are worn correctly or not and compare the performance of the pipeline with standard face-mask detection models from the literature. To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotation dataset, called Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.},

keywords = {computer vision, COVID-19, deep learning, detection, face, mask detection, recognition},

pubstate = {published},

tppubtype = {article}

}


Stepec, Dejan; Emersic, Ziga; Peer, Peter; Struc, Vitomir

Constellation-Based Deep Ear Recognition Book Section

In: Jiang, R.; Li, CT.; Crookes, D.; Meng, W.; Rosenberger, C. (Ed.): Deep Biometrics: Unsupervised and Semi-Supervised Learning, Springer, 2020, ISBN: 978-3-030-32582-4.

Tags: biometrics, CNN, deep learning, ear recognition, neural networks

Grm, Klemen; Scheirer, Walter J.; Štruc, Vitomir

Face hallucination using cascaded super-resolution and identity priors Journal Article

In: IEEE Transactions on Image Processing, 2020.

Tags: biometrics, CNN, computer vision, deep learning, face, face hallucination, super-resolution

@article{TIPKlemen_2020,

title = {Face hallucination using cascaded super-resolution and identity priors},

author = {Klemen Grm and Walter J. Scheirer and Vitomir Štruc},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8866753

https://lmi.fe.uni-lj.si/wp-content/uploads/2023/02/IEEET_face_hallucination_compressed.pdf},

doi = {10.1109/TIP.2019.2945835},

year  = {2020},

date = {2020-01-01},

urldate = {2020-01-01},

journal = {IEEE Transactions on Image Processing},

abstract = {In this paper, we address the problem of hallucinating high-resolution facial images from low-resolution inputs at high magnification factors. We approach this task with convolutional neural networks (CNNs) and propose a novel (deep) face hallucination model that incorporates identity priors into the learning procedure. The model consists of two main parts: i) a cascaded super-resolution network that upscales the low-resolution facial images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Different from most competing super-resolution techniques that rely on a single model for upscaling (even with large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images using steps of 2×. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and incorporate identity constraints at multiple scales. The proposed C-SRIP model (Cascaded Super Resolution with Identity Priors) is able to upscale (tiny) low-resolution images captured in unconstrained conditions and produce visually convincing results for diverse low-resolution inputs. We rigorously evaluate the proposed model on the Labeled Faces in the Wild (LFW), Helen and CelebA datasets and report superior performance compared to the existing state-of-the-art.},

keywords = {biometrics, CNN, computer vision, deep learning, face, face hallucination, super-resolution},

pubstate = {published},

tppubtype = {article}

}


Rot, Peter; Vitek, Matej; Grm, Klemen; Emeršič, Žiga; Peer, Peter; Štruc, Vitomir

Deep Sclera Segmentation and Recognition Book Section

In: Uhl, Andreas; Busch, Christoph; Marcel, Sebastien; Veldhuis, Rainer (Ed.): Handbook of Vascular Biometrics, pp. 395-432, Springer, 2019, ISBN: 978-3-030-27731-4.

Tags: biometrics, CNN, deep learning, ocular, sclera, segmentation, vasculature

@incollection{ScleraNetChapter,

title = {Deep Sclera Segmentation and Recognition},

author = {Peter Rot and Matej Vitek and Klemen Grm and Žiga Emeršič and Peter Peer and Vitomir Štruc},

editor = {Andreas Uhl and Christoph Busch and Sebastien Marcel and Rainer Veldhuis},

url = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-27731-4_13.pdf},

doi = {10.1007/978-3-030-27731-4_13},

isbn = {978-3-030-27731-4},

year  = {2019},

date = {2019-11-14},

booktitle = {Handbook of Vascular Biometrics},

pages = {395-432},

publisher = {Springer},

chapter = {13},

series = {Advances in Computer Vision and Pattern Recognition},

abstract = {In this chapter, we address the problem of biometric identity recognition from the vasculature of the human sclera. Specifically, we focus on the challenging task of multi-view sclera recognition, where the visible part of the sclera vasculature changes from image to image due to varying gaze (or view) directions. We propose a complete solution for this task built around Convolutional Neural Networks (CNNs) and make several contributions that result in state-of-the-art recognition performance, i.e.: (i) we develop a cascaded CNN assembly that is able to robustly segment the sclera vasculature from the input images regardless of gaze direction, and (ii) we present ScleraNET, a CNN model trained in a multi-task manner (combining losses pertaining to identity and view-direction recognition) that allows for the extraction of discriminative vasculature descriptors that can be used for identity inference. To evaluate the proposed contributions, we also introduce a new dataset of ocular images, called the Sclera Blood Vessels, Periocular and Iris (SBVPI) dataset, which represents one of the few publicly available datasets suitable for research in multi-view sclera segmentation and recognition. The dataset comes with a rich set of annotations, such as a per-pixel markup of various eye parts (including the sclera vasculature), identity, gaze-direction and gender labels. We conduct rigorous experiments on SBVPI with competing techniques from the literature and show that the combination of the proposed segmentation and descriptor-computation models results in highly competitive recognition performance.},

keywords = {biometrics, CNN, deep learning, ocular, sclera, segmentation, vasculature},

pubstate = {published},

tppubtype = {incollection}

}


Stržinar, Žiga; Grm, Klemen; Štruc, Vitomir

Učenje podobnosti v globokih nevronskih omrežjih za razpoznavanje obrazov Proceedings Article

In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), Portorož, Slovenia, 2016.

Tags: biometrics, CNN, deep learning, difference space, face verification, LFW, performance evaluation

Grm, Klemen; Dobrišek, Simon; Štruc, Vitomir

Deep pair-wise similarity learning for face recognition Proceedings Article

In: 4th International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, IEEE, 2016.

Tags: CNN, deep learning, face recognition, IJB-A, IWBF, performance evaluation, similarity learning

@inproceedings{grm2016deep,

title = {Deep pair-wise similarity learning for face recognition},

author = {Klemen Grm and Simon Dobrišek and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/en/deeppair-wisesimilaritylearningforfacerecognition/},

year  = {2016},

date = {2016-01-01},

urldate = {2016-01-01},

booktitle = {4th International Workshop on Biometrics and Forensics (IWBF)},

pages = {1--6},

organization = {IEEE},

abstract = {Recent advances in deep learning made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection or tracking. For recognition tasks the most common approach when using deep models is to learn object representations (or features) directly from raw image-input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model a difficult task. In this paper, we present a novel, deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its usage on the task of face recognition. Unlike existing (deep) learning strategies, DPSL operates on image-pairs and tries to learn pair-wise image similarities that can be used for recognition purposes directly instead of feature representations that need to be fed to appropriate classification techniques, as with traditional deep learning pipelines. Since our DPSL strategy assumes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET and IJB-A datasets.},

keywords = {CNN, deep learning, face recognition, IJB-A, IWBF, performance evaluation, similarity learning},

pubstate = {published},

tppubtype = {inproceedings}

}
