Publications – Laboratory for Machine Intelligence

Soltandoost, Elahe; Plesh, Richard; Schuckers, Stephanie; Peer, Peter; Štruc, Vitomir

Extracting Local Information from Global Representations for Interpretable Deepfake Detection Proceedings Article

In: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-11, Tucson, USA, 2025.

Tags: CNN, deepfake DAD, deepfakes, faceforensics++, media forensics, xai

@inproceedings{Elahe_WACV2025,

title = {Extracting Local Information from Global Representations for Interpretable Deepfake Detection},

author = {Elahe Soltandoost and Richard Plesh and Stephanie Schuckers and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2025/01/ElahePaperF.pdf},

year  = {2025},

date = {2025-03-01},

booktitle = {Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025},

pages = {1-11},

address = {Tucson, USA},

abstract = {The detection of deepfakes has become increasingly challenging due to the sophistication of manipulation techniques that produce highly convincing fake videos. Traditional detection methods often lack transparency and provide limited insight into their decision-making processes. To address these challenges, we propose in this paper a Locally-Explainable Self-Blended (LESB) DeepFake detector that, in addition to the final fake-vs-real classification decision, also provides information on which local facial region (i.e., eyes, mouth or nose) contributed the most to the decision process. At the heart of the detector is a novel Local Feature Discovery (LFD) technique that can be applied to the embedding space of pretrained DeepFake detectors and allows identifying embedding space directions that encode variations in the appearance of local facial features. We demonstrate the merits of the proposed LFD technique and LESB detector in comprehensive experiments on four popular datasets, i.e., Celeb-DF, DeepFake Detection Challenge, Face Forensics in the Wild and FaceForensics++, and show that the proposed detector is not only competitive in comparison to strong baselines, but also exhibits enhanced transparency in the decision-making process by providing insights on the contribution of local face parts in the final detection decision.},

keywords = {CNN, deepfake DAD, deepfakes, faceforensics++, media forensics, xai},

pubstate = {published},

tppubtype = {inproceedings}

}

Grm, Klemen; Ozata, Berk Kemal; Kantarci, Alperen; Štruc, Vitomir; Ekenel, Hazim Kemal

Degrade or super-resolve to recognize? Bridging the Domain Gap for Cross-Resolution Face Recognition Journal Article

In: IEEE Access, pp. 1-16, 2025, ISSN: 2169-3536.

Tags: CNN, face recognition, low quality, super-resolution

@article{GrmAccess2025,

title = {Degrade or super-resolve to recognize? Bridging the Domain Gap for Cross-Resolution Face Recognition},

author = {Klemen Grm and Berk Kemal Ozata and Alperen Kantarci and Vitomir Štruc and Hazim Kemal Ekenel},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10833634

https://lmi.fe.uni-lj.si/wp-content/uploads/2025/01/Degrade_or_super-resolve_to_recognize_Bridging_the_Domain_Gap_for_Cross-Resolution_Face_Recognition_compressed-1.pdf},

doi = {10.1109/ACCESS.2025.3527236},

issn = {2169-3536},

year  = {2025},

date = {2025-01-08},

journal = {IEEE Access},

pages = {1-16},

abstract = {In this work, we address the problem of cross-resolution face recognition, where a low-resolution probe face is compared against high-resolution gallery faces. To address this challenging problem, we investigate two approaches for bridging the quality gap between low-quality probe faces and high-quality gallery faces. The first approach focuses on degrading the quality of high-resolution gallery images to bring them closer to the quality of the probe images. The second approach involves enhancing the resolution of the probe images using face hallucination. Our experiments on the SCFace and DroneSURF datasets reveal that the success of face hallucination is highly dependent on the quality of the original images, since poor image quality can severely limit the effectiveness of the hallucination technique. Therefore, the selection of the appropriate face recognition method should consider the quality of the images. Additionally, our experiments also suggest that combining gallery degradation and face hallucination in a hybrid recognition scheme provides the best overall results for cross-resolution face recognition with relatively high-quality probe images, while the degradation process on its own is the more suitable option for low-quality probe images. Our results show that the combination of standard computer vision approaches such as degradation, super-resolution, feature fusion, and score fusion can be used to substantially improve performance on the task of low resolution face recognition using off-the-shelf face recognition models without re-training on the target domain.},

keywords = {CNN, face recognition, low quality, super-resolution},

pubstate = {published},

tppubtype = {article}

}

Vitek, Matej; Štruc, Vitomir; Peer, Peter

GazeNet: A lightweight multitask sclera feature extractor Journal Article

In: Alexandria Engineering Journal, vol. 112, pp. 661-671, 2025.

Tags: biometrics, CNN, deep learning, lightweight models, sclera

Dragar, Luka; Rot, Peter; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

W-TDL: Window-Based Temporal Deepfake Localization Proceedings Article

In: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), Proceedings of the 32nd ACM International Conference on Multimedia (MM’24), ACM, 2024.

Tags: CNN, deepfake DAD, deepfakes, deeplearning, detection, localization

Boutros, Fadi; Štruc, Vitomir; Damer, Naser

AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition Proceedings Article

In: Proceedings of the European Conference on Computer Vision (ECCV 2024), pp. 1-20, 2024.

Tags: adaptive distillation, biometrics, CNN, deep learning, face, face recognition, knowledge distillation

Ocvirk, Krištof; Brodarič, Marko; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

Primerjava metod za zaznavanje napadov ponovnega zajema [Comparison of Methods for Detecting Recapture Attacks] Proceedings Article

In: Proceedings of ERK, pp. 1-4, Portorož, Slovenia, 2024.

Tags: attacks, biometrics, CNN, deep learning, identity cards, pad

Manojlovska, Anastasija; Štruc, Vitomir; Grm, Klemen

Interpretacija mehanizmov obraznih biometričnih modelov s kontrastnim multimodalnim učenjem [Interpreting the Mechanisms of Facial Biometric Models with Contrastive Multimodal Learning] Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Tags: CNN, deep learning, face recognition, xai

Brodarič, Marko; Peer, Peter; Štruc, Vitomir

Towards Improving Backbones for Deepfake Detection Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, 2024.

Tags: CNN, deep learning, deepfake detection, deepfakes, media forensics, transformer

Sikošek, Lovro; Brodarič, Marko; Peer, Peter; Štruc, Vitomir; Batagelj, Borut

Detection of Presentation Attacks with 3D Masks Using Deep Learning Proceedings Article

In: Proceedings of ERK 2024, pp. 1-4, Portorož, Slovenia, 2024.

Tags: biometrics, CNN, deep learning, face PAD, face recognition, pad

Plesh, Richard; Križaj, Janez; Bahmani, Keivan; Banavar, Mahesh; Štruc, Vitomir; Schuckers, Stephanie

Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models Proceedings Article

In: International Joint Conference on Biometrics (IJCB 2024), pp. 1-10, 2024.

Tags: biometrics, CNN, deep learning, face recognition, feature space understanding, xai

@inproceedings{Krizaj,

title = {Discovering Interpretable Feature Directions in the Embedding Space of Face Recognition Models},

author = {Richard Plesh and Janez Križaj and Keivan Bahmani and Mahesh Banavar and Vitomir Štruc and Stephanie Schuckers},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/08/107-supp.pdf},

year  = {2024},

date = {2024-09-15},

booktitle = {International Joint Conference on Biometrics (IJCB 2024)},

pages = {1-10},

abstract = {Modern face recognition (FR) models, particularly their convolutional neural network based implementations, often raise concerns regarding privacy and ethics due to their “black-box” nature. To enhance the explainability of FR models and the interpretability of their embedding space, we introduce in this paper three novel techniques for discovering semantically meaningful feature directions (or axes). The first technique uses a dedicated facial-region blending procedure together with principal component analysis to discover embedding space directions that correspond to spatially isolated semantic face areas, providing a new perspective on facial feature interpretation. The other two proposed techniques exploit attribute labels to discern feature directions that correspond to intra-identity variations, such as pose, illumination angle, and expression, but do so either through a cluster analysis or a dedicated regression procedure. To validate the capabilities of the developed techniques, we utilize a powerful template decoder that inverts the image embedding back into the pixel space. Using the decoder, we visualize linear movements along the discovered directions, enabling a clearer understanding of the internal representations within face recognition models. The source code will be made publicly available.},

keywords = {biometrics, CNN, deep learning, face recognition, feature space understanding, xai},

pubstate = {published},

tppubtype = {inproceedings}

}

Tomašević, Darian; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir

Generating bimodal privacy-preserving data for face recognition Journal Article

In: Engineering Applications of Artificial Intelligence, vol. 133, iss. E, pp. 1-25, 2024.

Tags: CNN, face, face generation, face images, face recognition, generative AI, StyleGAN2, synthetic data

@article{Darian2024,

title = {Generating bimodal privacy-preserving data for face recognition},

author = {Darian Tomašević and Fadi Boutros and Naser Damer and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/05/PapersDarian.pdf},

doi = {10.1016/j.engappai.2024.108495},

year  = {2024},

date = {2024-05-01},

journal = {Engineering Applications of Artificial Intelligence},

volume = {133},

issue = {E},

pages = {1-25},

abstract = {The performance of state-of-the-art face recognition systems depends crucially on the availability of large-scale training datasets. However, increasing privacy concerns nowadays accompany the collection and distribution of biometric data, which has already resulted in the retraction of valuable face recognition datasets. The use of synthetic data represents a potential solution; however, the generation of privacy-preserving facial images useful for training recognition models is still an open problem. Generative methods also remain bound to the visible spectrum, despite the benefits that multispectral data can provide. To address these issues, we present a novel identity-conditioned generative framework capable of producing large-scale recognition datasets of visible and near-infrared privacy-preserving face images. The framework relies on a novel identity-conditioned dual-branch style-based generative adversarial network to enable the synthesis of aligned high-quality samples of identities determined by features of a pretrained recognition model. In addition, the framework incorporates a novel filter to prevent samples of privacy-breaching identities from reaching the generated datasets and improve both identity separability and intra-identity diversity. Extensive experiments on six publicly available datasets reveal that our framework achieves competitive synthesis capabilities while preserving the privacy of real-world subjects. The synthesized datasets also facilitate training more powerful recognition models than datasets generated by competing methods or even small-scale real-world datasets. Employing both visible and near-infrared data for training also results in higher recognition accuracy on real-world visible spectrum benchmarks. Therefore, training with multispectral data could potentially improve existing recognition systems that utilize only the visible spectrum, without the need for additional sensors.},

keywords = {CNN, face, face generation, face images, face recognition, generative AI, StyleGAN2, synthetic data},

pubstate = {published},

tppubtype = {article}

}

Tomašević, Darian; Peer, Peter; Štruc, Vitomir

BiFaceGAN: Bimodal Face Image Synthesis Book Section

In: Bourlai, T. (Ed.): Face Recognition Across the Imaging Spectrum, pp. 273–311, Springer, Singapore, 2024, ISBN: 978-981-97-2058-3.

Tags: CNN, deep learning, face synthesis, generative AI, stylegan

@incollection{Darian2024Book,

title = {BiFaceGAN: Bimodal Face Image Synthesis},

author = {Darian Tomašević and Peter Peer and Vitomir Štruc},

editor = {T. Bourlai},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/11/BiFaceGAN.pdf},

doi = {10.1007/978-981-97-2059-0_11},

isbn = {978-981-97-2058-3},

year  = {2024},

date = {2024-05-01},

urldate = {2024-05-01},

booktitle = {Face Recognition Across the Imaging Spectrum},

pages = {273–311},

publisher = {Springer, Singapore},

abstract = {Modern face recognition and segmentation systems, such as all deep learning approaches, rely on large-scale annotated datasets to achieve competitive performance. However, gathering biometric data often raises privacy concerns and presents a labor-intensive and time-consuming task. Researchers are currently also exploring the use of multispectral data to improve existing solutions, limited to the visible spectrum. Unfortunately, the collection of suitable data is even more difficult, especially if aligned images are required. To address the outlined issues, we present a novel synthesis framework, named BiFaceGAN, capable of producing privacy-preserving large-scale synthetic datasets of photorealistic face images, in the visible and the near-infrared spectrum, along with corresponding ground-truth pixel-level annotations. The proposed framework leverages an innovative Dual-Branch Style-based generative adversarial network (DB-StyleGAN2) to generate per-pixel-aligned bimodal images, followed by an ArcFace Privacy Filter (APF) that ensures the removal of privacy-breaching images. Furthermore, we also implement a Semantic Mask Generator (SMG) that produces reference ground-truth segmentation masks of the synthetic data, based on the latent representations inside the synthesis model and only a handful of manually labeled examples. We evaluate the quality of generated images and annotations through a series of experiments and analyze the benefits of generating bimodal data with a single network. We also show that privacy-preserving data filtering does not notably degrade the image quality of produced datasets. Finally, we demonstrate that the generated data can be employed to train highly successful deep segmentation models, which can generalize well to other real-world datasets.},

keywords = {CNN, deep learning, face synthesis, generative AI, stylegan},

pubstate = {published},

tppubtype = {incollection}

}

Babnik, Žiga; Boutros, Fadi; Damer, Naser; Peer, Peter; Štruc, Vitomir

AI-KD: Towards Alignment Invariant Face Image Quality Assessment Using Knowledge Distillation Proceedings Article

In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2024.

Tags: ai, CNN, deep learning, face, face image quality assessment, face image quality estimation, face images, face recognition, face verification

Babnik, Žiga; Peer, Peter; Štruc, Vitomir

eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models Journal Article

In: IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), pp. 1-16, 2024, ISSN: 2637-6407.

Tags: biometrics, CNN, deep learning, DifFIQA, diffusion, face, face image quality assessment, face recognition, FIQA

@article{BabnikTBIOM2024,

title = {eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models},

author = {Žiga Babnik and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/03/TBIOM___DifFIQAv2.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10468647&tag=1},

doi = {10.1109/TBIOM.2024.3376236},

issn = {2637-6407},

year  = {2024},

date = {2024-03-07},

urldate = {2024-03-07},

journal = {IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM)},

pages = {1-16},

abstract = {State-of-the-art Face Recognition (FR) models perform well in constrained scenarios, but frequently fail in difficult real-world scenarios, when no quality guarantees can be made for face samples. For this reason, Face Image Quality Assessment (FIQA) techniques are often used by FR systems, to provide quality estimates of captured face samples. The quality estimate provided by FIQA techniques can be used by the FR system to reject low-quality samples, in turn improving the performance of the system and reducing the number of critical false-match errors. However, despite steady improvements, ensuring a good trade-off between the performance and computational complexity of FIQA methods across diverse face samples remains challenging. In this paper, we present DifFIQA, a powerful unsupervised approach for quality assessment based on the popular denoising diffusion probabilistic models (DDPMs) and the extended (eDifFIQA) approach. The main idea of the base DifFIQA approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because of the iterative nature of DDPMs, the base DifFIQA approach is extremely computationally expensive. Using eDifFIQA we are able to improve on both the performance and computational complexity of the base DifFIQA approach, by employing label optimized knowledge distillation. In this process, quality information inferred by DifFIQA is distilled into a quality-regression model. During the distillation process, we use an additional source of quality information hidden in the relative position of the embedding to further improve the predictive capabilities of the underlying regression model.
By choosing different feature extraction backbone models as the basis for the quality-regression eDifFIQA model, we are able to control the trade-off between the predictive capabilities and computational complexity of the final model. We evaluate three eDifFIQA variants of varying sizes in comprehensive experiments on 7 diverse datasets containing static images and a separate video-based dataset, with 4 target CNN-based FR models and 2 target Transformer-based FR models and against 10 state-of-the-art FIQA techniques, as well as against the initial DifFIQA baseline and a simple regression-based predictor DifFIQA(R), distilled from DifFIQA without any additional optimization. The results show that the proposed label optimized knowledge distillation improves on the performance and computational complexity of the base DifFIQA approach, and is able to achieve state-of-the-art performance in several distinct experimental scenarios. Furthermore, we also show that the distilled model can be used directly for face recognition and leads to highly competitive results.},

keywords = {biometrics, CNN, deep learning, DifFIQA, diffusion, face, face image quality assessment, face recognition, FIQA},

pubstate = {published},

tppubtype = {article}

}

Ivanovska, Marija; Štruc, Vitomir

Y-GAN: Learning Dual Data Representations for Anomaly Detection in Images Journal Article

In: Expert Systems with Applications (ESWA), vol. 248, no. 123410, pp. 1-7, 2024.

Tags: anomaly detection, CNN, deep learning, one-class learning, y-gan

@article{ESWA2024,

title = {Y-GAN: Learning Dual Data Representations for Anomaly Detection in Images},

author = {Marija Ivanovska and Vitomir Štruc},

url = {https://www.sciencedirect.com/science/article/pii/S0957417424002756

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/02/YGAN_Marija.pdf},

doi = {10.1016/j.eswa.2024.123410},

year  = {2024},

date = {2024-03-01},

urldate = {2024-03-01},

journal = {Expert Systems with Applications (ESWA)},

volume = {248},

number = {123410},

pages = {1-7},

abstract = {We propose a novel reconstruction-based model for anomaly detection in image data, called 'Y-GAN'. The model consists of a Y-shaped auto-encoder and represents images in two separate latent spaces. The first captures meaningful image semantics, which are key for representing (normal) training data, whereas the second encodes low-level residual image characteristics. To ensure the dual representations encode mutually exclusive information, a disentanglement procedure is designed around a latent (proxy) classifier. Additionally, a novel representation-consistency mechanism is proposed to prevent information leakage between the latent spaces. The model is trained in a one-class learning setting using only normal training data. Due to the separation of semantically-relevant and residual information, Y-GAN is able to derive informative data representations that allow for efficacious anomaly detection across a diverse set of anomaly detection tasks. The model is evaluated in comprehensive experiments with several recent anomaly detection models using four popular image datasets, i.e., MNIST, FMNIST, CIFAR10, and PlantVillage. Experimental results show that Y-GAN outperforms all tested models by a considerable margin and yields state-of-the-art results. The source code for the model is made publicly available at https://github.com/MIvanovska/Y-GAN. },

keywords = {anomaly detection, CNN, deep learning, one-class learning, y-gan},

pubstate = {published},

tppubtype = {article}

}

Križaj, Janez; Plesh, Richard O.; Banavar, Mahesh; Schuckers, Stephanie; Štruc, Vitomir

Deep Face Decoder: Towards understanding the embedding space of convolutional networks through visual reconstruction of deep face templates Journal Article

In: Engineering Applications of Artificial Intelligence, vol. 132, iss. 107941, pp. 1-20, 2024.

Tags: CNN, embedding space, face, face images, face recognition, face synthesis, template reconstruction, xai

@article{KrizajEAAI2024,

title = {Deep Face Decoder: Towards understanding the embedding space of convolutional networks through visual reconstruction of deep face templates},

author = {Janez Križaj and Richard O. Plesh and Mahesh Banavar and Stephanie Schuckers and Vitomir Štruc},

url = {https://www.sciencedirect.com/science/article/abs/pii/S095219762400099X

https://lmi.fe.uni-lj.si/wp-content/uploads/2025/02/Deep_Face_Decoder__Elsevier_template_.pdf},

doi = {10.1016/j.engappai.2024.107941},

year  = {2024},

date = {2024-01-30},

urldate = {2024-01-30},

journal = {Engineering Applications of Artificial Intelligence},

volume = {132},

issue = {107941},

pages = {1-20},

abstract = {Advances in deep learning and convolutional neural networks (ConvNets) have driven remarkable face recognition (FR) progress recently. However, the black-box nature of modern ConvNet-based face recognition models makes it challenging to interpret their decision-making process, to understand the reasoning behind specific success and failure cases, or to predict their responses to unseen data characteristics. It is, therefore, critical to design mechanisms that explain the inner workings of contemporary FR models and offer insight into their behavior. To address this challenge, we present in this paper a novel \textit{template-inversion approach} capable of reconstructing high-fidelity face images from the embeddings (templates, feature-space representations) produced by modern FR techniques. Our approach is based on a novel Deep Face Decoder (DFD) trained in a regression setting to visualize the information encoded in the embedding space with the goal of fostering explainability. We utilize the developed DFD model in comprehensive experiments on multiple unconstrained face datasets, namely Visual Geometry Group Face dataset 2 (VGGFace2), Labeled Faces in the Wild (LFW), and Celebrity Faces Attributes Dataset High Quality (CelebA-HQ). Our analysis focuses on the embedding spaces of two distinct face recognition models with backbones based on the Visual Geometry Group 16-layer model (VGG-16) and the 50-layer Residual Network (ResNet-50). The results reveal how information is encoded in the two considered models and how perturbations in image appearance due to rotations, translations, scaling, occlusion, or adversarial attacks, are propagated into the embedding space. Our study offers researchers a deeper comprehension of the underlying mechanisms of ConvNet-based FR models, ultimately promoting advancements in model design and explainability.},

keywords = {CNN, embedding space, face, face images, face recognition, face synthesis, template reconstruction, xai},

pubstate = {published},

tppubtype = {article}

}

Pernuš, Martin; Štruc, Vitomir; Dobrišek, Simon

MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization Journal Article

In: IEEE Transactions on Image Processing, 2023, ISSN: 1941-0042.

Tags: CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN

@article{MaskFaceGAN,

title = {MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization},

author = {Martin Pernuš and Vitomir Štruc and Simon Dobrišek},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10299582

https://lmi.fe.uni-lj.si/wp-content/uploads/2023/02/MaskFaceGAN_compressed.pdf

https://arxiv.org/pdf/2103.11135.pdf},

doi = {10.1109/TIP.2023.3326675},

issn = {1941-0042},

year  = {2023},

date = {2023-10-27},

urldate = {2023-01-02},

journal = {IEEE Transactions on Image Processing},

abstract = {Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control over the editing procedure and alter multiple (entangled) attributes simultaneously, when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel editing approach, called MaskFaceGAN, that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially-selective treatment of local image regions. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the FRGC, SiblingsDB-HQf, and XM2VTS datasets and in comparison with several state-of-the-art techniques from the literature. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolutions (1024×1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is publicly available from: https://github.com/MartinPernus/MaskFaceGAN.},

keywords = {CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN},

pubstate = {published},

tppubtype = {article}

}

Larue, Nicolas; Vu, Ngoc-Son; Štruc, Vitomir; Peer, Peter; Christophides, Vassilis

SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes Proceedings Article

In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 21011-21021, IEEE, 2023.

Abstract | Links | BibTeX | Tags: CNN, deepfake detection, deepfakes, face, media forensics, one-class learning, representation learning

@inproceedings{NicolasCCV,

title = {SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes},

author = {Nicolas Larue and Ngoc-Son Vu and Vitomir Štruc and Peter Peer and Vassilis Christophides},

url = {https://openaccess.thecvf.com/content/ICCV2023/papers/Larue_SeeABLE_Soft_Discrepancies_and_Bounded_Contrastive_Learning_for_Exposing_Deepfakes_ICCV_2023_paper.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/01/SeeABLE_compressed.pdf

https://lmi.fe.uni-lj.si/wp-content/uploads/2024/01/SeeABLE_supplementary_compressed.pdf},

year  = {2023},

date = {2023-10-01},

urldate = {2023-10-01},

booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},

pages = {21011 - 21021},

organization = {IEEE},

abstract = {Modern deepfake detectors have achieved encouraging results when training and test images are drawn from the same data collection. However, when these detectors are applied to images produced with unknown deepfake-generation techniques, considerable performance degradations are commonly observed. In this paper, we propose a novel deepfake detector, called SeeABLE, that formalizes the detection problem as a (one-class) out-of-distribution detection task and generalizes better to unseen deepfakes. Specifically, SeeABLE first generates local image perturbations (referred to as soft discrepancies) and then pushes the perturbed faces towards predefined prototypes using a novel regression-based bounded contrastive loss. To strengthen the generalization performance of SeeABLE to unknown deepfake types, we generate a rich set of soft discrepancies and train the detector: (i) to localize which part of the face was modified, and (ii) to identify the alteration type. To demonstrate the capabilities of SeeABLE, we perform rigorous experiments on several widely-used deepfake datasets and show that our model convincingly outperforms competing state-of-the-art detectors, while exhibiting highly encouraging generalization capabilities. The source code for SeeABLE is available from: https://github.com/anonymous-author-sub/seeable.},

keywords = {CNN, deepfake detection, deepfakes, face, media forensics, one-class learning, representation learning},

pubstate = {published},

tppubtype = {inproceedings}

}


Vitek, Matej; Bizjak, Matic; Peer, Peter; Štruc, Vitomir

IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics Journal Article

In: Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 8, pp. 1-21, 2023.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, model compression, pruning, sclera, sclera segmentation

@article{VitekSaud2023,

title = {IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics},

author = {Matej Vitek and Matic Bizjak and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/07/PublishedVersion.pdf},

doi = {10.1016/j.jksuci.2023.101630},

year  = {2023},

date = {2023-07-10},

journal = {Journal of King Saud University - Computer and Information Sciences},

volume = {35},

number = {8},

pages = {1-21},

abstract = {The sclera has recently been gaining attention as a biometric modality due to its various desirable characteristics. A key step in any type of ocular biometric recognition, including sclera recognition, is the segmentation of the relevant part(s) of the eye. However, the high computational complexity of the (deep) segmentation models used in this task can limit their applicability on resource-constrained devices such as smartphones or head-mounted displays. As these devices are a common desired target for such biometric systems, lightweight solutions for ocular segmentation are critically needed. To address this issue, this paper introduces IPAD (Iterative Pruning with Activation Deviation), a novel method for developing lightweight convolutional networks, that is based on model pruning. IPAD uses a novel filter-activation-based criterion (ADC) to determine low-importance filters and employs an iterative model pruning procedure to derive the final lightweight model. To evaluate the proposed pruning procedure, we conduct extensive experiments with two diverse segmentation models, over four publicly available datasets (SBVPI, SLD, SMD and MOBIUS), in four distinct problem configurations and in comparison to state-of-the-art methods from the literature. The results of the experiments show that the proposed filter-importance criterion outperforms the standard L1 and L2 approaches from the literature. 
Furthermore, the results also suggest that: 1) the pruned models are able to retain (or even improve on) the performance of the unpruned originals, as long as they are not over-pruned, with RITnet and U-Net at 50% of their original FLOPs reaching up to 4% and 7% higher IoU values than their unpruned versions, respectively, 2) smaller models require more careful pruning, as the pruning process can hurt the model’s generalization capabilities, and 3) the novel criterion most convincingly outperforms the classic approaches when sufficient training data is available, implying that the abundance of data leads to more robust activation-based importance computation.},

keywords = {biometrics, CNN, deep learning, model compression, pruning, sclera, sclera segmentation},

pubstate = {published},

tppubtype = {article}

}


Pernuš, Martin; Bhatnagar, Mansi; Samad, Badr; Singh, Divyanshu; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon

ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms Journal Article

In: IEEE Access, pp. 1-22, 2023, ISSN: 2169-3536.

Abstract | Links | BibTeX | Tags: artificial intelligence, CNN, deep learning, face generation, face synthesis, GAN, GAN inversion, kinship, kinship synthesis, StyleGAN2

@article{AccessMartin2023,

title = {ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms},

author = {Martin Pernuš and Mansi Bhatnagar and Badr Samad and Divyanshu Singh and Peter Peer and Vitomir Štruc and Simon Dobrišek},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10126110},

doi = {10.1109/ACCESS.2023.3276877},

issn = {2169-3536},

year  = {2023},

date = {2023-05-17},

journal = {IEEE Access},

pages = {1-22},

abstract = {Kinship face synthesis is an increasingly popular topic within the computer vision community, particularly the task of predicting the child appearance using parental images. Previous work has been limited by model capacity and by inadequate training data, which consists of low-resolution and tightly cropped images, leading to lower synthesis quality. In this paper, we propose ChildNet, a method for kinship face synthesis that leverages the facial image generation capabilities of a state-of-the-art Generative Adversarial Network (GAN), and resolves the aforementioned problems. ChildNet is designed within the GAN latent space and is able to predict a child appearance that bears high resemblance to real parents’ children. To ensure fine-grained control, we propose an age and gender manipulation module that allows precise manipulation of the child synthesis result. ChildNet is capable of generating multiple child images per parent pair input, while providing a way to control the image generation variability. Additionally, we introduce a mechanism to control the dominant parent image. Finally, to facilitate the task of kinship face synthesis, we introduce a new kinship dataset, called Next of Kin. This dataset contains 3690 high-resolution face images with a diverse range of ethnicities and ages. We evaluate ChildNet in comprehensive experiments against three competing kinship face synthesis models, using two kinship datasets. The experiments demonstrate the superior performance of ChildNet in terms of identity similarity, while exhibiting high perceptual image quality. The source code for the model is publicly available at: https://github.com/MartinPernus/ChildNet.},

keywords = {artificial intelligence, CNN, deep learning, face generation, face synthesis, GAN, GAN inversion, kinship, kinship synthesis, StyleGAN2},

pubstate = {published},

tppubtype = {article}

}


Boutros, Fadi; Štruc, Vitomir; Fierrez, Julian; Damer, Naser

Synthetic data for face recognition: Current state and future prospects Journal Article

In: Image and Vision Computing, no. 104688, 2023.

Abstract | Links | BibTeX | Tags: biometrics, CNN, diffusion, face recognition, generative models, survey, synthetic data

Meden, Blaž; Gonzalez-Hernandez, Manfred; Peer, Peter; Štruc, Vitomir

Face deidentification with controllable privacy protection Journal Article

In: Image and Vision Computing, vol. 134, no. 104678, pp. 1-19, 2023.

Abstract | Links | BibTeX | Tags: CNN, deep learning, deidentification, face recognition, GAN, GAN inversion, privacy, privacy protection, StyleGAN2

@article{MedenDeID2023,

title = {Face deidentification with controllable privacy protection},

author = {Blaž Meden and Manfred Gonzalez-Hernandez and Peter Peer and Vitomir Štruc},

url = {https://reader.elsevier.com/reader/sd/pii/S0262885623000525?token=BC1E21411C50118E666720B002A89C9EB3DB4CFEEB5EB18D7BD7B0613085030A96621C8364583BFE7BAE025BE3646096&originRegion=eu-west-1&originCreation=20230516115322},

doi = {10.1016/j.imavis.2023.104678},

year  = {2023},

date = {2023-04-01},

journal = {Image and Vision Computing},

volume = {134},

number = {104678},

pages = {1-19},

abstract = {Privacy protection has become a crucial concern in today’s digital age. Particularly sensitive here are facial images, which typically not only reveal a person’s identity, but also other sensitive personal information. To address this problem, various face deidentification techniques have been presented in the literature. These techniques try to remove or obscure personal information from facial images while still preserving their usefulness for further analysis. While a considerable amount of work has been proposed on face deidentification, most state-of-the-art solutions still suffer from various drawbacks, and (a) deidentify only a narrow facial area, leaving potentially important contextual information unprotected, (b) modify facial images to such a degree that image naturalness and facial diversity suffer in the deidentified images, (c) offer no flexibility in the level of privacy protection ensured, leading to suboptimal deployment in various applications, and (d) often offer an unsatisfactory tradeoff between the ability to obscure identity information, quality and naturalness of the deidentified images, and sufficient utility preservation. In this paper, we address these shortcomings with a novel controllable face deidentification technique that balances image quality, identity protection, and data utility for further analysis. The proposed approach utilizes a powerful generative model (StyleGAN2), multiple auxiliary classification models, and carefully designed constraints to guide the deidentification process. The approach is validated across four diverse datasets (CelebA-HQ, RaFD, XM2VTS, AffectNet) and in comparison to 7 state-of-the-art competitors. The results of the experiments demonstrate that the proposed solution leads to: (a) a considerable level of identity protection, (b) valuable preservation of data utility, (c) sufficient diversity among the deidentified faces, and (d) encouraging overall performance.},

keywords = {CNN, deep learning, deidentification, face recognition, GAN, GAN inversion, privacy, privacy protection, StyleGAN2},

pubstate = {published},

tppubtype = {article}

}


Hrovatič, Anja; Peer, Peter; Štruc, Vitomir; Emeršič, Žiga

Efficient ear alignment using a two-stack hourglass network Journal Article

In: IET Biometrics, pp. 1-14, 2023, ISSN: 2047-4938.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, ear, ear alignment, ear recognition

@article{UhljiIETZiga,

title = {Efficient ear alignment using a two-stack hourglass network},

author = {Anja Hrovatič and Peter Peer and Vitomir Štruc and Žiga Emeršič},

url = {https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2.12109},

doi = {10.1049/bme2.12109},

issn = {2047-4938},

year  = {2023},

date = {2023-01-01},

journal = {IET Biometrics},

pages = {1-14},

abstract = {Ear images have been shown to be a reliable modality for biometric recognition with desirable characteristics, such as high universality, distinctiveness, measurability and permanence. While a considerable amount of research has been directed towards ear recognition techniques, the problem of ear alignment is still under-explored in the open literature. Nonetheless, accurate alignment of ear images, especially in unconstrained acquisition scenarios, where the ear appearance is expected to vary widely due to pose and view point variations, is critical for the performance of all downstream tasks, including ear recognition. Here, the authors address this problem and present a framework for ear alignment that relies on a two-step procedure: (i) automatic landmark detection and (ii) fiducial point alignment. For the first (landmark detection) step, the authors implement and train a Two-Stack Hourglass model (2-SHGNet) capable of accurately predicting 55 landmarks on diverse ear images captured in uncontrolled conditions. For the second (alignment) step, the authors use the Random Sample Consensus (RANSAC) algorithm to align the estimated landmark/fiducial points with a pre-defined ear shape (i.e. a collection of average ear landmark positions). The authors evaluate the proposed framework in comprehensive experiments on the AWEx and ITWE datasets and show that the 2-SHGNet model leads to more accurate landmark predictions than competing state-of-the-art models from the literature. Furthermore, the authors also demonstrate that the alignment step significantly improves recognition accuracy with ear images from unconstrained environments compared to unaligned imagery.},

keywords = {biometrics, CNN, deep learning, ear, ear alignment, ear recognition},

pubstate = {published},

tppubtype = {article}

}


Gan, Chenquan; Yang, Yucheng; Zhu, Qingyi; Jain, Deepak Kumar; Struc, Vitomir

DHF-Net: A hierarchical feature interactive fusion network for dialogue emotion recognition Journal Article

In: Expert Systems with Applications, vol. 210, 2022.

Abstract | Links | BibTeX | Tags: attention, CNN, deep learning, dialogue, emotion recognition, fusion, fusion network, nlp, semantics, text, text processing

Tomašević, Darian; Peer, Peter; Štruc, Vitomir

BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images Proceedings Article

In: IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022), pp. 1-10, 2022.

Abstract | Links | BibTeX | Tags: biometrics, CNN, data synthesis, deep learning, ocular, segmentation, StyleGAN, synthetic data

Šircelj, Jaka; Peer, Peter; Solina, Franc; Štruc, Vitomir

Hierarchical Superquadric Decomposition with Implicit Space Separation Proceedings Article

In: Proceedings of ERK 2022, pp. 1-4, 2022.

Abstract | Links | BibTeX | Tags: CNN, deep learning, depth estimation, iterative procedure, model fitting, recursive model, superquadric, superquadrics, volumetric primitive

Babnik, Žiga; Štruc, Vitomir

Iterativna optimizacija ocen kakovosti slikovnih podatkov v sistemih za razpoznavanje obrazov Proceedings Article

In: Proceedings of ERK 2022, pp. 1-4, 2022.

Abstract | Links | BibTeX | Tags: CNN, face image quality estimation, face quality, face recognition, optimization, supervised quality estimation

@inproceedings{BabnikErk2022,

title = {Iterativna optimizacija ocen kakovosti slikovnih podatkov v sistemih za razpoznavanje obrazov},

author = {Žiga Babnik and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/08/ERK_2022.pdf},

year  = {2022},

date = {2022-08-01},

booktitle = {Proceedings of ERK 2022},

pages = {1-4},

abstract = {While recent face recognition (FR) systems achieve excellent results in many deployment scenarios, their performance in challenging real-world settings is still under question. For this reason, face image quality assessment (FIQA) techniques aim to support FR systems, by providing them with sample quality information that can be used to reject poor quality data unsuitable for recognition purposes. Several groups of FIQA methods relying on different concepts have been proposed in the literature, all of which can be used for generating quality scores of facial images that can serve as pseudo ground-truth (quality) labels and be exploited for training (regression-based) quality estimation models. Several FIQA approaches show that a significant amount of sample-quality information can be extracted from mated similarity-score distributions generated with some face matcher. Based on this insight, we propose in this paper a quality label optimization approach, which incorporates sample-quality information from mated-pair similarities into quality predictions of existing off-the-shelf FIQA techniques. We evaluate the proposed approach using three state-of-the-art FIQA methods over three diverse datasets. The results of our experiments show that the proposed optimization procedure heavily depends on the number of executed optimization iterations. At ten iterations, the approach seems to perform the best, consistently outperforming the base quality scores of the three FIQA methods, chosen for the experiments.},

keywords = {CNN, face image quality estimation, face quality, face recognition, optimization, supervised quality estimation},

pubstate = {published},

tppubtype = {inproceedings}

}


Tomašević, Darian; Peer, Peter; Solina, Franc; Jaklič, Aleš; Štruc, Vitomir

Reconstructing Superquadrics from Intensity and Color Images Journal Article

In: Sensors, vol. 22, iss. 14, no. 5332, 2022.

Abstract | Links | BibTeX | Tags: arrs, CNN, depth data, depth estimation, depth sensing, intensity images, superquadric, superquadrics

@article{TomasevicSensors,

title = {Reconstructing Superquadrics from Intensity and Color Images},

author = {Darian Tomašević and Peter Peer and Franc Solina and Aleš Jaklič and Vitomir Štruc},

url = {https://www.mdpi.com/1424-8220/22/14/5332/pdf?version=1658380987},

doi = {10.3390/s22145332},

year  = {2022},

date = {2022-07-16},

journal = {Sensors},

volume = {22},

number = {5332},

issue = {14},

abstract = {The task of reconstructing 3D scenes based on visual data represents a longstanding problem in computer vision. Common reconstruction approaches rely on the use of multiple volumetric primitives to describe complex objects. Superquadrics (a class of volumetric primitives) have shown great promise due to their ability to describe various shapes with only a few parameters. Recent research has shown that deep learning methods can be used to accurately reconstruct random superquadrics from both 3D point cloud data and simple depth images. In this paper, we extended these reconstruction methods to intensity and color images. Specifically, we used a dedicated convolutional neural network (CNN) model to reconstruct a single superquadric from the given input image. We analyzed the results in a qualitative and quantitative manner, by visualizing reconstructed superquadrics as well as observing error and accuracy distributions of predictions. We showed that a CNN model designed around a simple ResNet backbone can be used to accurately reconstruct superquadrics from images containing one object, but only if one of the spatial parameters is fixed or if it can be determined from other image characteristics, e.g., shadows. Furthermore, we experimented with images of increasing complexity, for example, by adding textures, and observed that the results degraded only slightly. In addition, we show that our model outperforms the current state-of-the-art method on the studied task. Our final result is a highly accurate superquadric reconstruction model, which can also reconstruct superquadrics from real images of simple objects, without additional training.},

keywords = {arrs, CNN, depth data, depth estimation, depth sensing, intensity images, superquadric, superquadrics},

pubstate = {published},

tppubtype = {article}

}


Dvoršak, Grega; Dwivedi, Ankita; Štruc, Vitomir; Peer, Peter; Emeršič, Žiga

Kinship Verification from Ear Images: An Explorative Study with Deep Learning Models Proceedings Article

In: International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, 2022.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, ear, ear biometrics, kinear, kinship, kinship recognition, transformer

Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter

Body Segmentation Using Multi-task Learning Proceedings Article

In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4.

Abstract | Links | BibTeX | Tags: body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on

@inproceedings{JulijanJugBody,

title = {Body Segmentation Using Multi-task Learning},

author = {Julijan Jug and Ajda Lampe and Vitomir Štruc and Peter Peer},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/ICAIIC_paper.pdf},

doi = {10.1109/ICAIIC54071.2022.9722662},

isbn = {978-1-6654-5818-4},

year  = {2022},

date = {2022-01-20},

urldate = {2022-01-20},

booktitle = {International Conference on Artificial Intelligence in Information and Communication (ICAIIC)},

publisher = {IEEE},

abstract = {Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.},

keywords = {body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on},

pubstate = {published},

tppubtype = {inproceedings}

}


Grm, Klemen; Štruc, Vitomir

Frequency Band Encoding for Face Super-Resolution Proceedings Article

In: Proceedings of ERK 2021, pp. 1-4, 2021.

Abstract | Links | BibTeX | Tags: CNN, deep learning, face, face hallucination, frequency encoding, super-resolution

Pevec, Klemen; Grm, Klemen; Štruc, Vitomir

Benchmarking Crowd-Counting Techniques across Image Characteristics Journal Article

In: Elektrotehniški Vestnik, vol. 88, iss. 5, pp. 227-235, 2021.

Abstract | Links | BibTeX | Tags: CNN, crowd counting, drones, image characteristics, model comparison, neural networks

Bortolato, Blaž; Ivanovska, Marija; Rot, Peter; Križaj, Janez; Terhorst, Philipp; Damer, Naser; Peer, Peter; Štruc, Vitomir

Learning privacy-enhancing face representations through feature disentanglement Proceedings Article

In: Proceedings of FG 2020, IEEE, 2020.

Abstract | Links | BibTeX | Tags: autoencoder, biometrics, CNN, disentanglement, face recognition, PFRNet, privacy, representation learning

@inproceedings{BortolatoFG2020,

title = {Learning privacy-enhancing face representations through feature disentanglement},

author = {Blaž Bortolato and Marija Ivanovska and Peter Rot and Janez Križaj and Philipp Terhorst and Naser Damer and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2020/07/FG2020___Learning_privacy_enhancing_face_representations_through_feature_disentanglement-1.pdf},

year  = {2020},

date = {2020-11-04},

booktitle = {Proceedings of FG 2020},

publisher = {IEEE},

abstract = {Convolutional Neural Networks (CNNs) are today the de-facto standard for extracting compact and discriminative face representations (templates) from images in automatic face recognition systems. Due to the characteristics of CNN models, the generated representations typically encode a multitude of information ranging from identity to soft-biometric attributes, such as age, gender or ethnicity. However, since these representations were computed for the purpose of identity recognition only, the soft-biometric information contained in the templates represents a serious privacy risk. To mitigate this problem, we present in this paper a privacy-enhancing approach capable of suppressing potentially sensitive soft-biometric information in face representations without significantly compromising identity information. Specifically, we introduce a Privacy-Enhancing Face-Representation learning Network (PFRNet) that disentangles identity from attribute information in face representations and consequently allows for efficient suppression of soft-biometrics in face templates. We demonstrate the feasibility of PFRNet on the problem of gender suppression and show through rigorous experiments on the CelebA, Labeled Faces in the Wild (LFW) and Adience datasets that the proposed disentanglement-based approach is highly effective and improves significantly on the existing state-of-the-art.},

keywords = {autoencoder, biometrics, CNN, disentanglement, face recognition, PFRNet, privacy, representation learning},

pubstate = {published},

tppubtype = {inproceedings}

}


Šircelj, Jaka; Oblak, Tim; Grm, Klemen; Petković, Uroš; Jaklič, Aleš; Peer, Peter; Štruc, Vitomir; Solina, Franc

Segmentation and Recovery of Superquadric Models using Convolutional Neural Networks Proceedings Article

In: 25th Computer Vision Winter Workshop (CVWW 2020), 2020.

Abstract | Links | BibTeX | Tags: CNN, convolutional neural networks, segmentation, superquadrics, volumetric data

Stepec, Dejan; Emersic, Ziga; Peer, Peter; Struc, Vitomir

Constellation-Based Deep Ear Recognition Book Section

In: Jiang, R.; Li, CT.; Crookes, D.; Meng, W.; Rosenberger, C. (Ed.): Deep Biometrics: Unsupervised and Semi-Supervised Learning, Springer, 2020, ISBN: 978-3-030-32582-4.

Abstract | Links | BibTeX | Tags: biometrics, CNN, deep learning, ear recognition, neural networks

Grm, Klemen; Scheirer, Walter J.; Štruc, Vitomir

Face hallucination using cascaded super-resolution and identity priors Journal Article

In: IEEE Transactions on Image Processing, 2020.

Abstract | Links | BibTeX | Tags: biometrics, CNN, computer vision, deep learning, face, face hallucination, super-resolution

@article{TIPKlemen_2020,

title = {Face hallucination using cascaded super-resolution and identity priors},

author = {Klemen Grm and Walter J. Scheirer and Vitomir Štruc},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8866753

https://lmi.fe.uni-lj.si/wp-content/uploads/2023/02/IEEET_face_hallucination_compressed.pdf},

doi = {10.1109/TIP.2019.2945835},

year  = {2020},

date = {2020-01-01},

urldate = {2020-01-01},

journal = {IEEE Transactions on Image Processing},

abstract = {In this paper we address the problem of hallucinating high-resolution facial images from low-resolution inputs at high magnification factors. We approach this task with convolutional neural networks (CNNs) and propose a novel (deep) face hallucination model that incorporates identity priors into the learning procedure. The model consists of two main parts: i) a cascaded super-resolution network that upscales the low-resolution facial images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Different from most competing super-resolution techniques that rely on a single model for upscaling (even with large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images using steps of 2×. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and incorporate identity constraints at multiple scales. The proposed C-SRIP model (Cascaded Super Resolution with Identity Priors) is able to upscale (tiny) low-resolution images captured in unconstrained conditions and produce visually convincing results for diverse low-resolution inputs. We rigorously evaluate the proposed model on the Labeled Faces in the Wild (LFW), Helen and CelebA datasets and report superior performance compared to the existing state-of-the-art.},

keywords = {biometrics, CNN, computer vision, deep learning, face, face hallucination, super-resolution},

pubstate = {published},

tppubtype = {article}

}


Vitek, Matej; Rot, Peter; Struc, Vitomir; Peer, Peter

A comprehensive investigation into sclera biometrics: a novel dataset and performance study Journal Article

In: Neural Computing and Applications, pp. 1-15, 2020.

Tags: biometrics, CNN, dataset, multi-view, ocular, performance study, recognition, sclera, segmentation, visible light

@article{vitek2020comprehensive,

title = {A comprehensive investigation into sclera biometrics: a novel dataset and performance study},

author = {Matej Vitek and Peter Rot and Vitomir Struc and Peter Peer},

url = {https://link.springer.com/epdf/10.1007/s00521-020-04782-1},

doi = {10.1007/s00521-020-04782-1},

year  = {2020},

date = {2020-01-01},

journal = {Neural Computing and Applications},

pages = {1-15},

abstract = {The area of ocular biometrics is among the most popular branches of biometric recognition technology. This area has long been dominated by iris recognition research, while other ocular modalities such as the periocular region or the vasculature of the sclera have received significantly less attention in the literature. Consequently, ocular modalities beyond the iris are not well studied and their characteristics are today still not as well understood. While recent needs for more secure authentication schemes have considerably increased the interest in competing ocular modalities, progress in these areas is still held back by the lack of publicly available datasets that would allow for more targeted research into specific ocular characteristics next to the iris. In this paper, we aim to bridge this gap for the case of sclera biometrics and introduce a novel dataset designed for research into ocular biometrics and most importantly for research into the vasculature of the sclera. Our dataset, called Sclera Blood Vessels, Periocular and Iris (SBVPI), is, to the best of our knowledge, the first publicly available dataset designed specifically with research in sclera biometrics in mind. The dataset contains high-quality RGB ocular images, captured in the visible spectrum, belonging to 55 subjects. Unlike competing datasets, it comes with manual markups of various eye regions, such as the iris, pupil, canthus or eyelashes and a detailed pixel-wise annotation of the complete sclera vasculature for a subset of the images. Additionally, the dataset ships with gender and age labels. The unique characteristics of the dataset allow us to study aspects of sclera biometrics technology that have not been studied before in the literature (e.g. vasculature segmentation techniques) as well as issues that are of key importance for practical recognition systems.
Thus, next to the SBVPI dataset we also present in this paper a comprehensive investigation into sclera biometrics and the main covariates that affect the performance of sclera segmentation and recognition techniques, such as gender, age, gaze direction or image resolution. Our experiments not only demonstrate the usefulness of the newly introduced dataset, but also contribute to a better understanding of sclera biometrics in general.},

keywords = {biometrics, CNN, dataset, multi-view, ocular, performance study, recognition, sclera, segmentation, visible light},

pubstate = {published},

tppubtype = {article}

}


Rot, Peter; Vitek, Matej; Grm, Klemen; Emeršič, Žiga; Peer, Peter; Štruc, Vitomir

Deep Sclera Segmentation and Recognition Book Section

In: Uhl, Andreas; Busch, Christoph; Marcel, Sebastien; Veldhuis, Rainer (Ed.): Handbook of Vascular Biometrics, pp. 395-432, Springer, 2019, ISBN: 978-3-030-27731-4.

Tags: biometrics, CNN, deep learning, ocular, sclera, segmentation, vasculature

@incollection{ScleraNetChapter,

title = {Deep Sclera Segmentation and Recognition},

author = {Peter Rot and Matej Vitek and Klemen Grm and Žiga Emeršič and Peter Peer and Vitomir Štruc},

editor = {Andreas Uhl and Christoph Busch and Sebastien Marcel and Rainer Veldhuis},

url = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-27731-4_13.pdf},

doi = {10.1007/978-3-030-27731-4_13},

isbn = {978-3-030-27731-4},

year  = {2019},

date = {2019-11-14},

booktitle = {Handbook of Vascular Biometrics},

pages = {395-432},

publisher = {Springer},

chapter = {13},

series = {Advances in Computer Vision and Pattern Recognition},

abstract = {In this chapter, we address the problem of biometric identity recognition from the vasculature of the human sclera. Specifically, we focus on the challenging task of multi-view sclera recognition, where the visible part of the sclera vasculature changes from image to image due to varying gaze (or view) directions. We propose a complete solution for this task built around Convolutional Neural Networks (CNNs) and make several contributions that result in state-of-the-art recognition performance, i.e.: (i) we develop a cascaded CNN assembly that is able to robustly segment the sclera vasculature from the input images regardless of gaze direction, and (ii) we present ScleraNET, a CNN model trained in a multi-task manner (combining losses pertaining to identity and view-direction recognition) that allows for the extraction of discriminative vasculature descriptors that can be used for identity inference. To evaluate the proposed contributions, we also introduce a new dataset of ocular images, called the Sclera Blood Vessels, Periocular and Iris (SBVPI) dataset, which represents one of the few publicly available datasets suitable for research in multi-view sclera segmentation and recognition. The dataset comes with a rich set of annotations, such as a per-pixel markup of various eye parts (including the sclera vasculature), identity, gaze-direction and gender labels. We conduct rigorous experiments on SBVPI with competing techniques from the literature and show that the combination of the proposed segmentation and descriptor-computation models results in highly competitive recognition performance.},

keywords = {biometrics, CNN, deep learning, ocular, sclera, segmentation, vasculature},

pubstate = {published},

tppubtype = {incollection}

}


Oblak, Tim; Grm, Klemen; Jaklič, Aleš; Peer, Peter; Štruc, Vitomir; Solina, Franc

Recovery of Superquadrics from Range Images using Deep Learning: A Preliminary Study Proceedings Article

In: 2019 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 45-52, IEEE, 2019.

Tags: CNN, convolutional neural networks, superquadrics, volumetric data

@inproceedings{oblak2019recovery,

title = {Recovery of Superquadrics from Range Images using Deep Learning: A Preliminary Study},

author = {Tim Oblak and Klemen Grm and Aleš Jaklič and Peter Peer and Vitomir Štruc and Franc Solina},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/Superkvadriki_draft.pdf},

year  = {2019},

date = {2019-06-01},

booktitle = {2019 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},

journal = {arXiv preprint arXiv:1904.06585},

pages = {45-52},

publisher = {IEEE},

abstract = {It has been a longstanding goal in computer vision to describe the 3D physical space in terms of parameterized volumetric models that would allow autonomous machines to understand and interact with their surroundings. Such models are typically motivated by human visual perception and aim to represent all elements of the physical world ranging from individual objects to complex scenes using a small set of parameters. One of the de facto standards to approach this problem are superquadrics, volumetric models that define various 3D shape primitives and can be fitted to actual 3D data (either in the form of point clouds or range images). However, existing solutions to superquadric recovery involve costly iterative fitting procedures, which limit the applicability of such techniques in practice. To alleviate this problem, we explore in this paper the possibility to recover superquadrics from range images without time-consuming iterative parameter estimation techniques by using contemporary deep-learning models, more specifically, convolutional neural networks (CNNs). We pose the superquadric recovery problem as a regression task and develop a CNN regressor that is able to estimate the parameters of a superquadric model from a given range image. We train the regressor on a large set of synthetic range images, each containing a single (unrotated) superquadric shape and evaluate the learned model in comparative experiments with the current state-of-the-art. Additionally, we also present a qualitative analysis involving a dataset of real-world objects. The results of our experiments show that the proposed regressor not only outperforms the existing state-of-the-art, but also ensures a 270x faster execution time.},

keywords = {CNN, convolutional neural networks, superquadrics, volumetric data},

pubstate = {published},

tppubtype = {inproceedings}

}


Lozej, Juš; Meden, Blaž; Struc, Vitomir; Peer, Peter

End-to-end iris segmentation using U-Net Proceedings Article

In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–6, IEEE, 2018.

Tags: biometrics, CNN, convolutional neural networks, iris, ocular, U-net

Emeršič, Žiga; Štepec, Dejan; Štruc, Vitomir; Peer, Peter

Training convolutional neural networks with limited training data for ear recognition in the wild Proceedings Article

In: IEEE International Conference on Automatic Face and Gesture Recognition, Workshop on Biometrics in the Wild 2017, 2017.

Tags: CNN, convolutional neural networks, ear, ear recognition, limited data, model learning

@inproceedings{emervsivc2017training,

title = {Training convolutional neural networks with limited training data for ear recognition in the wild},

author = {Žiga Emeršič and Dejan Štepec and Vitomir Štruc and Peter Peer},

url = {https://arxiv.org/pdf/1711.09952.pdf},

year  = {2017},

date = {2017-05-01},

booktitle = {IEEE International Conference on Automatic Face and Gesture Recognition, Workshop on Biometrics in the Wild 2017},

journal = {arXiv preprint arXiv:1711.09952},

abstract = {Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not benefited yet from advances in deep learning and convolutional neural networks (CNNs) and is still lagging behind other modalities that experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank one recognition rate of the previous state-of-the-art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild).},

keywords = {CNN, convolutional neural networks, ear, ear recognition, limited data, model learning},

pubstate = {published},

tppubtype = {inproceedings}

}


Grm, Klemen; Štruc, Vitomir; Artiges, Anais; Caron, Matthieu; Ekenel, Hazim K.

Strengths and weaknesses of deep learning models for face recognition against image degradations Journal Article

In: IET Biometrics, vol. 7, no. 1, pp. 81–89, 2017.

Tags: CNN, convolutional neural networks, face recognition, googlenet, study, vgg

Stržinar, Žiga; Grm, Klemen; Štruc, Vitomir

Učenje podobnosti v globokih nevronskih omrežjih za razpoznavanje obrazov (Similarity learning in deep neural networks for face recognition) Proceedings Article

In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), Portorož, Slovenia, 2016.

Tags: biometrics, CNN, deep learning, difference space, face verification, LFW, performance evaluation

Grm, Klemen; Dobrišek, Simon; Štruc, Vitomir

Deep pair-wise similarity learning for face recognition Proceedings Article

In: 4th International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, IEEE, 2016.

Tags: CNN, deep learning, face recognition, IJB-A, IWBF, performance evaluation, similarity learning

@inproceedings{grm2016deep,

title = {Deep pair-wise similarity learning for face recognition},

author = {Klemen Grm and Simon Dobrišek and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/en/deeppair-wisesimilaritylearningforfacerecognition/},

year  = {2016},

date = {2016-01-01},

urldate = {2016-01-01},

booktitle = {4th International Workshop on Biometrics and Forensics (IWBF)},

pages = {1--6},

organization = {IEEE},

abstract = {Recent advances in deep learning made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection or tracking. For recognition tasks the most common approach when using deep models is to learn object representations (or features) directly from raw image-input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model a difficult task. In this paper, we present a novel, deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its usage on the task of face recognition. Unlike existing (deep) learning strategies, DPSL operates on image-pairs and tries to learn pair-wise image similarities that can be used for recognition purposes directly instead of feature representations that need to be fed to appropriate classification techniques, as with traditional deep learning pipelines. Since our DPSL strategy assumes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET and IJB-A datasets.},

keywords = {CNN, deep learning, face recognition, IJB-A, IWBF, performance evaluation, similarity learning},

pubstate = {published},

tppubtype = {inproceedings}

}


Grm, Klemen; Dobrišek, Simon; Štruc, Vitomir

The pose-invariant similarity index for face recognition Proceedings Article

In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), Portorož, Slovenia, 2015.

Tags: biometrics, CNN, deep learning, deep models, face verification, similarity learning