2024
|
Fang, Meiling; Yang, Wufei; Kuijper, Arjan; Štruc, Vitomir; Damer, Naser Fairness in Face Presentation Attack Detection Journal Article In: Pattern Recognition, vol. 147, iss. 110002, pp. 1-14, 2024. @article{PR_Fairness2024,
title = {Fairness in Face Presentation Attack Detection},
author = {Meiling Fang and Wufei Yang and Arjan Kuijper and Vitomir Štruc and Naser Damer},
url = {https://www.sciencedirect.com/science/article/pii/S0031320323007008?dgcid=coauthor},
year = {2024},
date = {2024-03-01},
urldate = {2024-03-01},
journal = {Pattern Recognition},
volume = {147},
issue = {110002},
pages = {1-14},
abstract = {Face recognition (FR) algorithms have been proven to exhibit discriminatory behaviors against certain demographic and non-demographic groups, raising ethical and legal concerns regarding their deployment in real-world scenarios. Despite the growing number of fairness studies in FR, the fairness of face presentation attack detection (PAD) has been overlooked, mainly due to the lack of appropriately annotated data. To avoid and mitigate the potential negative impact of such behavior, it is essential to assess the fairness in face PAD and develop fair PAD models. To enable fairness analysis in face PAD, we present a Combined Attribute Annotated PAD Dataset (CAAD-PAD), offering seven human-annotated attribute labels. Then, we comprehensively analyze the fairness of PAD and its relation to the nature of the training data and the Operational Decision Threshold Assignment (ODTA) through a set of face PAD solutions. Additionally, we propose a novel metric, the Accuracy Balanced Fairness (ABF), that jointly represents both the PAD fairness and the absolute PAD performance. The experimental results pointed out that female and faces with occluding features (e.g. eyeglasses, beard, etc.) are relatively less protected than male and non-occlusion groups by all PAD solutions. To alleviate this observed unfairness, we propose a plug-and-play data augmentation method, FairSWAP, to disrupt the identity/semantic information and encourage models to mine the attack clues. The extensive experimental results indicate that FairSWAP leads to better-performing and fairer face PADs in 10 out of 12 investigated cases.},
keywords = {biometrics, computer vision, face analysis, face PAD, face recognition, fairness, pad, presentation attack detection},
pubstate = {published},
tppubtype = {article}
}
2023
|
Pernuš, Martin; Štruc, Vitomir; Dobrišek, Simon MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization Journal Article In: IEEE Transactions on Image Processing, 2023, ISSN: 1941-0042. @article{MaskFaceGAN,
title = {MaskFaceGAN: High Resolution Face Editing With Masked GAN Latent Code Optimization},
author = {Martin Pernuš and Vitomir Štruc and Simon Dobrišek},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10299582
https://lmi.fe.uni-lj.si/wp-content/uploads/2023/02/MaskFaceGAN_compressed.pdf
https://arxiv.org/pdf/2103.11135.pdf},
doi = {10.1109/TIP.2023.3326675},
issn = {1941-0042},
year = {2023},
date = {2023-10-27},
urldate = {2023-01-02},
journal = {IEEE Transactions on Image Processing},
abstract = {Face editing represents a popular research topic within the computer vision and image processing communities. While significant progress has been made recently in this area, existing solutions: (i) are still largely focused on low-resolution images, (ii) often generate editing results with visual artefacts, or (iii) lack fine-grained control over the editing procedure and alter multiple (entangled) attributes simultaneously when trying to generate the desired facial semantics. In this paper, we aim to address these issues through a novel editing approach, called MaskFaceGAN, that focuses on local attribute editing. The proposed approach is based on an optimization procedure that directly optimizes the latent code of a pre-trained (state-of-the-art) Generative Adversarial Network (i.e., StyleGAN2) with respect to several constraints that ensure: (i) preservation of relevant image content, (ii) generation of the targeted facial attributes, and (iii) spatially–selective treatment of local image regions. The constraints are enforced with the help of a (differentiable) attribute classifier and face parser that provide the necessary reference information for the optimization procedure. MaskFaceGAN is evaluated in extensive experiments on the FRGC, SiblingsDB-HQf, and XM2VTS datasets and in comparison with several state-of-the-art techniques from the literature. Our experimental results show that the proposed approach is able to edit face images with respect to several local facial attributes with unprecedented image quality and at high resolutions (1024×1024), while exhibiting considerably fewer problems with attribute entanglement than competing solutions. The source code is publicly available from: https://github.com/MartinPernus/MaskFaceGAN.},
keywords = {CNN, computer vision, deep learning, face editing, face image processing, GAN, GAN inversion, generative models, StyleGAN},
pubstate = {published},
tppubtype = {article}
}
Rot, Peter; Grm, Klemen; Peer, Peter; Štruc, Vitomir PrivacyProber: Assessment and Detection of Soft–Biometric Privacy–Enhancing Techniques Journal Article In: IEEE Transactions on Dependable and Secure Computing, pp. 1-18, 2023, ISSN: 1545-5971. @article{PrivacProberRot,
title = {PrivacyProber: Assessment and Detection of Soft–Biometric Privacy–Enhancing Techniques},
author = {Peter Rot and Klemen Grm and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10264192},
doi = {10.1109/TDSC.2023.3319500},
issn = {1545-5971},
year = {2023},
date = {2023-09-23},
journal = {IEEE Transactions on Dependable and Secure Computing},
pages = {1-18},
abstract = {Soft–biometric privacy–enhancing techniques represent machine learning methods that aim to: (i) mitigate privacy concerns associated with face recognition technology by suppressing selected soft–biometric attributes in facial images (e.g., gender, age, ethnicity) and (ii) make unsolicited extraction of sensitive personal information infeasible. Because such techniques are increasingly used in real–world applications, it is imperative to understand to what extent the privacy enhancement can be inverted and how much attribute information can be recovered from privacy–enhanced images. While these aspects are critical, they have not been investigated in the literature so far. In this paper, we, therefore, study the robustness of several state–of–the–art soft–biometric privacy–enhancing techniques to attribute recovery attempts. We propose PrivacyProber, a high–level framework for restoring soft–biometric information from privacy–enhanced facial images, and apply it for attribute recovery in comprehensive experiments on three public face datasets, i.e., LFW, MUCT and Adience. Our experiments show that the proposed framework is able to restore a considerable amount of suppressed information, regardless of the privacy–enhancing technique used (e.g., adversarial perturbations, conditional synthesis, etc.), but also that there are significant differences between the considered privacy models. These results point to the need for novel mechanisms that can improve the robustness of existing privacy–enhancing techniques and secure them against potential adversaries trying to restore suppressed information. Additionally, we demonstrate that PrivacyProber can also be used to detect privacy–enhancement in facial images (under black–box assumptions) with high accuracy. 
Specifically, we show that a detection procedure can be developed around the proposed framework that is learning free and, therefore, generalizes well across different data characteristics and privacy–enhancing techniques.},
keywords = {biometrics, face, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy},
pubstate = {published},
tppubtype = {article}
}
Babnik, Žiga; Peer, Peter; Štruc, Vitomir DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models Proceedings Article In: IEEE International Joint Conference on Biometrics, pp. 1-10, IEEE, Ljubljana, Slovenia, 2023. @inproceedings{Diffiqa_2023,
title = {DifFIQA: Face Image Quality Assessment Using Denoising Diffusion Probabilistic Models},
author = {Žiga Babnik and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121.pdf
https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/121-supp.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics},
pages = {1-10},
publisher = {IEEE},
address = {Ljubljana, Slovenia},
abstract = {Modern face recognition (FR) models excel in constrained
scenarios, but often suffer from decreased performance
when deployed in unconstrained (real-world) environments
due to uncertainties surrounding the quality
of the captured facial data. Face image quality assessment
(FIQA) techniques aim to mitigate these performance
degradations by providing FR models with sample-quality
predictions that can be used to reject low-quality samples
and reduce false match errors. However, despite steady improvements,
ensuring reliable quality estimates across facial
images with diverse characteristics remains challenging.
In this paper, we present a powerful new FIQA approach,
named DifFIQA, which relies on denoising diffusion
probabilistic models (DDPM) and ensures highly competitive
results. The main idea behind the approach is to utilize
the forward and backward processes of DDPMs to perturb
facial images and quantify the impact of these perturbations
on the corresponding image embeddings for quality
prediction. Because the diffusion-based perturbations are
computationally expensive, we also distill the knowledge
encoded in DifFIQA into a regression-based quality predictor,
called DifFIQA(R), that balances performance and
execution time. We evaluate both models in comprehensive
experiments on 7 diverse datasets, with 4 target FR models
and against 10 state-of-the-art FIQA techniques with
highly encouraging results. The source code is available
from: https://github.com/LSIbabnikz/DifFIQA.},
keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face image quality assessment, face recognition, FIQA, quality},
pubstate = {published},
tppubtype = {inproceedings}
}
Peng, Bo; Sun, Xianyun; Wang, Caiyong; Wang, Wei; Dong, Jing; Sun, Zhenan; Zhang, Rongyu; Cong, Heng; Fu, Lingzhi; Wang, Hao; Zhang, Yusheng; Zhang, HanYuan; Zhang, Xin; Liu, Boyuan; Ling, Hefei; Dragar, Luka; Batagelj, Borut; Peer, Peter; Struc, Vitomir; Zhou, Xinghui; Liu, Kunlin; Feng, Weitao; Zhang, Weiming; Wang, Haitao; Diao, Wenxiu DFGC-VRA: DeepFake Game Competition on Visual Realism Assessment Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-9, Ljubljana, Slovenia, 2023. @inproceedings{Deepfake_comp2023,
title = {DFGC-VRA: DeepFake Game Competition on Visual Realism Assessment},
author = {Bo Peng and Xianyun Sun and Caiyong Wang and Wei Wang and Jing Dong and Zhenan Sun and Rongyu Zhang and Heng Cong and Lingzhi Fu and Hao Wang and Yusheng Zhang and HanYuan Zhang and Xin Zhang and Boyuan Liu and Hefei Ling and Luka Dragar and Borut Batagelj and Peter Peer and Vitomir Struc and Xinghui Zhou and Kunlin Liu and Weitao Feng and Weiming Zhang and Haitao Wang and Wenxiu Diao},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-225.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-9},
address = {Ljubljana, Slovenia},
abstract = {This paper presents the summary report on the DeepFake
Game Competition on Visual Realism Assessment (DFGC-VRA).
Deep-learning based face-swap videos, also known
as deepfakes, are becoming more and more realistic and
deceiving. The malicious usage of these face-swap videos
has caused wide concerns. There is an ongoing deepfake
game between its creators and detectors, with the human in
the loop. The research community has been focusing on
the automatic detection of these fake videos, but the assessment
of their visual realism, as perceived by human
eyes, is still an unexplored dimension. Visual realism assessment,
or VRA, is essential for assessing the potential
impact that may be brought by a specific face-swap video,
and it is also useful as a quality metric to compare different
face-swap methods. This is the third edition of DFGC
competitions, which focuses on the new visual realism assessment
topic, different from previous ones that compete
creators versus detectors. With this competition, we conduct
a comprehensive study of the SOTA performance on
the new task. We also release our MindSpore codes to further
facilitate research in this field (https://github.com/bomb2peng/DFGC-VRA-benckmark).},
note = {Jing Dong (jdong@nlpr.ia.ac.cn) is the corresponding author.},
keywords = {competition IJCB, deepfake detection, deepfakes, face, realism assessment},
pubstate = {published},
tppubtype = {inproceedings}
}
Kolf, Jan Niklas; Boutros, Fadi; Elliesen, Jurek; Theuerkauf, Markus; Damer, Naser; Alansari, Mohamad Y; Hay, Oussama Abdul; Alansari, Sara Yousif; Javed, Sajid; Werghi, Naoufel; Grm, Klemen; Struc, Vitomir; Alonso-Fernandez, Fernando; Hernandez-Diaz, Kevin; Bigun, Josef; George, Anjith; Ecabert, Christophe; Shahreza, Hatef Otroshi; Kotwal, Ketan; Marcel, Sébastien; Medvedev, Iurii; Bo, Jin; Nunes, Diogo; Hassanpour, Ahmad; Khatiwada, Pankaj; Toor, Aafan Ahmad; Yang, Bian EFaR 2023: Efficient Face Recognition Competition Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-12, Ljubljana, Slovenia, 2023. @inproceedings{EFAR2023_2023,
title = {EFaR 2023: Efficient Face Recognition Competition},
author = {Jan Niklas Kolf and Fadi Boutros and Jurek Elliesen and Markus Theuerkauf and Naser Damer and Mohamad Y Alansari and Oussama Abdul Hay and Sara Yousif Alansari and Sajid Javed and Naoufel Werghi and Klemen Grm and Vitomir Struc and Fernando Alonso-Fernandez and Kevin Hernandez-Diaz and Josef Bigun and Anjith George and Christophe Ecabert and Hatef Otroshi Shahreza and Ketan Kotwal and Sébastien Marcel and Iurii Medvedev and Jin Bo and Diogo Nunes and Ahmad Hassanpour and Pankaj Khatiwada and Aafan Ahmad Toor and Bian Yang},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-231.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-12},
address = {Ljubljana, Slovenia},
abstract = {This paper presents the summary of the Efficient Face
Recognition Competition (EFaR) held at the 2023 International
Joint Conference on Biometrics (IJCB 2023). The
competition received 17 submissions from 6 different teams.
To drive further development of efficient face recognition
models, the submitted solutions are ranked based on a
weighted score of the achieved verification accuracies on a
diverse set of benchmarks, as well as the deployability given
by the number of floating-point operations and model size.
The evaluation of submissions is extended to bias, cross-quality,
and large-scale recognition benchmarks. Overall,
the paper gives an overview of the achieved performance
values of the submitted solutions as well as a diverse set of
baselines. The submitted solutions use small, efficient network
architectures to reduce the computational cost; some
solutions apply model quantization. An outlook on possible
techniques that are underrepresented in current solutions is
given as well.},
keywords = {biometrics, deep learning, face, face recognition, lightweight models},
pubstate = {published},
tppubtype = {inproceedings}
}
Das, Abhijit; Atreya, Saurabh K; Mukherjee, Aritra; Vitek, Matej; Li, Haiqing; Wang, Caiyong; Guangzhe, Zhao; Boutros, Fadi; Siebke, Patrick; Kolf, Jan Niklas; Damer, Naser; Sun, Ye; Hexin, Lu; Aobo, Fab; Sheng, You; Nathan, Sabari; Ramamoorthy, Suganya; S, Rampriya R; G, Geetanjali; Sihag, Prinaka; Nigam, Aditya; Peer, Peter; Pal, Umapada; Struc, Vitomir Sclera Segmentation and Joint Recognition Benchmarking Competition: SSRBC 2023 Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-10, Ljubljana, Slovenia, 2023. @inproceedings{SSBRC2023,
title = {Sclera Segmentation and Joint Recognition Benchmarking Competition: SSRBC 2023},
author = {Abhijit Das and Saurabh K Atreya and Aritra Mukherjee and Matej Vitek and Haiqing Li and Caiyong Wang and Zhao Guangzhe and Fadi Boutros and Patrick Siebke and Jan Niklas Kolf and Naser Damer and Ye Sun and Lu Hexin and Fab Aobo and You Sheng and Sabari Nathan and Suganya Ramamoorthy and Rampriya R S and Geetanjali G and Prinaka Sihag and Aditya Nigam and Peter Peer and Umapada Pal and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-233.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-10},
address = {Ljubljana, Slovenia},
abstract = {This paper presents the summary of the Sclera Segmentation
and Joint Recognition Benchmarking Competition (SSRBC
2023) held in conjunction with IEEE International
Joint Conference on Biometrics (IJCB 2023). Different from
the previous editions of the competition, SSRBC 2023 not
only explored the performance of the latest and most advanced
sclera segmentation models, but also studied the impact
of segmentation quality on recognition performance.
Five groups took part in SSRBC 2023 and submitted a total
of six segmentation models and one recognition technique
for scoring. The submitted solutions included a wide
variety of conceptually diverse deep-learning models and
were rigorously tested on three publicly available datasets,
i.e., MASD, SBVPI and MOBIUS. Most of the segmentation
models achieved encouraging segmentation and recognition
performance. Most importantly, we observed that better
segmentation results always translate into better verification
performance.},
keywords = {biometrics, competition IJCB, computer vision, deep learning, sclera, sclera segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
Emersic, Ziga; Ohki, Tetsushi; Akasaka, Muku; Arakawa, Takahiko; Maeda, Soshi; Okano, Masora; Sato, Yuya; George, Anjith; Marcel, Sébastien; Ganapathi, Iyyakutti Iyappan; Ali, Syed Sadaf; Javed, Sajid; Werghi, Naoufel; Işık, Selin Gök; Sarıtaş, Erdi; Ekenel, Hazim Kemal; Hudovernik, Valter; Kolf, Jan Niklas; Boutros, Fadi; Damer, Naser; Sharma, Geetanjali; Kamboj, Aman; Nigam, Aditya; Jain, Deepak Kumar; Cámara, Guillermo; Peer, Peter; Struc, Vitomir The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB 2023), pp. 1-10, Ljubljana, Slovenia, 2023. @inproceedings{UERC2023,
title = {The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias},
author = {Ziga Emersic and Tetsushi Ohki and Muku Akasaka and Takahiko Arakawa and Soshi Maeda and Masora Okano and Yuya Sato and Anjith George and Sébastien Marcel and Iyyakutti Iyappan Ganapathi and Syed Sadaf Ali and Sajid Javed and Naoufel Werghi and Selin Gök Işık and Erdi Sarıtaş and Hazim Kemal Ekenel and Valter Hudovernik and Jan Niklas Kolf and Fadi Boutros and Naser Damer and Geetanjali Sharma and Aman Kamboj and Aditya Nigam and Deepak Kumar Jain and Guillermo Cámara and Peter Peer and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/09/CameraReady-234.pdf},
year = {2023},
date = {2023-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB 2023)},
pages = {1-10},
address = {Ljubljana, Slovenia},
abstract = {The paper provides a summary of the 2023 Unconstrained
Ear Recognition Challenge (UERC), a benchmarking
effort focused on ear recognition from images acquired
in uncontrolled environments. The objective of the challenge
was to evaluate the effectiveness of current ear recognition
techniques on a challenging ear dataset while analyzing
the techniques from two distinct aspects, i.e., verification
performance and bias with respect to specific demographic
factors, i.e., gender and ethnicity. Seven research
groups participated in the challenge and submitted
seven distinct recognition approaches that ranged from
descriptor-based methods and deep-learning models to ensemble
techniques that relied on multiple data representations
to maximize performance and minimize bias. A comprehensive
investigation into the performance of the submitted
models is presented, as well as an in-depth analysis of
bias and associated performance differentials due to differences
in gender and ethnicity. The results of the challenge
suggest that a wide variety of models (e.g., transformers,
convolutional neural networks, ensemble models) is capable
of achieving competitive recognition results, but also
that all of the models still exhibit considerable performance
differentials with respect to both gender and ethnicity. To
promote further development of unbiased and effective ear
recognition models, the starter kit of UERC 2023 together
with the baseline model, and training and test data is made
available from: http://ears.fri.uni-lj.si/.},
keywords = {biometrics, competition, computer vision, deep learning, ear, ear biometrics, UERC 2023},
pubstate = {published},
tppubtype = {inproceedings}
}
Ivanovska, Marija; Štruc, Vitomir; Perš, Janez TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models Proceedings Article In: 18th International Conference on Machine Vision and Applications (MVA 2023), pp. 1-6, 2023. @inproceedings{MarijaTomato2023,
title = {TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models},
author = {Marija Ivanovska and Vitomir Štruc and Janez Perš},
url = {https://arxiv.org/pdf/2307.01064.pdf
https://ieeexplore.ieee.org/document/10215774},
doi = {10.23919/MVA57639.2023.10215774},
year = {2023},
date = {2023-07-23},
urldate = {2023-07-23},
booktitle = {18th International Conference on Machine Vision and Applications (MVA 2023)},
pages = {1-6},
abstract = {Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms, in particular, are commonly used for fruit segmentation, enabling in-depth analysis of the harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large and challenging dataset of greenhouse tomatoes. The dataset comprises high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and the Tomatopia dataset are available at https://github.com/MIvanovska/TomatoDIFF},
keywords = {agriculture, dataset, deep learning, diffusion, plant segmentation, plant monitoring, robotics, segmentation, tomato dataset},
pubstate = {published},
tppubtype = {inproceedings}
}
Vitek, Matej; Bizjak, Matic; Peer, Peter; Štruc, Vitomir IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics Journal Article In: Journal of King Saud University - Computer and Information Sciences, vol. 35, no. 8, pp. 1-21, 2023. @article{VitekSaud2023,
title = {IPAD: Iterative Pruning with Activation Deviation for Sclera Biometrics},
author = {Matej Vitek and Matic Bizjak and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/07/PublishedVersion.pdf},
doi = {10.1016/j.jksuci.2023.101630},
year = {2023},
date = {2023-07-10},
journal = {Journal of King Saud University - Computer and Information Sciences},
volume = {35},
number = {8},
pages = {1-21},
abstract = {The sclera has recently been gaining attention as a biometric modality due to its various desirable characteristics. A key step in any type of ocular biometric recognition, including sclera recognition, is the segmentation of the relevant part(s) of the eye. However, the high computational complexity of the (deep) segmentation models used in this task can limit their applicability on resource-constrained devices such as smartphones or head-mounted displays. As these devices are a common desired target for such biometric systems, lightweight solutions for ocular segmentation are critically needed. To address this issue, this paper introduces IPAD (Iterative Pruning with Activation Deviation), a novel method for developing lightweight convolutional networks, that is based on model pruning. IPAD uses a novel filter-activation-based criterion (ADC) to determine low-importance filters and employs an iterative model pruning procedure to derive the final lightweight model. To evaluate the proposed pruning procedure, we conduct extensive experiments with two diverse segmentation models, over four publicly available datasets (SBVPI, SLD, SMD and MOBIUS), in four distinct problem configurations and in comparison to state-of-the-art methods from the literature. The results of the experiments show that the proposed filter-importance criterion outperforms the standard L1 and L2 approaches from the literature. 
Furthermore, the results also suggest that: 1) the pruned models are able to retain (or even improve on) the performance of the unpruned originals, as long as they are not over-pruned, with RITnet and U-Net at 50% of their original FLOPs reaching up to 4% and 7% higher IoU values than their unpruned versions, respectively, 2) smaller models require more careful pruning, as the pruning process can hurt the model’s generalization capabilities, and 3) the novel criterion most convincingly outperforms the classic approaches when sufficient training data is available, implying that the abundance of data leads to more robust activation-based importance computation.},
keywords = {biometrics, CNN, deep learning, model compression, pruning, sclera, sclera segmentation},
pubstate = {published},
tppubtype = {article}
}
Plesh, Richard; Peer, Peter; Štruc, Vitomir GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling Proceedings Article In: Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR), 2023. @inproceedings{PleshCVPR2023,
title = {GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling},
author = {Richard Plesh and Peter Peer and Vitomir Štruc},
url = {https://arxiv.org/pdf/2210.14145.pdf
https://openaccess.thecvf.com/content/CVPR2023/html/Plesh_GlassesGAN_Eyewear_Personalization_Using_Synthetic_Appearance_Discovery_and_Targeted_Subspace_CVPR_2023_paper.html},
year = {2023},
date = {2023-06-18},
urldate = {2023-06-18},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {We present GlassesGAN, a novel image editing framework for custom design of glasses, that sets a new standard in terms of image quality, edit realism, and continuous multi-style edit capability. To facilitate the editing process with GlassesGAN, we propose a Targeted Subspace Modelling (TSM) procedure that, based on a novel mechanism for (synthetic) appearance discovery in the latent space of a pre-trained GAN generator, constructs an eyeglasses-specific (latent) subspace that the editing framework can utilize. Additionally, we also introduce an appearance-constrained subspace initialization (SI) technique that centers the latent representation of the given input image in the well-defined part of the constructed subspace to improve the reliability of the learned edits. We test GlassesGAN on two (diverse) high-resolution datasets (CelebA-HQ and SiblingsDB-HQf) and compare it to three state-of-the-art competitors, i.e., InterfaceGAN, GANSpace, and MaskGAN. The reported results show that GlassesGAN convincingly outperforms all competing techniques, while offering additional functionality (e.g., fine-grained multi-style editing) not available with any of the competitors. The source code will be made freely available.},
keywords = {eyewear, eyewear personalization, face editing, GAN inversion, latent space editing, StyleGAN2, synthetic appearance discovery, targeted subspace modeling, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
Pernuš, Martin; Bhatnagar, Mansi; Samad, Badr; Singh, Divyanshu; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms Journal Article In: IEEE Access, pp. 1-22, 2023, ISSN: 2169-3536. @article{AccessMartin2023,
title = {ChildNet: Structural Kinship Face Synthesis Model With Appearance Control Mechanisms},
author = {Martin Pernuš and Mansi Bhatnagar and Badr Samad and Divyanshu Singh and Peter Peer and Vitomir Štruc and Simon Dobrišek},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10126110},
doi = {10.1109/ACCESS.2023.3276877},
issn = {2169-3536},
year = {2023},
date = {2023-05-17},
journal = {IEEE Access},
pages = {1-22},
abstract = {Kinship face synthesis is an increasingly popular topic within the computer vision community, particularly the task of predicting the child appearance using parental images. Previous work has been limited in terms of model capacity and inadequate training data, which is comprised of low-resolution and tightly cropped images, leading to lower synthesis quality. In this paper, we propose ChildNet, a method for kinship face synthesis that leverages the facial image generation capabilities of a state-of-the-art Generative Adversarial Network (GAN), and resolves the aforementioned problems. ChildNet is designed within the GAN latent space and is able to predict a child appearance that bears high resemblance to real parents’ children. To ensure fine-grained control, we propose an age and gender manipulation module that allows precise manipulation of the child synthesis result. ChildNet is capable of generating multiple child images per parent pair input, while providing a way to control the image generation variability. Additionally, we introduce a mechanism to control the dominant parent image. Finally, to facilitate the task of kinship face synthesis, we introduce a new kinship dataset, called Next of Kin. This dataset contains 3690 high-resolution face images with a diverse range of ethnicities and ages. We evaluate ChildNet in comprehensive experiments against three competing kinship face synthesis models, using two kinship datasets. The experiments demonstrate the superior performance of ChildNet in terms of identity similarity, while exhibiting high perceptual image quality. The source code for the model is publicly available at: https://github.com/MartinPernus/ChildNet.},
keywords = {artificial intelligence, CNN, deep learning, face generation, face synthesis, GAN, GAN inversion, kinship, kinship synthesis, StyleGAN2},
pubstate = {published},
tppubtype = {article}
}
Boutros, Fadi; Štruc, Vitomir; Fierrez, Julian; Damer, Naser Synthetic data for face recognition: Current state and future prospects Journal Article In: Image and Vision Computing, no. 104688, 2023. @article{FadiIVCSynthetic,
title = {Synthetic data for face recognition: Current state and future prospects},
author = {Fadi Boutros and Vitomir Štruc and Julian Fierrez and Naser Damer},
url = {https://www.sciencedirect.com/science/article/pii/S0262885623000628},
doi = {10.1016/j.imavis.2023.104688},
year = {2023},
date = {2023-05-15},
urldate = {2023-05-15},
journal = {Image and Vision Computing},
number = {104688},
abstract = {Over the past years, deep learning capabilities and the availability of large-scale training datasets advanced rapidly, leading to breakthroughs in face recognition accuracy. However, these technologies are foreseen to face a major challenge in the next years due to the legal and ethical concerns about using authentic biometric data in AI model training and evaluation along with increasingly utilizing data-hungry state-of-the-art deep learning models. With the recent advances in deep generative models and their success in generating realistic and high-resolution synthetic image data, privacy-friendly synthetic data has been recently proposed as an alternative to privacy-sensitive authentic data to overcome the challenges of using authentic data in face recognition development. This work aims at providing a clear and structured picture of the use-cases taxonomy of synthetic face data in face recognition along with the recent emerging advances of face recognition models developed on the basis of synthetic data. We also discuss the challenges facing the use of synthetic data in face recognition development and several future prospects of synthetic data in the domain of face recognition.},
keywords = {biometrics, CNN, diffusion, face recognition, generative models, survey, synthetic data},
pubstate = {published},
tppubtype = {article}
}
Grabner, Miha; Wang, Yi; Wen, Qingsong; Blažič, Boštjan; Štruc, Vitomir A global modeling framework for load forecasting in distribution networks Journal Article In: IEEE Transactions on Smart Grid, 2023, ISSN: 1949-3061. @article{Grabner_TSG,
title = {A global modeling framework for load forecasting in distribution networks},
author = {Miha Grabner and Yi Wang and Qingsong Wen and Boštjan Blažič and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10092804},
doi = {10.1109/TSG.2023.3264525},
issn = {1949-3061},
year = {2023},
date = {2023-04-05},
journal = {IEEE Transactions on Smart Grid},
abstract = {With the increasing numbers of smart meter installations, scalable and efficient load forecasting techniques are critically needed to ensure sustainable situation awareness within the distribution networks. Distribution networks include a large number of different loads at various aggregation levels, such as individual consumers, low-voltage feeders, and transformer stations. It is impractical to develop individual (or so-called local) forecasting models for each load separately. Additionally, such local models also (i) (largely) ignore the strong dependencies between different loads that might be present due to their spatial proximity and the characteristics of the distribution network, (ii) require historical data for each load to be able to make forecasts, and (iii) are incapable of adjusting to changes in the load behavior without retraining. To address these issues, we propose a global modeling framework for load forecasting in distribution networks that, unlike its local competitors, relies on a single global model to generate forecasts for a large number of loads. The global nature of the framework significantly reduces the computational burden typically required when training multiple local forecasting models, efficiently exploits the cross-series information shared among different loads, and facilitates forecasts even when historical data for a load is missing or the behavior of a load evolves over time. To further improve on the performance of the proposed framework, an unsupervised localization mechanism and optimal ensemble construction strategy are also proposed to localize/personalize the global forecasting model to different load characteristics. Our experimental results show that the proposed framework outperforms naive benchmarks by more than 25% (in terms of Mean Absolute Error) on a real-world dataset while exhibiting highly desirable characteristics when compared to the local models that are predominantly used in the literature.
All source code and data are made publicly available to enable reproducibility: https://github.com/mihagrabner/GlobalModelingFramework},
keywords = {deep learning, global modeling, load forecasting, prediction, smart grid, time series analysis, time series forecasting},
pubstate = {published},
tppubtype = {article}
}
Meden, Blaž; Gonzalez-Hernandez, Manfred; Peer, Peter; Štruc, Vitomir Face deidentification with controllable privacy protection Journal Article In: Image and Vision Computing, vol. 134, no. 104678, pp. 1-19, 2023. @article{MedenDeID2023,
title = {Face deidentification with controllable privacy protection},
author = {Blaž Meden and Manfred Gonzalez-Hernandez and Peter Peer and Vitomir Štruc},
url = {https://doi.org/10.1016/j.imavis.2023.104678},
doi = {10.1016/j.imavis.2023.104678},
year = {2023},
date = {2023-04-01},
journal = {Image and Vision Computing},
volume = {134},
number = {104678},
pages = {1-19},
abstract = {Privacy protection has become a crucial concern in today’s digital age. Particularly sensitive here are facial images, which typically not only reveal a person’s identity, but also other sensitive personal information. To address this problem, various face deidentification techniques have been presented in the literature. These techniques try to remove or obscure personal information from facial images while still preserving their usefulness for further analysis. While a considerable amount of work has been proposed on face deidentification, most state-of-the-art solutions still suffer from various drawbacks, and (a) deidentify only a narrow facial area, leaving potentially important contextual information unprotected, (b) modify facial images to such a degree that image naturalness and facial diversity suffer in the deidentified images, (c) offer no flexibility in the level of privacy protection ensured, leading to suboptimal deployment in various applications, and (d) often offer an unsatisfactory tradeoff between the ability to obscure identity information, quality and naturalness of the deidentified images, and sufficient utility preservation. In this paper, we address these shortcomings with a novel controllable face deidentification technique that balances image quality, identity protection, and data utility for further analysis. The proposed approach utilizes a powerful generative model (StyleGAN2), multiple auxiliary classification models, and carefully designed constraints to guide the deidentification process. The approach is validated across four diverse datasets (CelebA-HQ, RaFD, XM2VTS, AffectNet) and in comparison to 7 state-of-the-art competitors. The results of the experiments demonstrate that the proposed solution leads to: (a) a considerable level of identity protection, (b) valuable preservation of data utility, (c) sufficient diversity among the deidentified faces, and (d) encouraging overall performance.},
keywords = {CNN, deep learning, deidentification, face recognition, GAN, GAN inversion, privacy, privacy protection, StyleGAN2},
pubstate = {published},
tppubtype = {article}
}
Ivanovska, Marija; Štruc, Vitomir Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models Proceedings Article In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1-6, 2023. @inproceedings{IWBF2023_Marija,
title = {Face Morphing Attack Detection with Denoising Diffusion Probabilistic Models},
author = {Marija Ivanovska and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/03/IWBF2023_Morphing.pdf},
year = {2023},
date = {2023-02-28},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
pages = {1-6},
abstract = {Morphed face images have recently become a growing concern for existing face verification systems, as they are relatively easy to generate and can be used to impersonate someone's identity for various malicious purposes. Efficient Morphing Attack Detection (MAD) that generalizes well across different morphing techniques is, therefore, of paramount importance. Existing MAD techniques predominantly rely on discriminative models that learn from examples of bona fide and morphed images and, as a result, often exhibit sub-optimal generalization performance when confronted with unknown types of morphing attacks. To address this problem, we propose a novel, diffusion-based MAD method in this paper that learns only from the characteristics of bona fide images. Various forms of morphing attacks are then detected by our model as out-of-distribution samples. We perform rigorous experiments over four different datasets (CASIA-WebFace, FRLL-Morphs, FERET-Morphs and FRGC-Morphs) and compare the proposed solution to both discriminatively-trained and one-class MAD models. The experimental results show that our MAD model achieves highly competitive results on all considered datasets.},
keywords = {biometrics, deep learning, denoising diffusion probabilistic models, diffusion, face, face morphing attack, morphing attack, morphing attack detection},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Babnik, Žiga; Damer, Naser; Štruc, Vitomir Optimization-Based Improvement of Face Image Quality Assessment Techniques Proceedings Article In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), 2023. @inproceedings{iwbf2023babnik,
title = {Optimization-Based Improvement of Face Image Quality Assessment Techniques},
author = {Žiga Babnik and Naser Damer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/03/IWBF_23___paper-1.pdf},
year = {2023},
date = {2023-02-28},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
abstract = {Contemporary face recognition~(FR) models achieve near-ideal recognition performance in constrained settings, yet do not fully translate the performance to unconstrained (real-world) scenarios. To help improve the performance and stability of FR systems in such unconstrained settings, face image quality assessment (FIQA) techniques try to infer sample-quality information from the input face images that can aid with the recognition process. While existing FIQA techniques are able to efficiently capture the differences between high- and low-quality images, they typically cannot fully distinguish between images of similar quality, leading to lower performance in many scenarios. To address this issue, we present in this paper a supervised quality-label optimization approach, aimed at improving the performance of existing FIQA techniques. The developed optimization procedure infuses additional information (computed with a selected FR model) into the initial quality scores generated with a given FIQA technique to produce better estimates of the ``actual'' image quality. We evaluate the proposed approach in comprehensive experiments with six state-of-the-art FIQA approaches (CR-FIQA, FaceQAN, SER-FIQ, PCNet, MagFace, SDD-FIQA) on five commonly used benchmarks (LFW, CFP-FP, CPLFW, CALFW, XQLFW) using three targeted FR models (ArcFace, ElasticFace, CurricularFace) with highly encouraging results.},
keywords = {distillation, face, face image quality assessment, face image quality estimation, face images, optimization, quality, transfer learning},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Vitek, Matej; Das, Abhijit; Lucio, Diego Rafael; Jr., Luiz Antonio Zanlorensi; Menotti, David; Khiarak, Jalil Nourmohammadi; Shahpar, Mohsen Akbari; Asgari-Chenaghlu, Meysam; Jaryani, Farhang; Tapia, Juan E.; Valenzuela, Andres; Wang, Caiyong; Wang, Yunlong; He, Zhaofeng; Sun, Zhenan; Boutros, Fadi; Damer, Naser; Grebe, Jonas Henry; Kuijper, Arjan; Raja, Kiran; Gupta, Gourav; Zampoukis, Georgios; Tsochatzidis, Lazaros; Pratikakis, Ioannis; Kumar, S. V. Aruna; Harish, B. S.; Pal, Umapada; Peer, Peter; Štruc, Vitomir Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach Journal Article In: IEEE Transactions on Information Forensics and Security, vol. 18, pp. 190-205, 2023, ISSN: 1556-6013. @article{TIFS_Sclera2022,
title = {Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach},
author = {Matej Vitek and Abhijit Das and Diego Rafael Lucio and Luiz Antonio Zanlorensi Jr. and David Menotti and Jalil Nourmohammadi Khiarak and Mohsen Akbari Shahpar and Meysam Asgari-Chenaghlu and Farhang Jaryani and Juan E. Tapia and Andres Valenzuela and Caiyong Wang and Yunlong Wang and Zhaofeng He and Zhenan Sun and Fadi Boutros and Naser Damer and Jonas Henry Grebe and Arjan Kuijper and Kiran Raja and Gourav Gupta and Georgios Zampoukis and Lazaros Tsochatzidis and Ioannis Pratikakis and S. V. Aruna Kumar and B. S. Harish and Umapada Pal and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9926136},
doi = {10.1109/TIFS.2022.3216468},
issn = {1556-6013},
year = {2023},
date = {2023-01-18},
urldate = {2022-10-18},
journal = {IEEE Transactions on Information Forensics and Security},
volume = {18},
pages = {190-205},
abstract = {Bias and fairness of biometric algorithms have been key topics of research in recent years, mainly due to the societal, legal and ethical implications of potentially unfair decisions made by automated decision-making models. A considerable amount of work has been done on this topic across different biometric modalities, aiming at better understanding the main sources of algorithmic bias or devising mitigation measures. In this work, we contribute to these efforts and present the first study investigating bias and fairness of sclera segmentation models. Although sclera segmentation techniques represent a key component of sclera-based biometric systems with a considerable impact on the overall recognition performance, the presence of different types of biases in sclera segmentation methods is still underexplored. To address this limitation, we describe the results of a group evaluation effort (involving seven research groups), organized to explore the performance of recent sclera segmentation models within a common experimental framework and study performance differences (and bias), originating from various demographic as well as environmental factors. Using five diverse datasets, we analyze seven independently developed sclera segmentation models in different experimental configurations. The results of our experiments suggest that there are significant differences in the overall segmentation performance across the seven models and that among the considered factors, ethnicity appears to be the biggest cause of bias. Additionally, we observe that training with representative and balanced data does not necessarily lead to less biased results. Finally, we find that in general there appears to be a negative correlation between the amount of bias observed (due to eye color, ethnicity and acquisition device) and the overall segmentation performance, suggesting that advances in the field of semantic segmentation may also help with mitigating bias.},
keywords = {bias, biometrics, fairness, group evaluation, ocular, sclera, sclera segmentation, segmentation},
pubstate = {published},
tppubtype = {article}
}
|
Grm, Klemen; Ozata, Berk; Štruc, Vitomir; Ekenel, Hazim K. Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition Proceedings Article In: WACV workshops, pp. 120-129, 2023. @inproceedings{WACVW2023,
title = {Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition},
author = {Klemen Grm and Berk Ozata and Vitomir Štruc and Hazim K. Ekenel},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/01/Meet_in_the_middle.pdf
https://arxiv.org/abs/2211.15225
https://openaccess.thecvf.com/content/WACV2023W/RWS/papers/Grm_Meet-in-the-Middle_Multi-Scale_Upsampling_and_Matching_for_Cross-Resolution_Face_Recognition_WACVW_2023_paper.pdf
},
year = {2023},
date = {2023-01-06},
booktitle = {WACV workshops},
pages = {120-129},
abstract = {In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between disparate sources like this is a classical surveillance face identification scenario, which continues to be a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including from low-quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our proposed method is able to outperform even existing methods fine-tuned to the SCFace dataset.},
keywords = {deep learning, face, face recognition, multi-scale matching, smart surveillance, surveillance, surveillance technology},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Eyiokur, Fevziye Irem; Kantarci, Alperen; Erakin, Mustafa Ekrem; Damer, Naser; Ofli, Ferda; Imran, Muhammad; Križaj, Janez; Salah, Albert Ali; Waibel, Alexander; Štruc, Vitomir; Ekenel, Hazim K. A Survey on Computer Vision based Human Analysis in the COVID-19 Era Journal Article In: Image and Vision Computing, vol. 130, no. 104610, pp. 1-19, 2023. @article{IVC2023,
title = {A Survey on Computer Vision based Human Analysis in the COVID-19 Era},
author = {Fevziye Irem Eyiokur and Alperen Kantarci and Mustafa Ekrem Erakin and Naser Damer and Ferda Ofli and Muhammad Imran and Janez Križaj and Albert Ali Salah and Alexander Waibel and Vitomir Štruc and Hazim K. Ekenel},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2023/01/FG4COVID19_PAPER_compressed.pdf
https://authors.elsevier.com/a/1gKOyxnVK7RBS},
doi = {10.1016/j.imavis.2022.104610},
year = {2023},
date = {2023-01-01},
journal = {Image and Vision Computing},
volume = {130},
number = {104610},
pages = {1-19},
abstract = {The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research directions is given at the end of the survey. This work is intended to have a broad appeal and be useful not only for computer vision researchers but also the general public.},
keywords = {COVID-19, face, face alignment, face analysis, face image processing, face image quality assessment, face landmarking, face recognition, face verification, human analysis, masked face analysis},
pubstate = {published},
tppubtype = {article}
}
|
Hrovatič, Anja; Peer, Peter; Štruc, Vitomir; Emeršič, Žiga Efficient ear alignment using a two-stack hourglass network Journal Article In: IET Biometrics, pp. 1-14, 2023, ISSN: 2047-4938. @article{UhljiIETZiga,
title = {Efficient ear alignment using a two-stack hourglass network},
author = {Anja Hrovatič and Peter Peer and Vitomir Štruc and Žiga Emeršič},
url = {https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2.12109},
doi = {10.1049/bme2.12109},
issn = {2047-4938},
year = {2023},
date = {2023-01-01},
journal = {IET Biometrics},
pages = {1-14},
abstract = {Ear images have been shown to be a reliable modality for biometric recognition with desirable characteristics, such as high universality, distinctiveness, measurability and permanence. While a considerable amount of research has been directed towards ear recognition techniques, the problem of ear alignment is still under-explored in the open literature. Nonetheless, accurate alignment of ear images, especially in unconstrained acquisition scenarios, where the ear appearance is expected to vary widely due to pose and viewpoint variations, is critical for the performance of all downstream tasks, including ear recognition. Here, the authors address this problem and present a framework for ear alignment that relies on a two-step procedure: (i) automatic landmark detection and (ii) fiducial point alignment. For the first (landmark detection) step, the authors implement and train a Two-Stack Hourglass model (2-SHGNet) capable of accurately predicting 55 landmarks on diverse ear images captured in uncontrolled conditions. For the second (alignment) step, the authors use the Random Sample Consensus (RANSAC) algorithm to align the estimated landmark/fiducial points with a pre-defined ear shape (i.e. a collection of average ear landmark positions). The authors evaluate the proposed framework in comprehensive experiments on the AWEx and ITWE datasets and show that the 2-SHGNet model leads to more accurate landmark predictions than competing state-of-the-art models from the literature. Furthermore, the authors also demonstrate that the alignment step significantly improves recognition accuracy with ear images from unconstrained environments compared to unaligned imagery.},
keywords = {biometrics, CNN, deep learning, ear, ear alignment, ear recognition},
pubstate = {published},
tppubtype = {article}
}
|
2022
|
Gan, Chenquan; Yang, Yucheng; Zhu, Qingyi; Jain, Deepak Kumar; Štruc, Vitomir DHF-Net: A hierarchical feature interactive fusion network for dialogue emotion recognition Journal Article In: Expert Systems with Applications, vol. 210, 2022. @article{TextEmotionESWA,
title = {DHF-Net: A hierarchical feature interactive fusion network for dialogue emotion recognition},
author = {Chenquan Gan and Yucheng Yang and Qingyi Zhu and Deepak Kumar Jain and Vitomir Štruc},
url = {https://www.sciencedirect.com/science/article/pii/S0957417422016025?via%3Dihub},
doi = {10.1016/j.eswa.2022.118525},
year = {2022},
date = {2022-12-30},
urldate = {2022-08-01},
journal = {Expert Systems with Applications},
volume = {210},
abstract = {To balance the trade-off between contextual information and fine-grained information in identifying specific emotions during a dialogue and combine the interaction of hierarchical feature related information, this paper proposes a hierarchical feature interactive fusion network (named DHF-Net), which not only can retain the integrity of the context sequence information but also can extract more fine-grained information. To obtain deep semantic information, DHF-Net processes the task of recognizing dialogue emotion and dialogue act/intent separately, and then learns the cross-impact of two tasks through collaborative attention. Also, a bidirectional gated recurrent unit (Bi-GRU) connected hybrid convolutional neural network (CNN) group method is designed, by which the sequence information is smoothly sent to the multi-level local information layers for feature extraction. Experimental results show that, on two open session datasets, the performance of DHF-Net is improved by 1.8% and 1.2%, respectively.},
keywords = {attention, CNN, deep learning, dialogue, emotion recognition, fusion, fusion network, nlp, semantics, text, text processing},
pubstate = {published},
tppubtype = {article}
}
|
Tomašević, Darian; Peer, Peter; Štruc, Vitomir BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images Proceedings Article In: IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022), pp. 1-10, 2022. @inproceedings{TomasevicIJCBBiOcular,
title = {BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images},
author = {Darian Tomašević and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/12/BiModal_StyleGAN.pdf
https://arxiv.org/pdf/2205.01536.pdf},
year = {2022},
date = {2022-10-20},
urldate = {2022-10-20},
booktitle = {IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022)},
pages = {1-10},
abstract = {Current state-of-the-art segmentation techniques for ocular images are critically dependent on large-scale annotated datasets, which are labor-intensive to gather and often raise privacy concerns. In this paper, we present a novel framework, called BiOcularGAN, capable of generating synthetic large-scale datasets of photorealistic (visible light and near-infrared) ocular images, together with corresponding segmentation labels to address these issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2 (DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic Mask Generator (SMG) component that produces semantic annotations by exploiting latent features of the DB-StyleGAN2 model. We evaluate BiOcularGAN through extensive experiments across five diverse ocular datasets and analyze the effects of bimodal data generation on image quality and the produced annotations. Our experimental results show that BiOcularGAN is able to produce high-quality matching bimodal images and annotations (with minimal manual intervention) that can be used to train highly competitive (deep) segmentation models (in a privacy-aware manner) that perform well across multiple real-world datasets. The source code for the BiOcularGAN framework is publicly available at: https://github.com/dariant/BiOcularGAN.},
keywords = {biometrics, CNN, data synthesis, deep learning, ocular, segmentation, StyleGAN, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Huber, Marco; Boutros, Fadi; Luu, Anh Thi; Raja, Kiran; Ramachandra, Raghavendra; Damer, Naser; Neto, Pedro C.; Goncalves, Tiago; Sequeira, Ana F.; Cardoso, Jaime S.; Tremoco, João; Lourenco, Miguel; Serra, Sergio; Cermeno, Eduardo; Ivanovska, Marija; Batagelj, Borut; Kronovšek, Andrej; Peer, Peter; Štruc, Vitomir SYN-MAD 2022: Competition on Face Morphing Attack Detection based on Privacy-aware Synthetic Training Data Proceedings Article In: IEEE International Joint Conference on Biometrics (IJCB), pp. 1-10, 2022, ISBN: 978-1-6654-6394-2. @inproceedings{IvanovskaSYNMAD,
title = {SYN-MAD 2022: Competition on Face Morphing Attack Detection based on Privacy-aware Synthetic Training Data},
author = {Marco Huber and Fadi Boutros and Anh Thi Luu and Kiran Raja and Raghavendra Ramachandra and Naser Damer and Pedro C. Neto and Tiago Goncalves and Ana F. Sequeira and Jaime S. Cardoso and João Tremoco and Miguel Lourenco and Sergio Serra and Eduardo Cermeno and Marija Ivanovska and Borut Batagelj and Andrej Kronovšek and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/iel7/10007927/10007928/10007950.pdf?casa_token=k7CV1Vs4DUsAAAAA:xMvzvPAyLBoPv1PqtJQTmZQ9S3TJOlExgcxOeuZPNEuVFKVuIfofx30CgN-jnhVB8_5o_Ne3nJLB},
doi = {10.1109/IJCB54206.2022.10007950},
isbn = {978-1-6654-6394-2},
year = {2022},
date = {2022-09-01},
urldate = {2022-09-01},
booktitle = {IEEE International Joint Conference on Biometrics (IJCB)},
pages = {1-10},
keywords = {data synthesis, deep learning, face, face PAD, pad, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Ivanovska, Marija; Kronovšek, Andrej; Peer, Peter; Štruc, Vitomir; Batagelj, Borut Face Morphing Attack Detection Using Privacy-Aware Training Data Proceedings Article In: Proceedings of ERK 2022, pp. 1-4, 2022. @inproceedings{MarijaMorphing,
title = {Face Morphing Attack Detection Using Privacy-Aware Training Data},
author = {Marija Ivanovska and Andrej Kronovšek and Peter Peer and Vitomir Štruc and Borut Batagelj},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/08/2022_ERK__Face_Morphing_Attack_Detecton_Using_Privacy_Aware_Training_Data.pdf},
year = {2022},
date = {2022-08-01},
urldate = {2022-08-01},
booktitle = {Proceedings of ERK 2022},
pages = {1-4},
abstract = {Images of morphed faces pose a serious threat to face recognition-based security systems, as they can be used to illegally verify the identity of multiple people with a single morphed image. Modern detection algorithms learn to identify such morphing attacks using authentic images of real individuals. This approach raises various privacy concerns and limits the amount of publicly available training data. In this paper, we explore the efficacy of detection algorithms that are trained only on faces of non-existing people and their respective morphs. To this end, two dedicated algorithms are trained with synthetic data and then evaluated on three real-world datasets, i.e., FRLL-Morphs, FERET-Morphs and FRGC-Morphs. Our results show that synthetic facial images can be successfully employed for the training process of the detection algorithms and generalize well to real-world scenarios.},
keywords = {competition, face, face morphing, face morphing attack, face morphing detection, private data, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Šircelj, Jaka; Peer, Peter; Solina, Franc; Štruc, Vitomir Hierarchical Superquadric Decomposition with Implicit Space Separation Proceedings Article In: Proceedings of ERK 2022, pp. 1-4, 2022. @inproceedings{SirceljSuperQuadrics,
title = {Hierarchical Superquadric Decomposition with Implicit Space Separation},
author = {Jaka Šircelj and Peter Peer and Franc Solina and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/08/sq_erk.pdf},
year = {2022},
date = {2022-08-01},
urldate = {2022-08-01},
booktitle = {Proceedings of ERK 2022},
pages = {1-4},
abstract = {We introduce a new method to reconstruct 3D objects using a set of volumetric primitives, i.e., superquadrics. The method hierarchically decomposes a target 3D object into pairs of superquadrics recovering finer and finer details. While such hierarchical methods have been studied before, we introduce a new way of splitting the object space using only properties of the predicted superquadrics. The method is trained and evaluated on the ShapeNet dataset. The results of our experiments suggest that reasonable reconstructions can be obtained with the proposed approach for a diverse set of objects with complex geometry.},
keywords = {CNN, deep learning, depth estimation, iterative procedure, model fitting, recursive model, superquadric, superquadrics, volumetric primitive},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Grm, Klemen; Štruc, Vitomir Optimization-based Image Filter Design for Self-supervised Super-resolution Training Proceedings Article In: Proceedings of ERK 2022, 2022. @inproceedings{Grm2022Erk,
title = {Optimization-based Image Filter Design for Self-supervised Super-resolution Training},
author = {Klemen Grm and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/08/erk22_filtri.pdf},
year = {2022},
date = {2022-08-01},
booktitle = {Proceedings of ERK 2022},
abstract = {Single-image super-resolution can be posed as a self-supervised machine learning task, where the training inputs and targets are derived from an unlabelled dataset of high-resolution images. For super-resolution training, the derivation takes the form of a degradation function that yields low-resolution images given high-resolution ones. Typically, the degradation function is selected manually based on heuristics, such as the desired magnification ratio of the super-resolution method being trained. In this paper, we instead propose principled, optimization-based methods for picking the image filter of the degradation function based on its desired properties in the frequency domain. We develop implicit and explicit methods for filter optimization and demonstrate the resulting filters are better at rejecting aliasing and matching the frequency domain characteristics of real-life low-resolution images than commonly used heuristic picks.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Babnik, Žiga; Štruc, Vitomir Iterativna optimizacija ocen kakovosti slikovnih podatkov v sistemih za razpoznavanje obrazov Proceedings Article In: Proceedings of ERK 2022, pp. 1-4, 2022. @inproceedings{BabnikErk2022,
title = {Iterativna optimizacija ocen kakovosti slikovnih podatkov v sistemih za razpoznavanje obrazov},
author = {Žiga Babnik and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/08/ERK_2022.pdf},
year = {2022},
date = {2022-08-01},
booktitle = {Proceedings of ERK 2022},
pages = {1-4},
abstract = {While recent face recognition (FR) systems achieve excellent results in many deployment scenarios, their performance in challenging real-world settings is still under question. For this reason, face image quality assessment (FIQA) techniques aim to support FR systems, by providing them with sample quality information that can be used to reject poor quality data unsuitable for recognition purposes. Several groups of FIQA methods relying on different concepts have been proposed in the literature, all of which can be used for generating quality scores of facial images that can serve as pseudo ground-truth (quality) labels and be exploited for training (regression-based) quality estimation models. Several FIQA approaches show that a significant amount of sample-quality information can be extracted from mated similarity-score distributions generated with some face matcher. Based on this insight, we propose in this paper a quality label optimization approach, which incorporates sample-quality information from mated-pair similarities into quality predictions of existing off-the-shelf FIQA techniques. We evaluate the proposed approach using three state-of-the-art FIQA methods over three diverse datasets. The results of our experiments show that the proposed optimization procedure heavily depends on the number of executed optimization iterations. At ten iterations, the approach seems to perform the best, consistently outperforming the base quality scores of the three FIQA methods, chosen for the experiments.},
keywords = {CNN, face image quality estimation, face quality, face recognition, optimization, supervised quality estimation},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Tomašević, Darian; Peer, Peter; Solina, Franc; Jaklič, Aleš; Štruc, Vitomir Reconstructing Superquadrics from Intensity and Color Images Journal Article In: Sensors, vol. 22, iss. 14, no. 5332, 2022. @article{TomasevicSensors,
title = {Reconstructing Superquadrics from Intensity and Color Images},
author = {Darian Tomašević and Peter Peer and Franc Solina and Aleš Jaklič and Vitomir Štruc},
url = {https://www.mdpi.com/1424-8220/22/14/5332/pdf?version=1658380987},
doi = {10.3390/s22145332},
year = {2022},
date = {2022-07-16},
journal = {Sensors},
volume = {22},
number = {5332},
issue = {14},
abstract = {The task of reconstructing 3D scenes based on visual data represents a longstanding problem in computer vision. Common reconstruction approaches rely on the use of multiple volumetric primitives to describe complex objects. Superquadrics (a class of volumetric primitives) have shown great promise due to their ability to describe various shapes with only a few parameters. Recent research has shown that deep learning methods can be used to accurately reconstruct random superquadrics from both 3D point cloud data and simple depth images. In this paper, we extended these reconstruction methods to intensity and color images. Specifically, we used a dedicated convolutional neural network (CNN) model to reconstruct a single superquadric from the given input image. We analyzed the results in a qualitative and quantitative manner, by visualizing reconstructed superquadrics as well as observing error and accuracy distributions of predictions. We showed that a CNN model designed around a simple ResNet backbone can be used to accurately reconstruct superquadrics from images containing one object, but only if one of the spatial parameters is fixed or if it can be determined from other image characteristics, e.g., shadows. Furthermore, we experimented with images of increasing complexity, for example, by adding textures, and observed that the results degraded only slightly. In addition, we show that our model outperforms the current state-of-the-art method on the studied task. Our final result is a highly accurate superquadric reconstruction model, which can also reconstruct superquadrics from real images of simple objects, without additional training.},
keywords = {arrs, CNN, depth data, depth estimation, depth sensing, intensity images, superquadric, superquadrics},
pubstate = {published},
tppubtype = {article}
}
|
Babnik, Žiga; Peer, Peter; Štruc, Vitomir FaceQAN: Face Image Quality Assessment Through Adversarial Noise Exploration Proceedings Article In: IAPR International Conference on Pattern Recognition (ICPR), 2022. @inproceedings{ICPR2022,
title = {FaceQAN: Face Image Quality Assessment Through Adversarial Noise Exploration},
author = {Žiga Babnik and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/06/ICPR_2022___paper-17.pdf},
year = {2022},
date = {2022-05-17},
urldate = {2022-05-17},
booktitle = {IAPR International Conference on Pattern Recognition (ICPR)},
abstract = {Recent state-of-the-art face recognition (FR) approaches have achieved impressive performance, yet unconstrained face recognition still represents an open problem. Face image quality assessment (FIQA) approaches aim to estimate the quality of the input samples that can help provide information on the confidence of the recognition decision and eventually lead to improved results in challenging scenarios. While much progress has been made in face image quality assessment in recent years, computing reliable quality scores for diverse facial images and FR models remains challenging. In this paper, we propose a novel approach to face image quality assessment, called FaceQAN, that is based on adversarial examples and relies on the analysis of adversarial noise which can be calculated with any FR model learned by using some form of gradient descent. As such, the proposed approach is the first to link image quality to adversarial attacks. Comprehensive (cross-model as well as model-specific) experiments are conducted with four benchmark datasets, i.e., LFW, CFP–FP, XQLFW and IJB–C, four FR models, i.e., CosFace, ArcFace, CurricularFace and ElasticFace and in comparison to seven state-of-the-art FIQA methods to demonstrate the performance of FaceQAN. Experimental results show that FaceQAN achieves competitive results, while exhibiting several desirable characteristics. The source code for FaceQAN will be made publicly available.},
keywords = {adversarial examples, adversarial noise, biometrics, face image quality assessment, face recognition, FIQA, image quality assessment},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Babnik, Žiga; Štruc, Vitomir Assessing Bias in Face Image Quality Assessment Proceedings Article In: EUSIPCO 2022, 2022. @inproceedings{EUSIPCO_2022,
title = {Assessing Bias in Face Image Quality Assessment},
author = {Žiga Babnik and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/06/EUSIPCO_2022___paper.pdf},
year = {2022},
date = {2022-05-16},
urldate = {2022-05-16},
booktitle = {EUSIPCO 2022},
abstract = {Face image quality assessment (FIQA) attempts to improve face recognition (FR) performance by providing additional information about sample quality.
Because FIQA methods attempt to estimate the utility of a sample for face recognition, it is reasonable to assume that these methods are heavily influenced by the underlying face recognition system. Although modern face recognition systems are known to perform well, several studies have found that such systems often exhibit problems with demographic bias. It is therefore likely that such problems are also present with FIQA techniques. To investigate the demographic biases associated with FIQA approaches, this paper presents a comprehensive study involving a variety of quality assessment methods (general-purpose image quality assessment, supervised face quality assessment, and unsupervised face quality assessment methods) and three diverse state-of-the-art FR models.
Our analysis on the Balanced Faces in the Wild (BFW) dataset shows that all techniques considered are affected more by variations in race than sex. While the general-purpose image quality assessment methods appear to be less biased with respect to the two demographic factors considered, the supervised and unsupervised face image quality assessment methods both show strong bias with a tendency to favor white individuals (of either sex). In addition, we found that methods that are less racially biased perform worse overall. This suggests that the observed bias in FIQA methods is to a significant extent related to the underlying face recognition system.},
keywords = {bias, bias analysis, biometrics, face image quality assessment, face recognition, FIQA, image quality assessment},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Osorio-Roig, Daile; Rathgeb, Christian; Drozdowski, Pawel; Terhörst, Philipp; Štruc, Vitomir; Busch, Christoph An Attack on Feature Level-based Facial Soft-biometric Privacy Enhancement Journal Article In: IEEE Transactions on Biometrics, Identity and Behavior (TBIOM), vol. 4, iss. 2, pp. 263-275, 2022. @article{TBIOM_2022,
title = {An Attack on Feature Level-based Facial Soft-biometric Privacy Enhancement},
author = {Daile Osorio-Roig and Christian Rathgeb and Pawel Drozdowski and Philipp Terhörst and Vitomir Štruc and Christoph Busch},
url = {https://arxiv.org/pdf/2111.12405.pdf},
year = {2022},
date = {2022-05-02},
urldate = {2022-05-02},
journal = {IEEE Transactions on Biometrics, Identity and Behavior (TBIOM)},
volume = {4},
issue = {2},
pages = {263-275},
abstract = {In the recent past, different researchers have proposed novel privacy-enhancing face recognition systems designed to conceal soft-biometric information at feature level. These works have reported impressive results, but usually do not consider specific attacks in their analysis of privacy protection. In most cases, the privacy protection capabilities of these schemes are tested through simple machine learning-based classifiers and visualisations of dimensionality reduction tools. In this work, we introduce an attack on feature level-based facial soft–biometric privacy-enhancement techniques. The attack is based on two observations: (1) to achieve high recognition accuracy, certain similarities between facial representations have to be retained in their privacy-enhanced versions; (2) highly similar facial representations usually originate from face images with similar soft-biometric attributes. Based on these observations, the proposed attack compares a privacy-enhanced face representation against a set of privacy-enhanced face representations with known soft-biometric attributes. Subsequently, the best obtained similarity scores are analysed to infer the unknown soft-biometric attributes of the attacked privacy-enhanced face representation. That is, the attack only requires a relatively small database of arbitrary face images and the privacy-enhancing face recognition algorithm as a black-box. In the experiments, the attack is applied to two representative approaches which have previously been reported to reliably conceal the gender in privacy-enhanced face representations. It is shown that the presented attack is able to circumvent the privacy enhancement to a considerable degree and is able to correctly classify gender with an accuracy of up to approximately 90% for both of the analysed privacy-enhancing face recognition systems. 
Future works on privacy-enhancing face recognition are encouraged to include the proposed attack in evaluations on privacy protection.},
keywords = {attack, face recognition, privacy, privacy enhancement, privacy protection, privacy-enhancing techniques, soft biometric privacy},
pubstate = {published},
tppubtype = {article}
}
|
Dvoršak, Grega; Dwivedi, Ankita; Štruc, Vitomir; Peer, Peter; Emeršič, Žiga Kinship Verification from Ear Images: An Explorative Study with Deep Learning Models Proceedings Article In: International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, 2022. @inproceedings{KinEars,
title = {Kinship Verification from Ear Images: An Explorative Study with Deep Learning Models},
author = {Grega Dvoršak and Ankita Dwivedi and Vitomir Štruc and Peter Peer and Žiga Emeršič},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/Gregovi_Uhlji_Template-2.pdf},
year = {2022},
date = {2022-04-21},
urldate = {2022-04-21},
booktitle = {International Workshop on Biometrics and Forensics (IWBF)},
pages = {1--6},
abstract = {The analysis of kin relations from visual data represents a challenging research problem with important real-world applications. However, research in this area has mostly been limited to the analysis of facial images, despite the potential of other physical (human) characteristics for this task. In this paper, we therefore study the problem of kinship verification from ear images and investigate whether salient appearance characteristics, useful for this task, can be extracted from ear data. To facilitate the study, we introduce a novel dataset, called KinEar, that contains data from 19 families with each family member having from 15 to 31 ear images. Using the KinEar data, we conduct experiments using a Siamese training setup and 5 recent deep learning backbones. The results of our experiments suggest that ear images represent a viable alternative to other modalities for kinship verification, as 4 out of 5 considered models reach a performance of over 60% in terms of the Area Under the Receiver Operating Characteristics (ROC-AUC).},
keywords = {biometrics, CNN, deep learning, ear, ear biometrics, kinear, kinship, kinship recognition, transformer},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Jug, Julijan; Lampe, Ajda; Peer, Peter; Štruc, Vitomir Segmentacija telesa z uporabo večciljnega učenja Proceedings Article In: Proceedings of Rosus 2022, 2022. @inproceedings{Rosus2022,
title = {Segmentacija telesa z uporabo večciljnega učenja},
author = {Julijan Jug and Ajda Lampe and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/Rosus2020.pdf},
year = {2022},
date = {2022-03-17},
booktitle = {Proceedings of Rosus 2022},
abstract = {Segmentation is an important part of many computer vision problems involving human images and is one of the key components affecting the performance of all downstream tasks. Several prior works have addressed this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Building on the success of such solutions, in this paper we present a new multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) skeleton keypoint prediction, (ii) prediction of a depth representation of the pose, and (iii) human body segmentation. The main idea of the proposed Segmentation-Skeleton-Depth representation model (SPD for short, from the English name) is to learn a better segmentation model by sharing knowledge between different, yet related tasks. SPD is based on a shared deep neural network backbone that branches into three task-specific model heads and is trained using a multi-task optimization objective. The performance of the model is analyzed through rigorous experiments on the LIP and ATR datasets and in comparison with a recent (state-of-the-art) multi-task body segmentation model. Ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes to higher overall segmentation performance.},
keywords = {deepbeauty, computer vision, segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Križaj, Janez; Dobrišek, Simon; Štruc, Vitomir Making the most of single sensor information: a novel fusion approach for 3D face recognition using region covariance descriptors and Gaussian mixture models Journal Article In: Sensors, iss. 6, no. 2388, pp. 1-26, 2022. @article{KrizajSensors2022,
title = {Making the most of single sensor information: a novel fusion approach for 3D face recognition using region covariance descriptors and Gaussian mixture models},
author = {Janez Križaj and Simon Dobrišek and Vitomir Štruc},
url = {https://www.mdpi.com/1424-8220/22/6/2388},
doi = {10.3390/s22062388},
year = {2022},
date = {2022-03-01},
journal = {Sensors},
number = {2388},
issue = {6},
pages = {1-26},
abstract = {Most commercially successful face recognition systems combine information from multiple sensors (2D and 3D, visible light and infrared, etc.) to achieve reliable recognition in various environments. When only a single sensor is available, the robustness as well as efficacy of the recognition process suffer. In this paper, we focus on face recognition using images captured by a single 3D sensor and propose a method based on the use of region covariance matrixes and Gaussian mixture models (GMMs). All steps of the proposed framework are automated, and no metadata, such as pre-annotated eye, nose, or mouth positions is required, while only a very simple clustering-based face detection is performed. The framework computes a set of region covariance descriptors from local regions of different face image representations and then uses the unscented transform to derive low-dimensional feature vectors, which are finally modeled by GMMs. In the last step, a support vector machine classification scheme is used to make a decision about the identity of the input 3D facial image. The proposed framework has several desirable characteristics, such as an inherent mechanism for data fusion/integration (through the region covariance matrixes), the ability to explore facial images at different levels of locality, and the ability to integrate a domain-specific prior knowledge into the modeling procedure. Several normalization techniques are incorporated into the proposed framework to further improve performance. Extensive experiments are performed on three prominent databases (FRGC v2, CASIA, and UMB-DB) yielding competitive results.},
keywords = {3d face, biometrics, face, face analysis, face images, face recognition},
pubstate = {published},
tppubtype = {article}
}
|
Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter Body Segmentation Using Multi-task Learning Proceedings Article In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4. @inproceedings{JulijanJugBody,
title = {Body Segmentation Using Multi-task Learning},
author = {Julijan Jug and Ajda Lampe and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/ICAIIC_paper.pdf},
doi = {10.1109/ICAIIC54071.2022.9722662},
isbn = {978-1-6654-5818-4},
year = {2022},
date = {2022-01-20},
urldate = {2022-01-20},
booktitle = {International Conference on Artificial Intelligence in Information and Communication (ICAIIC)},
publisher = {IEEE},
abstract = {Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance. },
keywords = {body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Fele, Benjamin; Lampe, Ajda; Peer, Peter; Štruc, Vitomir C-VTON: Context-Driven Image-Based Virtual Try-On Network Proceedings Article In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1–10, 2022. @inproceedings{WACV2022_Fele,
title = {C-VTON: Context-Driven Image-Based Virtual Try-On Network},
author = {Benjamin Fele and Ajda Lampe and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/12/WACV2022_Benjamin_compressed-1.pdf},
year = {2022},
date = {2022-01-04},
urldate = {2022-01-04},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
pages = {1--10},
abstract = {Image-based virtual try-on techniques have shown great promise for enhancing the user-experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are currently still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state-of-the-art.},
keywords = {computer vision, deepbeauty, fashion, generative models, image editing, try-on, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Stoimchev, Marjan; Ivanovska, Marija; Štruc, Vitomir Learning to Combine Local and Global Image Information for Contactless Palmprint Recognition Journal Article In: Sensors, vol. 22, no. 1, pp. 1-26, 2022. @article{Stoimchev2022,
title = {Learning to Combine Local and Global Image Information for Contactless Palmprint Recognition},
author = {Marjan Stoimchev and Marija Ivanovska and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/sensors-22-00073_reduced.pdf},
doi = {10.3390/s22010073},
year = {2022},
date = {2022-01-01},
journal = {Sensors},
volume = {22},
number = {1},
pages = {1-26},
abstract = {In the past few years, there has been a leap from traditional palmprint recognition methodologies, which use handcrafted features, to deep-learning approaches that are able to automatically learn feature representations from the input data. However, the information that is extracted from such deep-learning models typically corresponds to the global image appearance, where only the most discriminative cues from the input image are considered. This characteristic is especially problematic when data is acquired in unconstrained settings, as in the case of contactless palmprint recognition systems, where visual artifacts caused by elastic deformations of the palmar surface are typically present in spatially local parts of the captured images. In this study we address the problem of elastic deformations by introducing a new approach to contactless palmprint recognition based on a novel CNN model, designed as a two-path architecture, where one path processes the input in a holistic manner, while the second path extracts local information from smaller image patches sampled from the input image. As elastic deformations can be assumed to most significantly affect the global appearance, while having a lesser impact on spatially local image areas, the local processing path addresses the issues related to elastic deformations thereby supplementing the information from the global processing path. The model is trained with a learning objective that combines the Additive Angular Margin (ArcFace) Loss and the well-known center loss. By using the proposed model design, the discriminative power of the learned image representation is significantly enhanced compared to standard holistic models, which, as we show in the experimental section, leads to state-of-the-art performance for contactless palmprint recognition. 
Our approach is tested on two publicly available contactless palmprint datasets—namely, IITD and CASIA—and is demonstrated to perform favorably against state-of-the-art methods from the literature. The source code for the proposed model is made publicly available.},
keywords = {biometrics, computer vision, deep learning, palmprints},
pubstate = {published},
tppubtype = {article}
}
|
Rot, Peter; Peer, Peter; Štruc, Vitomir Detecting Soft-Biometric Privacy Enhancement Book Section In: Rathgeb, Christian; Tolosana, Ruben; Vera-Rodriguez, Ruben; Busch, Christoph (Ed.): Handbook of Digital Face Manipulation and Detection, 2022. @incollection{RotManipulationBook,
title = {Detecting Soft-Biometric Privacy Enhancement},
author = {Peter Rot and Peter Peer and Vitomir Štruc},
editor = {Christian Rathgeb and Ruben Tolosana and Ruben Vera-Rodriguez and Christoph Busch},
url = {https://link.springer.com/chapter/10.1007/978-3-030-87664-7_18},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {Handbook of Digital Face Manipulation and Detection},
keywords = {biometrics, face, privacy, privacy enhancement, privacy-enhancing techniques, soft biometric privacy},
pubstate = {published},
tppubtype = {incollection}
}
|
Tolosana, Ruben; Rathgeb, Christian; Vera-Rodriguez, Ruben; Busch, Christoph; Verdoliva, Luisa; Lyu, Siwei; Nguyen, Huy H.; Yamagishi, Junichi; Echizen, Isao; Rot, Peter; Grm, Klemen; Štruc, Vitomir; Dantcheva, Antitza; Akhtar, Zahid; Romero-Tapiador, Sergio; Fierrez, Julian; Morales, Aythami; Ortega-Garcia, Javier; Kindt, Els; Jasserand, Catherine; Kalvet, Tarmo; Tiits, Marek Future Trends in Digital Face Manipulation and Detection Book Section In: Rathgeb, Christian; Tolosana, Ruben; Vera-Rodriguez, Ruben; Busch, Christoph (Ed.): Handbook of Digital Face Manipulation and Detection, pp. 463–482, 2022, ISBN: 978-3-030-87663-0. @incollection{ManipulationFace2022,
title = {Future Trends in Digital Face Manipulation and Detection},
author = {Ruben Tolosana and Christian Rathgeb and Ruben Vera-Rodriguez and Christoph Busch and Luisa Verdoliva and Siwei Lyu and Huy H. Nguyen and Junichi Yamagishi and Isao Echizen and Peter Rot and Klemen Grm and Vitomir Štruc and Antitza Dantcheva and Zahid Akhtar and Sergio Romero-Tapiador and Julian Fierrez and Aythami Morales and Javier Ortega-Garcia and Els Kindt and Catherine Jasserand and Tarmo Kalvet and Marek Tiits},
editor = {Christian Rathgeb and Ruben Tolosana and Ruben Vera-Rodriguez and Christoph Busch},
url = {https://link.springer.com/chapter/10.1007/978-3-030-87664-7_21},
doi = {10.1007/978-3-030-87664-7_21},
isbn = {978-3-030-87663-0},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
booktitle = {Handbook of Digital Face Manipulation and Detection},
pages = {463--482},
abstract = {Recently, digital face manipulation and its detection have sparked large interest in industry and academia around the world. Numerous approaches have been proposed in the literature to create realistic face manipulations, such as DeepFakes and face morphs. To the human eye manipulated images and videos can be almost indistinguishable from real content. Although impressive progress has been reported in the automatic detection of such face manipulations, this research field is often considered to be a cat and mouse game. This chapter briefly discusses the state of the art of digital face manipulation and detection. Issues and challenges that need to be tackled by the research community are summarized, along with future trends in the field.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
|
2021
|
Emeršič, Žiga; Sušanj, Diego; Meden, Blaž; Peer, Peter; Štruc, Vitomir ContexedNet: Context-Aware Ear Detection in Unconstrained Settings Journal Article In: IEEE Access, pp. 1–17, 2021, ISSN: 2169-3536. @article{ContexedNet_Emersic_2021,
title = {ContexedNet: Context-Aware Ear Detection in Unconstrained Settings},
author = {Žiga Emeršič and Diego Sušanj and Blaž Meden and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9583244},
issn = {2169-3536},
year = {2021},
date = {2021-10-20},
urldate = {2021-10-20},
journal = {IEEE Access},
pages = {1--17},
abstract = {Ear detection represents one of the key components of contemporary ear recognition systems. While significant progress has been made in the area of ear detection over recent years, most of the improvements are direct results of advances in the field of visual object detection. Only a limited number of techniques presented in the literature are domain-specific and designed explicitly with ear detection in mind. In this paper, we aim to address this gap and present a novel detection approach that does not rely only on general ear (object) appearance, but also exploits contextual information, i.e., face-part locations, to ensure accurate and robust ear detection with images captured in a wide variety of imaging conditions. The proposed approach is based on a Context-aware Ear Detection Network (ContexedNet) and poses ear detection as a semantic image segmentation problem. ContexedNet consists of two processing paths: 1) a context-provider that extracts probability maps corresponding to the locations of facial parts from the input image, and 2) a dedicated ear segmentation model that integrates the computed probability maps into a context-aware segmentation-based ear detection procedure. ContexedNet is evaluated in rigorous experiments on the AWE and UBEAR datasets and shown to ensure competitive performance when evaluated against state-of-the-art ear detection models from the literature. Additionally, because the proposed contextualization is model agnostic, it can also be utilized with other ear detection techniques to improve performance.},
keywords = {biometrics, contextual information, deep learning, ear detection, ear recognition, ear segmentation, neural networks, segmentation},
pubstate = {published},
tppubtype = {article}
}
|
Ivanovska, Marija; Štruc, Vitomir A Comparative Study on Discriminative and One-Class Learning Models for Deepfake Detection Proceedings Article In: Proceedings of ERK 2021, pp. 1–4, 2021. @inproceedings{ERK_Marija_2021,
title = {A Comparative Study on Discriminative and One-Class Learning Models for Deepfake Detection},
author = {Marija Ivanovska and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2021/10/ERK_2021__A_Comparative_Study_of_Discriminative_and_One__Class_Learning_Models_for_Deepfake_Detection.pdf},
year = {2021},
date = {2021-09-20},
booktitle = {Proceedings of ERK 2021},
pages = {1--4},
abstract = {Deepfakes or manipulated face images, where a donor's face is swapped with the face of a target person, have gained enormous popularity among the general public recently. With the advancements in artificial intelligence and generative modeling, such images can nowadays be easily generated and used to spread misinformation and harm individuals, businesses or society. As the tools for generating deepfakes are rapidly improving, it is critical for deepfake detection models to be able to recognize advanced, sophisticated data manipulations, including those that have not been seen during training. In this paper, we explore the use of one-class learning models as an alternative to discriminative methods for the detection of deepfakes. We conduct a comparative study with three popular deepfake datasets and investigate the performance of selected (discriminative and one-class) detection models in matched- and cross-dataset experiments. Our results show that discriminative models significantly outperform one-class models when training and testing data come from the same dataset, but degrade considerably when the characteristics of the testing data deviate from the training setting. In such cases, one-class models tend to generalize much better.},
keywords = {biometrics, comparative study, computer vision, deepfake detection, deepfakes, detection, face, one-class learning},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Grm, Klemen; Štruc, Vitomir Frequency Band Encoding for Face Super-Resolution Proceedings Article In: Proceedings of ERK 2021, pp. 1-4, 2021. @inproceedings{Grm-SuperResolution_ERK2021,
title = {Frequency Band Encoding for Face Super-Resolution},
author = {Klemen Grm and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2021/10/SRAE_ERK21.pdf},
year = {2021},
date = {2021-09-10},
booktitle = {Proceedings of ERK 2021},
pages = {1-4},
abstract = {In this paper, we present a novel method for face super-resolution based on an encoder-decoder architecture. Unlike previous approaches, which focused primarily on directly reconstructing the high-resolution face appearance from low-resolution images, our method relies on a multi-stage approach where we learn a face representation in different frequency bands, followed by decoding the representation into a high-resolution image. Using quantitative experiments, we are able to demonstrate that this approach results in better face image reconstruction, as well as aiding in downstream semantic tasks such as face recognition and face verification.},
keywords = {CNN, deep learning, face, face hallucination, frequency encoding, super-resolution},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Boutros, Fadi; Damer, Naser; Kolf, Jan Niklas; Raja, Kiran; Kirchbuchner, Florian; Ramachandra, Raghavendra; Kuijper, Arjan; Fang, Pengcheng; Zhang, Chao; Wang, Fei; Montero, David; Aginako, Naiara; Sierra, Basilio; Nieto, Marcos; Erakin, Mustafa Ekrem; Demir, Ugur; Ekenel, Hazım Kemal; Kataoka, Asaki; Ichikawa, Kohei; Kubo, Shizuma; Zhang, Jie; He, Mingjie; Han, Dan; Shan, Shiguang; Grm, Klemen; Štruc, Vitomir; Seneviratne, Sachith; Kasthuriarachchi, Nuran; Rasnayaka, Sanka; Neto, Pedro C.; Sequeira, Ana F.; Pinto, Joao Ribeiro; Saffari, Mohsen; Cardoso, Jaime S. MFR 2021: Masked Face Recognition Competition Proceedings Article In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021), 2021. @inproceedings{MFR_IJCB2021,
title = {MFR 2021: Masked Face Recognition Competition},
author = {Fadi Boutros and Naser Damer and Jan Niklas Kolf and Kiran Raja and Florian Kirchbuchner and Raghavendra Ramachandra and Arjan Kuijper and Pengcheng Fang and Chao Zhang and Fei Wang and David Montero and Naiara Aginako and Basilio Sierra and Marcos Nieto and Mustafa Ekrem Erakin and Ugur Demir and Hazım Kemal Ekenel and Asaki Kataoka and Kohei Ichikawa and Shizuma Kubo and Jie Zhang and Mingjie He and Dan Han and Shiguang Shan and Klemen Grm and Vitomir Štruc and Sachith Seneviratne and Nuran Kasthuriarachchi and Sanka Rasnayaka and Pedro C. Neto and Ana F. Sequeira and Joao Ribeiro Pinto and Mohsen Saffari and Jaime S. Cardoso},
url = {https://ieeexplore.ieee.org/iel7/9484326/9484328/09484337.pdf?casa_token=OOL4s274P0YAAAAA:XE7ga2rP_wNom2Zeva75ZwNwN-HKz6kF1HZtkpzrdTdz36eaGcLffWkzOgIe3xU2PqaU30qTLws},
doi = {10.1109/IJCB52358.2021.9484337},
year = {2021},
date = {2021-08-01},
booktitle = {Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021)},
abstract = {This paper presents a summary of the Masked Face Recognition Competitions (MFR) held within the 2021 International Joint Conference on Biometrics (IJCB 2021). The competition attracted a total of 10 participating teams with valid submissions. The affiliations of these teams are diverse and associated with academia and industry in nine different countries. These teams successfully submitted 18 valid solutions. The competition is designed to motivate solutions aiming at enhancing the face recognition accuracy of masked faces. Moreover, the competition considered the deployability of the proposed solutions by taking the compactness of the face recognition models into account. A private dataset representing a collaborative, multisession, real masked, capture scenario is used to evaluate the submitted solutions. In comparison to one of the top-performing academic face recognition solutions, 10 out of the 18 submitted solutions did score higher masked face verification accuracy.},
keywords = {biometrics, face recognition, masks},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Wang, Caiyong; Wang, Yunlong; Zhang, Kunbo; Muhammad, Jawad; Lu, Tianhao; Zhang, Qi; Tian, Qichuan; He, Zhaofeng; Sun, Zhenan; Zhang, Yiwen; Liu, Tianbao; Yang, Wei; Wu, Dongliang; Liu, Yingfeng; Zhou, Ruiye; Wu, Huihai; Zhang, Hao; Wang, Junbao; Wang, Jiayi; Xiong, Wantong; Shi, Xueyu; Zeng, Shao; Li, Peihua; Sun, Haodong; Wang, Jing; Zhang, Jiale; Wang, Qi; Wu, Huijie; Zhang, Xinhui; Li, Haiqing; Chen, Yu; Chen, Liang; Zhang, Menghan; Sun, Ye; Zhou, Zhiyong; Boutros, Fadi; Damer, Naser; Kuijper, Arjan; Tapia, Juan; Valenzuela, Andres; Busch, Christoph; Gupta, Gourav; Raja, Kiran; Wu, Xi; Li, Xiaojie; Yang, Jingfu; Jing, Hongyan; Wang, Xin; Kong, Bin; Yin, Youbing; Song, Qi; Lyu, Siwei; Hu, Shu; Premk, Leon; Vitek, Matej; Štruc, Vitomir; Peer, Peter; Khiarak, Jalil Nourmohammadi; Jaryani, Farhang; Nasab, Samaneh Salehi; Moafinejad, Seyed Naeim; Amini, Yasin; Noshad, Morteza NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization Proceedings Article In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021), 2021. @inproceedings{NIR_IJCB2021,
title = {NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization},
author = {Caiyong Wang and Yunlong Wang and Kunbo Zhang and Jawad Muhammad and Tianhao Lu and Qi Zhang and Qichuan Tian and Zhaofeng He and Zhenan Sun and Yiwen Zhang and Tianbao Liu and Wei Yang and Dongliang Wu and Yingfeng Liu and Ruiye Zhou and Huihai Wu and Hao Zhang and Junbao Wang and Jiayi Wang and Wantong Xiong and Xueyu Shi and Shao Zeng and Peihua Li and Haodong Sun and Jing Wang and Jiale Zhang and Qi Wang and Huijie Wu and Xinhui Zhang and Haiqing Li and Yu Chen and Liang Chen and Menghan Zhang and Ye Sun and Zhiyong Zhou and Fadi Boutros and Naser Damer and Arjan Kuijper and Juan Tapia and Andres Valenzuela and Christoph Busch and Gourav Gupta and Kiran Raja and Xi Wu and Xiaojie Li and Jingfu Yang and Hongyan Jing and Xin Wang and Bin Kong and Youbing Yin and Qi Song and Siwei Lyu and Shu Hu and Leon Premk and Matej Vitek and Vitomir Štruc and Peter Peer and Jalil Nourmohammadi Khiarak and Farhang Jaryani and Samaneh Salehi Nasab and Seyed Naeim Moafinejad and Yasin Amini and Morteza Noshad},
url = {https://ieeexplore.ieee.org/iel7/9484326/9484328/09484336.pdf?casa_token=FOKx4ltO-hYAAAAA:dCkNHfumDzPGkAipRdbppNWpzAiUYUrJL6OrAjNmimTxUA0Vmx311-3-J3ej7YQc_zONxEO-XKo},
doi = {10.1109/IJCB52358.2021.9484336},
year = {2021},
date = {2021-08-01},
booktitle = {Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021)},
abstract = {For iris recognition in non-cooperative environments, iris segmentation has been regarded as the first most important challenge still open to the biometric community, affecting all downstream tasks from normalization to recognition. In recent years, deep learning technologies have gained significant popularity among various computer vision tasks and also been introduced in iris biometrics, especially iris segmentation. To investigate recent developments and attract more interest of researchers in the iris segmentation method, we organized the 2021 NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization (NIR-ISL 2021) at the 2021 International Joint Conference on Biometrics (IJCB 2021). The challenge was used as a public platform to assess the performance of iris segmentation and localization methods on Asian and African NIR iris images captured in non-cooperative environments. The three best-performing entries achieved solid and satisfactory iris segmentation and localization results in most cases, and their code and models have been made publicly available for reproducibility research.},
keywords = {biometrics, competition, iris, segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Meden, Blaz; Rot, Peter; Terhorst, Philipp; Damer, Naser; Kuijper, Arjan; Scheirer, Walter J.; Ross, Arun; Peer, Peter; Struc, Vitomir Privacy-Enhancing Face Biometrics: A Comprehensive Survey Journal Article In: IEEE Transactions on Information Forensics and Security, vol. 16, pp. 4147-4183, 2021. @article{TIFS_PrivacySurveyb,
title = {Privacy-Enhancing Face Biometrics: A Comprehensive Survey},
author = {Blaz Meden and Peter Rot and Philipp Terhorst and Naser Damer and Arjan Kuijper and Walter J. Scheirer and Arun Ross and Peter Peer and Vitomir Struc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9481149
https://lmi.fe.uni-lj.si/en/visual_privacy_of_faces__a_survey_preprint-compressed/},
doi = {10.1109/TIFS.2021.3096024},
year = {2021},
date = {2021-07-12},
journal = {IEEE Transactions on Information Forensics and Security},
volume = {16},
pages = {4147-4183},
abstract = {Biometric recognition technology has made significant advances over the last decade and is now used across a number of services and applications. However, this widespread deployment has also resulted in privacy concerns and evolving societal expectations about the appropriate use of the technology. For example, the ability to automatically extract age, gender, race, and health cues from biometric data has heightened concerns about privacy leakage. Face recognition technology, in particular, has been in the spotlight, and is now seen by many as posing a considerable risk to personal privacy. In response to these and similar concerns, researchers have intensified efforts towards developing techniques and computational models capable of ensuring privacy to individuals, while still facilitating the utility of face recognition technology in several application scenarios. These efforts have resulted in a multitude of privacy--enhancing techniques that aim at addressing privacy risks originating from biometric systems and providing technological solutions for legislative requirements set forth in privacy laws and regulations, such as GDPR. The goal of this overview paper is to provide a comprehensive introduction into privacy--related research in the area of biometrics and review existing work on \textit{Biometric Privacy--Enhancing Techniques} (B--PETs) applied to face biometrics. To make this work useful for as wide of an audience as possible, several key topics are covered as well, including evaluation strategies used with B--PETs, existing datasets, relevant standards, and regulations and critical open issues that will have to be addressed in the future.},
keywords = {biometrics, deidentification, face analysis, face deidentification, face recognition, face verification, FaceGEN, privacy, privacy protection, privacy-enhancing techniques, soft biometric privacy},
pubstate = {published},
tppubtype = {article}
}
|
Pevec, Klemen; Grm, Klemen; Štruc, Vitomir Benchmarking Crowd-Counting Techniques across Image Characteristics Journal Article In: Elektrotehniški Vestnik, vol. 88, iss. 5, pp. 227-235, 2021. @article{CrowdCountingPevec,
title = {Benchmarking Crowd-Counting Techniques across Image Characteristics},
author = {Klemen Pevec and Klemen Grm and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/10/Pevec.pdf
https://ev.fe.uni-lj.si/5-2021/Pevec.pdf},
year = {2021},
date = {2021-05-01},
journal = {Elektrotehniški Vestnik},
volume = {88},
issue = {5},
pages = {227-235},
abstract = {Crowd--counting is a longstanding computer vision task used in estimating the crowd sizes for security purposes at public protests in streets, public gatherings, for collecting crowd statistics at airports, malls, concerts, conferences, and other similar venues, and for monitoring people and crowds during public health crises (such as the one caused by COVID-19). Recently, the performance of automated methods for crowd--counting from single images has improved particularly due to the introduction of deep learning techniques and large labelled training datasets. However, the robustness of these methods to varying imaging conditions, such as weather, image perspective, and large variations in the crowd size has not been studied in-depth in the open literature. To address this gap, a systematic study on the robustness of four recently developed crowd--counting methods is performed in this paper to evaluate their performance with respect to variable (real-life) imaging scenarios that include different event types, weather conditions, image sources and crowd sizes. It is shown that the performance of the tested techniques is degraded in unclear weather conditions (i.e., fog, rain, snow) and also on images taken from large distances by drones. Conversely, in clear weather conditions, crowd--counting methods can provide accurate and usable results.},
keywords = {CNN, crowd counting, drones, image characteristics, model comparison, neural networks},
pubstate = {published},
tppubtype = {article}
}
|
Batagelj, Borut; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon How to correctly detect face-masks for COVID-19 from visual information? Journal Article In: Applied Sciences, vol. 11, no. 5, pp. 1-24, 2021, ISSN: 2076-3417. @article{Batagelj2021,
title = {How to correctly detect face-masks for COVID-19 from visual information?},
author = {Borut Batagelj and Peter Peer and Vitomir Štruc and Simon Dobrišek},
url = {https://www.mdpi.com/2076-3417/11/5/2070/pdf},
doi = {10.3390/app11052070},
issn = {2076-3417},
year = {2021},
date = {2021-03-01},
urldate = {2021-03-01},
journal = {Applied Sciences},
volume = {11},
number = {5},
pages = {1-24},
abstract = {The new Coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of new coronavirus cases had already exceeded 60 million and the number of deaths 1,410,378 according to information from the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal of the techniques is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform with masked-face images? (ii) Is it possible to detect a proper (regulation-compliant) placement of facial masks? and (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions we conduct a comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement. 
Finally, we design a complete pipeline for recognizing whether face-masks are worn correctly or not and compare the performance of the pipeline with standard face-mask detection models from the literature. To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotation dataset, called Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.},
keywords = {computer vision, COVID-19, deep learning, detection, face, mask detection, recognition},
pubstate = {published},
tppubtype = {article}
}
|
Oblak, Tim; Šircelj, Jaka; Struc, Vitomir; Peer, Peter; Solina, Franc; Jaklic, Aleš Learning to predict superquadric parameters from depth images with explicit and implicit supervision Journal Article In: IEEE Access, pp. 1-16, 2021, ISSN: 2169-3536. @article{Oblak2021,
title = {Learning to predict superquadric parameters from depth images with explicit and implicit supervision},
author = {Tim Oblak and Jaka Šircelj and Vitomir Struc and Peter Peer and Franc Solina and Aleš Jaklic},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9274424},
doi = {10.1109/ACCESS.2020.3041584},
issn = {2169-3536},
year = {2021},
date = {2021-01-01},
journal = {IEEE Access},
pages = {1-16},
abstract = {Reconstruction of 3D space from visual data has always been a significant challenge in the field of computer vision. A popular approach to address this problem can be found in the form of bottom-up reconstruction techniques which try to model complex 3D scenes through a constellation of volumetric primitives. Such techniques are inspired by the current understanding of the human visual system and are, therefore, strongly related to the way humans process visual information, as suggested by recent visual neuroscience literature. While advances have been made in recent years in the area of 3D reconstruction, the problem remains challenging due to the many possible ways of representing 3D data, the ambiguity of determining the shape and general position in 3D space and the difficulty to train efficient models for the prediction of volumetric primitives. In this paper, we address these challenges and present a novel solution for recovering volumetric primitives from depth images. Specifically, we focus on the recovery of superquadrics, a special type of parametric models able to describe a wide array of 3D shapes using only a few parameters. We present a new learning objective that relies on the superquadric (inside-outside) function and develop two learning strategies for training convolutional neural networks (CNN) capable of predicting superquadric parameters. The first uses explicit supervision and penalizes the difference between the predicted and reference superquadric parameters. The second strategy uses implicit supervision and penalizes differences between the input depth images and depth images rendered from the predicted parameters. CNN predictors for superquadric parameters are trained with both strategies and evaluated on a large dataset of synthetic and real-world depth images. Experimental results show that both strategies compare favourably to the existing state-of-the-art and result in high quality 3D reconstructions of the modelled scenes at a much shorter processing time.},
keywords = {3d, computer vision, depth images, differential renderer, recovery, superquadric},
pubstate = {published},
tppubtype = {article}
}
|
Pernus, Martin; Struc, Vitomir; Dobrisek, Simon High Resolution Face Editing with Masked GAN Latent Code Optimization Journal Article In: CoRR, vol. abs/2103.11135, 2021. @article{DBLP:journals/corr/abs-2103-11135,
title = {High Resolution Face Editing with Masked GAN Latent Code Optimization},
author = {Martin Pernus and Vitomir Struc and Simon Dobrisek},
url = {https://arxiv.org/abs/2103.11135},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
journal = {CoRR},
volume = {abs/2103.11135},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|