2025
Batagelj, Borut; Kronovšek, Andrej; Štruc, Vitomir; Peer, Peter Robust cross-dataset deepfake detection with multitask self-supervised learning Journal Article In: ICT Express, pp. 1-5, 2025. @article{DeepFake2025,
title = {Robust cross-dataset deepfake detection with multitask self-supervised learning},
author = {Borut Batagelj and Andrej Kronovšek and Vitomir Štruc and Peter Peer},
url = {https://www.sciencedirect.com/science/article/pii/S240595952500027X?via%3Dihub},
doi = {10.1016/j.icte.2025.02.011},
year = {2025},
date = {2025-03-02},
journal = {ICT Express},
pages = {1-5},
abstract = {Deepfake detection is increasingly critical due to the rise of manipulated media. Existing methods often require extensive datasets and struggle with interpretability issues. To address these issues, this study introduces a novel one-class approach for detecting and localizing deepfake artifacts in videos, using authentic images to generate manipulated data for training. By integrating segmentation and leveraging convolutional neural networks with visual transformers, the method predicts both the presence and location of the generated manipulations. Experiments on seven deepfake datasets and emerging diffusion-based manipulations show that our approach consistently outperforms existing methods, demonstrating superior accuracy and localization capabilities.},
keywords = {deepfake, deepfake DAD, deepfake detection, multi-task learning, segmentation},
pubstate = {published},
tppubtype = {article}
}
2023
Ivanovska, Marija; Štruc, Vitomir; Perš, Janez TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models Proceedings Article In: 18th International Conference on Machine Vision and Applications (MVA 2023), pp. 1-6, 2023. @inproceedings{MarijaTomato2023,
title = {TomatoDIFF: On–plant Tomato Segmentation with Denoising Diffusion Models},
author = {Marija Ivanovska and Vitomir Štruc and Janez Perš},
url = {https://arxiv.org/pdf/2307.01064.pdf
https://ieeexplore.ieee.org/document/10215774},
doi = {10.23919/MVA57639.2023.10215774},
year = {2023},
date = {2023-07-23},
urldate = {2023-07-23},
booktitle = {18th International Conference on Machine Vision and Applications (MVA 2023)},
pages = {1-6},
abstract = {Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms, in particular, are commonly used for fruit segmentation, enabling in-depth analysis of the harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large and challenging dataset of greenhouse tomatoes. The dataset comprises high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and Tomatopia are available at https://github.com/MIvanovska/TomatoDIFF},
keywords = {agriculture, dataset, deep learning, diffusion, plant segmentation, plant monitoring, robotics, segmentation, tomato dataset},
pubstate = {published},
tppubtype = {inproceedings}
}
Vitek, Matej; Das, Abhijit; Lucio, Diego Rafael; Jr., Luiz Antonio Zanlorensi; Menotti, David; Khiarak, Jalil Nourmohammadi; Shahpar, Mohsen Akbari; Asgari-Chenaghlu, Meysam; Jaryani, Farhang; Tapia, Juan E.; Valenzuela, Andres; Wang, Caiyong; Wang, Yunlong; He, Zhaofeng; Sun, Zhenan; Boutros, Fadi; Damer, Naser; Grebe, Jonas Henry; Kuijper, Arjan; Raja, Kiran; Gupta, Gourav; Zampoukis, Georgios; Tsochatzidis, Lazaros; Pratikakis, Ioannis; Kumar, S. V. Aruna; Harish, B. S.; Pal, Umapada; Peer, Peter; Štruc, Vitomir Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach Journal Article In: IEEE Transactions on Information Forensics and Security, vol. 18, pp. 190-205, 2023, ISSN: 1556-6013. @article{TIFS_Sclera2022,
title = {Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach},
author = {Matej Vitek and Abhijit Das and Diego Rafael Lucio and Luiz Antonio Zanlorensi Jr. and David Menotti and Jalil Nourmohammadi Khiarak and Mohsen Akbari Shahpar and Meysam Asgari-Chenaghlu and Farhang Jaryani and Juan E. Tapia and Andres Valenzuela and Caiyong Wang and Yunlong Wang and Zhaofeng He and Zhenan Sun and Fadi Boutros and Naser Damer and Jonas Henry Grebe and Arjan Kuijper and Kiran Raja and Gourav Gupta and Georgios Zampoukis and Lazaros Tsochatzidis and Ioannis Pratikakis and S. V. Aruna Kumar and B. S. Harish and Umapada Pal and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9926136},
doi = {10.1109/TIFS.2022.3216468},
issn = {1556-6013},
year = {2023},
date = {2023-01-18},
urldate = {2022-10-18},
journal = {IEEE Transactions on Information Forensics and Security},
volume = {18},
pages = {190-205},
abstract = {Bias and fairness of biometric algorithms have been key topics of research in recent years, mainly due to the societal, legal and ethical implications of potentially unfair decisions made by automated decision-making models. A considerable amount of work has been done on this topic across different biometric modalities, aiming at better understanding the main sources of algorithmic bias or devising mitigation measures. In this work, we contribute to these efforts and present the first study investigating bias and fairness of sclera segmentation models. Although sclera segmentation techniques represent a key component of sclera-based biometric systems with a considerable impact on the overall recognition performance, the presence of different types of biases in sclera segmentation methods is still underexplored. To address this limitation, we describe the results of a group evaluation effort (involving seven research groups), organized to explore the performance of recent sclera segmentation models within a common experimental framework and study performance differences (and bias), originating from various demographic as well as environmental factors. Using five diverse datasets, we analyze seven independently developed sclera segmentation models in different experimental configurations. The results of our experiments suggest that there are significant differences in the overall segmentation performance across the seven models and that among the considered factors, ethnicity appears to be the biggest cause of bias. Additionally, we observe that training with representative and balanced data does not necessarily lead to less biased results. Finally, we find that in general there appears to be a negative correlation between the amount of bias observed (due to eye color, ethnicity and acquisition device) and the overall segmentation performance, suggesting that advances in the field of semantic segmentation may also help with mitigating bias.},
keywords = {bias, biometrics, fairness, group evaluation, ocular, sclera, sclera segmentation, segmentation},
pubstate = {published},
tppubtype = {article}
}
2022
Tomašević, Darian; Peer, Peter; Štruc, Vitomir BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images Proceedings Article In: IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022), pp. 1-10, 2022. @inproceedings{TomasevicIJCBBiOcular,
title = {BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images},
author = {Darian Tomašević and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/12/BiModal_StyleGAN.pdf
https://arxiv.org/pdf/2205.01536.pdf},
year = {2022},
date = {2022-10-20},
urldate = {2022-10-20},
booktitle = {IEEE/IAPR International Joint Conference on Biometrics (IJCB 2022)},
pages = {1-10},
abstract = {Current state-of-the-art segmentation techniques for ocular images are critically dependent on large-scale annotated datasets, which are labor-intensive to gather and often raise privacy concerns. In this paper, we present a novel framework, called BiOcularGAN, capable of generating synthetic large-scale datasets of photorealistic (visible light and near-infrared) ocular images, together with corresponding segmentation labels to address these issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2 (DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic Mask Generator (SMG) component that produces semantic annotations by exploiting latent features of the DB-StyleGAN2 model. We evaluate BiOcularGAN through extensive experiments across five diverse ocular datasets and analyze the effects of bimodal data generation on image quality and the produced annotations. Our experimental results show that BiOcularGAN is able to produce high-quality matching bimodal images and annotations (with minimal manual intervention) that can be used to train highly competitive (deep) segmentation models (in a privacy-aware manner) that perform well across multiple real-world datasets. The source code for the BiOcularGAN framework is publicly available at: https://github.com/dariant/BiOcularGAN.},
keywords = {biometrics, CNN, data synthesis, deep learning, ocular, segmentation, StyleGAN, synthetic data},
pubstate = {published},
tppubtype = {inproceedings}
}
Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter Body Segmentation Using Multi-task Learning Proceedings Article In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4. @inproceedings{JulijanJugBody,
title = {Body Segmentation Using Multi-task Learning},
author = {Julijan Jug and Ajda Lampe and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/ICAIIC_paper.pdf},
doi = {10.1109/ICAIIC54071.2022.9722662},
isbn = {978-1-6654-5818-4},
year = {2022},
date = {2022-01-20},
urldate = {2022-01-20},
booktitle = {International Conference on Artificial Intelligence in Information and Communication (ICAIIC)},
publisher = {IEEE},
abstract = {Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance. },
keywords = {body segmentation, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on},
pubstate = {published},
tppubtype = {inproceedings}
}
2021
Emeršič, Žiga; Sušanj, Diego; Meden, Blaž; Peer, Peter; Štruc, Vitomir ContexedNet: Context-Aware Ear Detection in Unconstrained Settings Journal Article In: IEEE Access, pp. 1–17, 2021, ISSN: 2169-3536. @article{ContexedNet_Emersic_2021,
title = {ContexedNet: Context-Aware Ear Detection in Unconstrained Settings},
author = {Žiga Emeršič and Diego Sušanj and Blaž Meden and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9583244},
issn = {2169-3536},
year = {2021},
date = {2021-10-20},
urldate = {2021-10-20},
journal = {IEEE Access},
pages = {1--17},
abstract = {Ear detection represents one of the key components of contemporary ear recognition systems. While significant progress has been made in the area of ear detection over recent years, most of the improvements are direct results of advances in the field of visual object detection. Only a limited number of techniques presented in the literature are domain-specific and designed explicitly with ear detection in mind. In this paper, we aim to address this gap and present a novel detection approach that does not rely only on general ear (object) appearance, but also exploits contextual information, i.e., face-part locations, to ensure accurate and robust ear detection with images captured in a wide variety of imaging conditions. The proposed approach is based on a Context-aware Ear Detection Network (ContexedNet) and poses ear detection as a semantic image segmentation problem. ContexedNet consists of two processing paths: 1) a context-provider that extracts probability maps corresponding to the locations of facial parts from the input image, and 2) a dedicated ear segmentation model that integrates the computed probability maps into a context-aware segmentation-based ear detection procedure. ContexedNet is evaluated in rigorous experiments on the AWE and UBEAR datasets and shown to ensure competitive performance when evaluated against state-of-the-art ear detection models from the literature. Additionally, because the proposed contextualization is model agnostic, it can also be utilized with other ear detection techniques to improve performance.},
keywords = {biometrics, contextual information, deep learning, ear detection, ear recognition, ear segmentation, neural networks, segmentation},
pubstate = {published},
tppubtype = {article}
}
Wang, Caiyong; Wang, Yunlong; Zhang, Kunbo; Muhammad, Jawad; Lu, Tianhao; Zhang, Qi; Tian, Qichuan; He, Zhaofeng; Sun, Zhenan; Zhang, Yiwen; Liu, Tianbao; Yang, Wei; Wu, Dongliang; Liu, Yingfeng; Zhou, Ruiye; Wu, Huihai; Zhang, Hao; Wang, Junbao; Wang, Jiayi; Xiong, Wantong; Shi, Xueyu; Zeng, Shao; Li, Peihua; Sun, Haodong; Wang, Jing; Zhang, Jiale; Wang, Qi; Wu, Huijie; Zhang, Xinhui; Li, Haiqing; Chen, Yu; Chen, Liang; Zhang, Menghan; Sun, Ye; Zhou, Zhiyong; Boutros, Fadi; Damer, Naser; Kuijper, Arjan; Tapia, Juan; Valenzuela, Andres; Busch, Christoph; Gupta, Gourav; Raja, Kiran; Wu, Xi; Li, Xiaojie; Yang, Jingfu; Jing, Hongyan; Wang, Xin; Kong, Bin; Yin, Youbing; Song, Qi; Lyu, Siwei; Hu, Shu; Premk, Leon; Vitek, Matej; Štruc, Vitomir; Peer, Peter; Khiarak, Jalil Nourmohammadi; Jaryani, Farhang; Nasab, Samaneh Salehi; Moafinejad, Seyed Naeim; Amini, Yasin; Noshad, Morteza NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization Proceedings Article In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021), 2021. @inproceedings{NIR_IJCB2021,
title = {NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization},
author = {Caiyong Wang and Yunlong Wang and Kunbo Zhang and Jawad Muhammad and Tianhao Lu and Qi Zhang and Qichuan Tian and Zhaofeng He and Zhenan Sun and Yiwen Zhang and Tianbao Liu and Wei Yang and Dongliang Wu and Yingfeng Liu and Ruiye Zhou and Huihai Wu and Hao Zhang and Junbao Wang and Jiayi Wang and Wantong Xiong and Xueyu Shi and Shao Zeng and Peihua Li and Haodong Sun and Jing Wang and Jiale Zhang and Qi Wang and Huijie Wu and Xinhui Zhang and Haiqing Li and Yu Chen and Liang Chen and Menghan Zhang and Ye Sun and Zhiyong Zhou and Fadi Boutros and Naser Damer and Arjan Kuijper and Juan Tapia and Andres Valenzuela and Christoph Busch and Gourav Gupta and Kiran Raja and Xi Wu and Xiaojie Li and Jingfu Yang and Hongyan Jing and Xin Wang and Bin Kong and Youbing Yin and Qi Song and Siwei Lyu and Shu Hu and Leon Premk and Matej Vitek and Vitomir Štruc and Peter Peer and Jalil Nourmohammadi Khiarak and Farhang Jaryani and Samaneh Salehi Nasab and Seyed Naeim Moafinejad and Yasin Amini and Morteza Noshad},
url = {https://ieeexplore.ieee.org/iel7/9484326/9484328/09484336.pdf?casa_token=FOKx4ltO-hYAAAAA:dCkNHfumDzPGkAipRdbppNWpzAiUYUrJL6OrAjNmimTxUA0Vmx311-3-J3ej7YQc_zONxEO-XKo},
doi = {10.1109/IJCB52358.2021.9484336},
year = {2021},
date = {2021-08-01},
booktitle = {Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021)},
abstract = {For iris recognition in non-cooperative environments, iris segmentation has been regarded as the first most important challenge still open to the biometric community, affecting all downstream tasks from normalization to recognition. In recent years, deep learning technologies have gained significant popularity among various computer vision tasks and also been introduced in iris biometrics, especially iris segmentation. To investigate recent developments and attract more interest of researchers in the iris segmentation method, we organized the 2021 NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization (NIR-ISL 2021) at the 2021 International Joint Conference on Biometrics (IJCB 2021). The challenge was used as a public platform to assess the performance of iris segmentation and localization methods on Asian and African NIR iris images captured in non-cooperative environments. The three best-performing entries achieved solid and satisfactory iris segmentation and localization results in most cases, and their code and models have been made publicly available for reproducibility research.},
keywords = {biometrics, competition, iris, segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
2020
Vitek, M.; Das, A.; Pourcenoux, Y.; Missler, A.; Paumier, C.; Das, S.; Ghosh, I. De; Lucio, D. R.; Jr., L. A. Zanlorensi; Menotti, D.; Boutros, F.; Damer, N.; Grebe, J. H.; Kuijper, A.; Hu, J.; He, Y.; Wang, C.; Liu, H.; Wang, Y.; Sun, Z.; Osorio-Roig, D.; Rathgeb, C.; Busch, C.; Tapia, J.; Valenzuela, A.; Zampoukis, G.; Tsochatzidis, L.; Pratikakis, I.; Nathan, S.; Suganya, R.; Mehta, V.; Dhall, A.; Raja, K.; Gupta, G.; Khiarak, J. N.; Akbari-Shahper, M.; Jaryani, F.; Asgari-Chenaghlu, M.; Vyas, R.; Dakshit, S.; Dakshit, S.; Peer, P.; Pal, U.; Štruc, V. SSBC 2020: Sclera Segmentation Benchmarking Competition in the Mobile Environment Proceedings Article In: International Joint Conference on Biometrics (IJCB 2020), pp. 1–10, 2020. @inproceedings{SSBC2020,
title = {SSBC 2020: Sclera Segmentation Benchmarking Competition in the Mobile Environment},
author = {M. Vitek and A. Das and Y. Pourcenoux and A. Missler and C. Paumier and S. Das and I. De Ghosh and D. R. Lucio and L. A. Zanlorensi Jr. and D. Menotti and F. Boutros and N. Damer and J. H. Grebe and A. Kuijper and J. Hu and Y. He and C. Wang and H. Liu and Y. Wang and Z. Sun and D. Osorio-Roig and C. Rathgeb and C. Busch and J. Tapia and A. Valenzuela and G. Zampoukis and L. Tsochatzidis and I. Pratikakis and S. Nathan and R. Suganya and V. Mehta and A. Dhall and K. Raja and G. Gupta and J. N. Khiarak and M. Akbari-Shahper and F. Jaryani and M. Asgari-Chenaghlu and R. Vyas and S. Dakshit and S. Dakshit and P. Peer and U. Pal and V. Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2020/11/IJCB_SSBC_2020.pdf},
year = {2020},
date = {2020-09-28},
booktitle = {International Joint Conference on Biometrics (IJCB 2020)},
pages = {1--10},
abstract = {The paper presents a summary of the 2020 Sclera Segmentation Benchmarking Competition (SSBC), the 7th in the series of group benchmarking efforts centred around the problem of sclera segmentation. Different from previous editions, the goal of SSBC 2020 was to evaluate the performance of sclera-segmentation models on images captured with mobile devices. The competition was used as a platform to assess the sensitivity of existing models to i) differences in mobile devices used for image capture and ii) changes in the ambient acquisition conditions. 26 research groups registered for SSBC 2020, out of which 13 took part in the final round and submitted a total of 16 segmentation models for scoring. These included a wide variety of deep-learning solutions as well as one approach based on standard image processing techniques. Experiments were conducted with three recent datasets. Most of the segmentation models achieved relatively consistent performance across images captured with different mobile devices (with slight differences across devices), but struggled most with low-quality images captured in challenging ambient conditions, i.e., in an indoor environment and with poor lighting. },
keywords = {biometrics, competition IJCB, ocular, sclera, segmentation, SSBC},
pubstate = {published},
tppubtype = {inproceedings}
}
Šircelj, Jaka; Oblak, Tim; Grm, Klemen; Petković, Uroš; Jaklič, Aleš; Peer, Peter; Štruc, Vitomir; Solina, Franc Segmentation and Recovery of Superquadric Models using Convolutional Neural Networks Proceedings Article In: 25th Computer Vision Winter Workshop (CVWW 2020), 2020. @inproceedings{sircelj2020sqcnn,
title = {Segmentation and Recovery of Superquadric Models using Convolutional Neural Networks},
author = {Jaka Šircelj and Tim Oblak and Klemen Grm and Uroš Petković and Aleš Jaklič and Peter Peer and Vitomir Štruc and Franc Solina},
url = {https://lmi.fe.uni-lj.si/en/sircelj2020cvww/
https://arxiv.org/abs/2001.10504},
year = {2020},
date = {2020-02-03},
booktitle = {25th Computer Vision Winter Workshop (CVWW 2020)},
abstract = {In this paper we address the problem of representing 3D visual data with parameterized volumetric shape primitives. Specifically, we present a (two-stage) approach built around convolutional neural networks (CNNs) capable of segmenting complex depth scenes into the simpler geometric structures that can be represented with superquadric models. In the first stage, our approach uses a Mask R-CNN model to identify superquadric-like structures in depth scenes and then fits superquadric models to the segmented structures using a specially designed CNN regressor. Using our approach we are able to describe complex structures with a small number of interpretable parameters. We evaluated the proposed approach on synthetic as well as real-world depth data and show that our solution not only results in competitive performance in comparison to the state-of-the-art, but is also able to decompose scenes into a number of superquadric models at a fraction of the time required by competing approaches. We make all data and models used in the paper available from https://lmi.fe.uni-lj.si/en/research/resources/sq-seg.},
keywords = {CNN, convolutional neural networks, segmentation, superquadrics, volumetric data},
pubstate = {published},
tppubtype = {inproceedings}
}
Vitek, Matej; Rot, Peter; Štruc, Vitomir; Peer, Peter A comprehensive investigation into sclera biometrics: a novel dataset and performance study Journal Article In: Neural Computing and Applications, pp. 1-15, 2020. @article{vitek2020comprehensive,
title = {A comprehensive investigation into sclera biometrics: a novel dataset and performance study},
author = {Matej Vitek and Peter Rot and Vitomir Štruc and Peter Peer},
url = {https://link.springer.com/epdf/10.1007/s00521-020-04782-1},
doi = {https://doi.org/10.1007/s00521-020-04782-1},
year = {2020},
date = {2020-01-01},
journal = {Neural Computing and Applications},
pages = {1-15},
abstract = {The area of ocular biometrics is among the most popular branches of biometric recognition technology. This area has long been dominated by iris recognition research, while other ocular modalities such as the periocular region or the vasculature of the sclera have received significantly less attention in the literature. Consequently, ocular modalities beyond the iris are not well studied and their characteristics are today still not as well understood. While recent needs for more secure authentication schemes have considerably increased the interest in competing ocular modalities, progress in these areas is still held back by the lack of publicly available datasets that would allow for more targeted research into specific ocular characteristics next to the iris. In this paper, we aim to bridge this gap for the case of sclera biometrics and introduce a novel dataset designed for research into ocular biometrics and most importantly for research into the vasculature of the sclera. Our dataset, called Sclera Blood Vessels, Periocular and Iris (SBVPI), is, to the best of our knowledge, the first publicly available dataset designed specifically with research in sclera biometrics in mind. The dataset contains high-quality RGB ocular images, captured in the visible spectrum, belonging to 55 subjects. Unlike competing datasets, it comes with manual markups of various eye regions, such as the iris, pupil, canthus or eyelashes and a detailed pixel-wise annotation of the complete sclera vasculature for a subset of the images. Additionally, the dataset ships with gender and age labels. The unique characteristics of the dataset allow us to study aspects of sclera biometrics technology that have not been studied before in the literature (e.g. vasculature segmentation techniques) as well as issues that are of key importance for practical recognition systems.
Thus, next to the SBVPI dataset we also present in this paper a comprehensive investigation into sclera biometrics and the main covariates that affect the performance of sclera segmentation and recognition techniques, such as gender, age, gaze direction or image resolution. Our experiments not only demonstrate the usefulness of the newly introduced dataset, but also contribute to a better understanding of sclera biometrics in general.},
keywords = {biometrics, CNN, dataset, multi-view, ocular, performance study, recognition, sclera, segmentation, visible light},
pubstate = {published},
tppubtype = {article}
}
2019
|
Rot, Peter; Vitek, Matej; Grm, Klemen; Emeršič, Žiga; Peer, Peter; Štruc, Vitomir Deep Sclera Segmentation and Recognition Book Section In: Uhl, Andreas; Busch, Christoph; Marcel, Sebastien; Veldhuis, Rainer (Ed.): Handbook of Vascular Biometrics, pp. 395-432, Springer, 2019, ISBN: 978-3-030-27731-4. @incollection{ScleraNetChapter,
title = {Deep Sclera Segmentation and Recognition},
author = {Peter Rot and Matej Vitek and Klemen Grm and Žiga Emeršič and Peter Peer and Vitomir Štruc},
editor = {Andreas Uhl and Christoph Busch and Sebastien Marcel and Rainer Veldhuis},
url = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-27731-4_13.pdf},
doi = {https://doi.org/10.1007/978-3-030-27731-4_13},
isbn = {978-3-030-27731-4},
year = {2019},
date = {2019-11-14},
booktitle = {Handbook of Vascular Biometrics},
pages = {395-432},
publisher = {Springer},
chapter = {13},
series = {Advances in Computer Vision and Pattern Recognition},
abstract = {In this chapter, we address the problem of biometric identity recognition from the vasculature of the human sclera. Specifically, we focus on the challenging task of multi-view sclera recognition, where the visible part of the sclera vasculature changes from image to image due to varying gaze (or view) directions. We propose a complete solution for this task built around Convolutional Neural Networks (CNNs) and make several contributions that result in state-of-the-art recognition performance, i.e.: (i) we develop a cascaded CNN assembly that is able to robustly segment the sclera vasculature from the input images regardless of gaze direction, and (ii) we present ScleraNET, a CNN model trained in a multi-task manner (combining losses pertaining to identity and view-direction recognition) that allows for the extraction of discriminative vasculature descriptors that can be used for identity inference. To evaluate the proposed contributions, we also introduce a new dataset of ocular images, called the Sclera Blood Vessels, Periocular and Iris (SBVPI) dataset, which represents one of the few publicly available datasets suitable for research in multi-view sclera segmentation and recognition. The dataset comes with a rich set of annotations, such as a per-pixel markup of various eye parts (including the sclera vasculature), identity, gaze-direction and gender labels. We conduct rigorous experiments on SBVPI with competing techniques from the literature and show that the combination of the proposed segmentation and descriptor-computation models results in highly competitive recognition performance.},
keywords = {biometrics, CNN, deep learning, ocular, sclera, segmentation, vasculature},
pubstate = {published},
tppubtype = {incollection}
}
Lozej, Juš; Štepec, Dejan; Štruc, Vitomir; Peer, Peter Influence of segmentation on deep iris recognition performance Proceedings Article In: 7th IAPR/IEEE International Workshop on Biometrics and Forensics (IWBF 2019), 2019. @inproceedings{lozej2019influence,
title = {Influence of segmentation on deep iris recognition performance},
author = {Juš Lozej and Dejan Štepec and Vitomir Štruc and Peter Peer},
url = {https://arxiv.org/pdf/1901.10431.pdf},
year = {2019},
date = {2019-03-01},
booktitle = {7th IAPR/IEEE International Workshop on Biometrics and Forensics (IWBF 2019)},
journal = {arXiv preprint arXiv:1901.10431},
abstract = {Despite the rise of deep learning in numerous areas of computer vision and image processing, iris recognition has not benefited considerably from these trends so far. Most of the existing research on deep iris recognition is focused on new models for generating discriminative and robust iris representations and relies on methodologies akin to traditional iris recognition pipelines. Hence, the proposed models do not approach iris recognition in an end-to-end manner, but rather use standard heuristic iris segmentation (and unwrapping) techniques to produce normalized inputs for the deep learning models. However, because deep learning is able to model very complex data distributions and nonlinear data changes, an obvious question arises. How important is the use of traditional segmentation methods in a deep learning setting? To answer this question, we present in this paper an empirical analysis of the impact of iris segmentation on the performance of deep learning models using a simple two-stage pipeline consisting of a segmentation and a recognition step. We evaluate how the accuracy of segmentation influences recognition performance but also examine if segmentation is needed at all. We use the CASIA Thousand and SBVPI datasets for the experiments and report several interesting findings.},
keywords = {biometrics, iris, ocular, segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
2018
|
Rot, Peter; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter Deep multi-class eye segmentation for ocular biometrics Proceedings Article In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–8, IEEE 2018. @inproceedings{rot2018deep,
title = {Deep multi-class eye segmentation for ocular biometrics},
author = {Peter Rot and Žiga Emeršič and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/MultiClassReduced.pdf},
year = {2018},
date = {2018-07-01},
booktitle = {2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},
pages = {1--8},
organization = {IEEE},
abstract = {Segmentation techniques for ocular biometrics typically focus on finding a single eye region in the input image at a time. Only limited work has been done on multi-class eye segmentation despite a number of obvious advantages. In this paper we address this gap and present a deep multi-class eye segmentation model built around the SegNet architecture. We train the model on a small dataset (of 120 samples) of eye images and observe it to generalize well to unseen images and to ensure highly accurate segmentation results. We evaluate the model on the Multi-Angle Sclera Database (MASD) dataset and describe comprehensive experiments focusing on: i) segmentation performance, ii) error analysis, iii) the sensitivity of the model to changes in view direction, and iv) comparisons with competing single-class techniques. Our results show that the proposed model is a viable solution for multi-class eye segmentation suitable for recognition (multi-biometric) pipelines based on ocular characteristics.},
keywords = {biometrics, eye, ocular, sclera, segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
Emeršič, Žiga; Gabriel, Luka; Štruc, Vitomir; Peer, Peter Convolutional encoder–decoder networks for pixel-wise ear detection and segmentation Journal Article In: IET Biometrics, vol. 7, no. 3, pp. 175–184, 2018. @article{emervsivc2018convolutional,
title = {Convolutional encoder--decoder networks for pixel-wise ear detection and segmentation},
author = {Žiga Emeršič and Luka Gabriel and Vitomir Štruc and Peter Peer},
url = {https://arxiv.org/pdf/1702.00307.pdf},
year = {2018},
date = {2018-03-01},
journal = {IET Biometrics},
volume = {7},
number = {3},
pages = {175--184},
publisher = {IET},
abstract = {Object detection and segmentation represent the basis for many tasks in computer and machine vision. In biometric recognition systems the detection of the region-of-interest (ROI) is one of the most crucial steps in the processing pipeline, significantly impacting the performance of the entire recognition system. Existing approaches to ear detection are commonly susceptible to the presence of severe occlusions, ear accessories or variable illumination conditions and often deteriorate in their performance if applied to ear images captured in unconstrained settings. To address these shortcomings, we present a novel ear detection technique based on convolutional encoder-decoder networks (CEDs). We formulate the problem of ear detection as a two-class segmentation problem and design and train a CED-network architecture to distinguish between image pixels belonging to the ear and the non-ear class. Unlike competing techniques, our approach does not simply return a bounding box around the detected ear, but provides detailed, pixel-wise information about the location of the ears in the image. Experiments on a dataset gathered from the web (a.k.a. in the wild) show that the proposed technique ensures good detection results in the presence of various covariate factors and significantly outperforms competing methods from the literature.},
keywords = {annotated web ears, AWE, biometrics, ear, ear detection, pixel-wise detection, segmentation},
pubstate = {published},
tppubtype = {article}
}