Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter
Body Segmentation Using Multi-task Learning Inproceedings
In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4.
Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.
Emeršič, Žiga; Sušanj, Diego; Meden, Blaž; Peer, Peter; Štruc, Vitomir
In: IEEE Access, pp. 1–17, 2021, ISSN: 2169-3536.
Ear detection represents one of the key components of contemporary ear recognition systems. While significant progress has been made in the area of ear detection over recent years, most of the improvements are direct results of advances in the field of visual object detection. Only a limited number of techniques presented in the literature are domain--specific and designed explicitly with ear detection in mind. In this paper, we aim to address this gap and present a novel detection approach that does not rely only on general ear (object) appearance, but also exploits contextual information, i.e., face--part locations, to ensure accurate and robust ear detection with images captured in a wide variety of imaging conditions. The proposed approach is based on a Context--aware Ear Detection Network (ContexedNet) and poses ear detection as a semantic image segmentation problem. ContexedNet consists of two processing paths: 1) a context--provider that extracts probability maps corresponding to the locations of facial parts from the input image, and 2) a dedicated ear segmentation model that integrates the computed probability maps into a context--aware segmentation-based ear detection procedure. ContexedNet is evaluated in rigorous experiments on the AWE and UBEAR datasets and shown to ensure competitive performance when evaluated against state--of--the--art ear detection models from the literature. Additionally, because the proposed contextualization is model agnostic, it can also be utilized with other ear detection techniques to improve performance.
Wang, Caiyong; Wang, Yunlong; Zhang, Kunbo; Muhammad, Jawad; Lu, Tianhao; Zhang, Qi; Tian, Qichuan; He, Zhaofeng; Sun, Zhenan; Zhang, Yiwen; Liu, Tianbao; Yang, Wei; Wu, Dongliang; Liu, Yingfeng; Zhou, Ruiye; Wu, Huihai; Zhang, Hao; Wang, Junbao; Wang, Jiayi; Xiong, Wantong; Shi, Xueyu; Zeng, Shao; Li, Peihua; Sun, Haodong; Wang, Jing; Zhang, Jiale; Wang, Qi; Wu, Huijie; Zhang, Xinhui; Li, Haiqing; Chen, Yu; Chen, Liang; Zhang, Menghan; Sun, Ye; Zhou, Zhiyong; Boutros, Fadi; Damer, Naser; Kuijper, Arjan; Tapia, Juan; Valenzuela, Andres; Busch, Christoph; Gupta, Gourav; Raja, Kiran; Wu, Xi; Li, Xiaojie; Yang, Jingfu; Jing, Hongyan; Wang, Xin; Kong, Bin; Yin, Youbing; Song, Qi; Lyu, Siwei; Hu, Shu; Premk, Leon; Vitek, Matej; Štruc, Vitomir; Peer, Peter; Khiarak, Jalil Nourmohammadi; Jaryani, Farhang; Nasab, Samaneh Salehi; Moafinejad, Seyed Naeim; Amini, Yasin; Noshad, Morteza
In: Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2021), 2021.
For iris recognition in non-cooperative environments, iris segmentation has been regarded as the first most important challenge still open to the biometric community, affecting all downstream tasks from normalization to recognition. In recent years, deep learning technologies have gained significant popularity among various computer vision tasks and also been introduced in iris biometrics, especially iris segmentation. To investigate recent developments and attract more interest of researchers in the iris segmentation method, we organized the 2021 NIR Iris Challenge Evaluation in Non-cooperative Environments: Segmentation and Localization (NIR-ISL 2021) at the 2021 International Joint Conference on Biometrics (IJCB 2021). The challenge was used as a public platform to assess the performance of iris segmentation and localization methods on Asian and African NIR iris images captured in non-cooperative environments. The three best-performing entries achieved solid and satisfactory iris segmentation and localization results in most cases, and their code and models have been made publicly available for reproducibility research.
Vitek, M.; Das, A.; Pourcenoux, Y.; Missler, A.; Paumier, C.; Das, S.; Ghosh, I. De; Lucio, D. R.; Jr., L. A. Zanlorensi; Menotti, D.; Boutros, F.; Damer, N.; Grebe, J. H.; Kuijper, A.; Hu, J.; He, Y.; Wang, C.; Liu, H.; Wang, Y.; Sun, Z.; Osorio-Roig, D.; Rathgeb, C.; Busch, C.; Tapia, J.; Valenzuela, A.; Zampoukis, G.; Tsochatzidis, L.; Pratikakis, I.; Nathan, S.; Suganya, R.; Mehta, V.; Dhall, A.; Raja, K.; Gupta, G.; Khiarak, J. N.; Akbari-Shahper, M.; Jaryani, F.; Asgari-Chenaghlu, M.; Vyas, R.; Dakshit, S.; Dakshit, S.; Peer, P.; Pal, U.; Štruc, V.
In: International Joint Conference on Biometrics (IJCB 2020), pp. 1–10, 2020.
The paper presents a summary of the 2020 Sclera Segmentation Benchmarking Competition (SSBC), the 7th in the series of group benchmarking efforts centred around the problem of sclera segmentation. Different from previous editions, the goal of SSBC 2020 was to evaluate the performance of sclera-segmentation models on images captured with mobile devices. The competition was used as a platform to assess the sensitivity of existing models to i) differences in mobile devices used for image capture and ii) changes in the ambient acquisition conditions. 26 research groups registered for SSBC 2020, out of which 13 took part in the final round and submitted a total of 16 segmentation models for scoring. These included a wide variety of deep-learning solutions as well as one approach based on standard image processing techniques. Experiments were conducted with three recent datasets. Most of the segmentation models achieved relatively consistent performance across images captured with different mobile devices (with slight differences across devices), but struggled most with low-quality images captured in challenging ambient conditions, i.e., in an indoor environment and with poor lighting.
Šircelj, Jaka; Oblak, Tim; Grm, Klemen; Petković, Uroš; Jaklič, Aleš; Peer, Peter; Štruc, Vitomir; Solina, Franc
In: 25th Computer Vision Winter Workshop (CVWW 2020), 2020.
In this paper we address the problem of representing 3D visual data with parameterized volumetric shape primitives. Specifically, we present a (two-stage) approach built around convolutional neural networks (CNNs) capable of segmenting complex depth scenes into the simpler geometric structures that can be represented with superquadric models. In the first stage, our approach uses a Mask RCNN model to identify superquadric-like structures in depth scenes and then fits superquadric models to the segmented structures using a specially designed CNN regressor. Using our approach we are able to describe complex structures with a small number of interpretable parameters. We evaluated the proposed approach on synthetic as well as real-world depth data and show that our solution does not only result in competitive performance in comparison to the state-of-the-art, but is able to decompose scenes into a number of superquadric models at a fraction of the time required by competing approaches. We make all data and models used in the paper available from https://lmi.fe.uni-lj.si/en/research/resources/sq-seg.
Vitek, Matej; Rot, Peter; Struc, Vitomir; Peer, Peter
In: Neural Computing and Applications, pp. 1-15, 2020.
The area of ocular biometrics is among the most popular branches of biometric recognition technology. This area has long been dominated by iris recognition research, while other ocular modalities such as the periocular region or the vasculature of the sclera have received significantly less attention in the literature. Consequently, ocular modalities beyond the iris are not well studied and their characteristics are today still not as well understood. While recent needs for more secure authentication schemes have considerably increased the interest in competing ocular modalities, progress in these areas is still held back by the lack of publicly available datasets that would allow for more targeted research into specific ocular characteristics next to the iris. In this paper, we aim to bridge this gap for the case of sclera biometrics and introduce a novel dataset designed for research into ocular biometrics and most importantly for research into the vasculature of the sclera. Our dataset, called Sclera Blood Vessels, Periocular and Iris (SBVPI), is, to the best of our knowledge, the first publicly available dataset designed specifically with research in sclera biometrics in mind. The dataset contains high-quality RGB ocular images, captured in the visible spectrum, belonging to 55 subjects. Unlike competing datasets, it comes with manual markups of various eye regions, such as the iris, pupil, canthus or eyelashes and a detailed pixel-wise annotation of the complete sclera vasculature for a subset of the images. Additionally, the datasets ship with gender and age labels. The unique characteristics of the dataset allow us to study aspects of sclera biometrics technology that have not been studied before in the literature (e.g. vasculature segmentation techniques) as well as issues that are of key importance for practical recognition systems. Thus, next to the SBVPI dataset we also present in this paper a comprehensive investigation into sclera biometrics and the main covariates that affect the performance of sclera segmentation and recognition techniques, such as gender, age, gaze direction or image resolution. Our experiments not only demonstrate the usefulness of the newly introduced dataset, but also contribute to a better understanding of sclera biometrics in general.
Rot, Peter; Vitek, Matej; Grm, Klemen; Emeršič, Žiga; Peer, Peter; Štruc, Vitomir
Deep Sclera Segmentation and Recognition Incollection
In: Uhl, Andreas; Busch, Christoph; Marcel, Sebastien; Veldhuis, Rainer (Ed.): Handbook of Vascular Biometrics, pp. 395-432, Springer, 2019, ISBN: 978-3-030-27731-4.
In this chapter, we address the problem of biometric identity recognition from the vasculature of the human sclera. Specifically, we focus on the challenging task of multi-view sclera recognition, where the visible part of the sclera vasculature changes from image to image due to varying gaze (or view) directions. We propose a complete solution for this task built around Convolutional Neural Networks (CNNs) and make several contributions that result in state-of-the-art recognition performance, i.e.: (i) we develop a cascaded CNN assembly that is able to robustly segment the sclera vasculature from the input images regardless of gaze direction, and (ii) we present ScleraNET, a CNN model trained in a multi-task manner (combining losses pertaining to identity and view-direction recognition) that allows for the extraction of discriminative vasculature descriptors that can be used for identity inference. To evaluate the proposed contributions, we also introduce a new dataset of ocular images, called the Sclera Blood Vessels, Periocular and Iris (SBVPI) dataset, which represents one of the few publicly available datasets suitable for research in multi-view sclera segmentation and recognition. The datasets come with a rich set of annotations, such as a per-pixel markup of various eye parts (including the sclera vasculature), identity, gaze-direction and gender labels. We conduct rigorous experiments on SBVPI with competing techniques from the literature and show that the combination of the proposed segmentation and descriptor-computation models results in highly competitive recognition performance.
Lozej, Juš; Štepec, Dejan; Štruc, Vitomir; Peer, Peter
In: 7th IAPR/IEEE International Workshop on Biometrics and Forensics (IWBF 2019), 2019.
Despite the rise of deep learning in numerous areas of computer vision and image processing, iris recognition has not benefited considerably from these trends so far. Most of the existing research on deep iris recognition is focused on new models for generating discriminative and robust iris representations and relies on methodologies akin to traditional iris recognition pipelines. Hence, the proposed models do not approach iris recognition in an end-to-end manner, but rather use standard heuristic iris segmentation (and unwrapping) techniques to produce normalized inputs for the deep learning models. However, because deep learning is able to model very complex data distributions and nonlinear data changes, an obvious question arises. How important is the use of traditional segmentation methods in a deep learning setting? To answer this question, we present in this paper an empirical analysis of the impact of iris segmentation on the performance of deep learning models using a simple two stage pipeline consisting of a segmentation and a recognition step. We evaluate how the accuracy of segmentation influences recognition performance but also examine if segmentation is needed at all. We use the CASIA Thousand and SBVPI datasets for the experiments and report several interesting findings.
Rot, Peter; Emeršič, Žiga; Struc, Vitomir; Peer, Peter
Deep multi-class eye segmentation for ocular biometrics Inproceedings
In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–8, IEEE 2018.
Segmentation techniques for ocular biometrics typically focus on finding a single eye region in the input image at the time. Only limited work has been done on multi-class eye segmentation despite a number of obvious advantages. In this paper we address this gap and present a deep multi-class eye segmentation model build around the SegNet architecture. We train the model on a small dataset (of 120 samples) of eye images and observe it to generalize well to unseen images and to ensure highly accurate segmentation results. We evaluate the model on the Multi-Angle Sclera Database (MASD) dataset and describe comprehensive experiments focusing on: i) segmentation performance, ii) error analysis, iii) the sensitivity of the model to changes in view direction, and iv) comparisons with competing single-class techniques. Our results show that the proposed model is viable solution for multi-class eye segmentation suitable for recognition (multi-biometric) pipelines based on ocular characteristics.
Emeršič, Žiga; Gabriel, Luka; Štruc, Vitomir; Peer, Peter
In: IET Biometrics, vol. 7, no. 3, pp. 175–184, 2018.
Object detection and segmentation represents the basis for many tasks in computer and machine vision. In biometric recognition systems the detection of the region-of-interest (ROI) is one of the most crucial steps in the processing pipeline, significantly impacting the performance of the entire recognition system. Existing approaches to ear detection, are commonly susceptible to the presence of severe occlusions, ear accessories or variable illumination conditions and often deteriorate in their performance if applied on ear images captured in unconstrained settings. To address these shortcomings, we present a novel ear detection technique based on convolutional encoder-decoder networks (CEDs). We formulate the problem of ear detection as a two-class segmentation problem and design and train a CED-network architecture to distinguish between image-pixels belonging to the ear and the non-ear class. Unlike competing techniques, our approach does not simply return a bounding box around the detected ear, but provides detailed, pixel-wise information about the location of the ears in the image. Experiments on a dataset gathered from the web (a.k.a. in the wild) show that the proposed technique ensures good detection results in the presence of various covariate factors and significantly outperforms competing methods from the literature.