Emeršič, Žiga; Sušanj, Diego; Meden, Blaž; Peer, Peter; Štruc, Vitomir
In: IEEE Access, pp. 1–17, 2021, ISSN: 2169-3536.
Ear detection represents one of the key components of contemporary ear recognition systems. While significant progress has been made in the area of ear detection over recent years, most of the improvements are direct results of advances in the field of visual object detection. Only a limited number of techniques presented in the literature are domain-specific and designed explicitly with ear detection in mind. In this paper, we aim to address this gap and present a novel detection approach that does not rely only on general ear (object) appearance, but also exploits contextual information, i.e., face-part locations, to ensure accurate and robust ear detection in images captured in a wide variety of imaging conditions. The proposed approach is based on a Context-aware Ear Detection Network (ContexedNet) and poses ear detection as a semantic image segmentation problem. ContexedNet consists of two processing paths: 1) a context provider that extracts probability maps corresponding to the locations of facial parts from the input image, and 2) a dedicated ear segmentation model that integrates the computed probability maps into a context-aware, segmentation-based ear detection procedure. ContexedNet is evaluated in rigorous experiments on the AWE and UBEAR datasets and shown to ensure competitive performance when evaluated against state-of-the-art ear detection models from the literature. Additionally, because the proposed contextualization is model-agnostic, it can also be utilized with other ear detection techniques to improve performance.
Stepec, Dejan; Emersic, Ziga; Peer, Peter; Struc, Vitomir
Constellation-Based Deep Ear Recognition Incollection
In: Jiang, R.; Li, CT.; Crookes, D.; Meng, W.; Rosenberger, C. (Ed.): Deep Biometrics: Unsupervised and Semi-Supervised Learning, Springer, 2020, ISBN: 978-3-030-32582-4.
This chapter introduces COM-Ear, a deep constellation model for ear recognition. Different from competing solutions, COM-Ear encodes global as well as local characteristics of ear images and generates descriptive ear representations that ensure competitive recognition performance. The model is designed as a dual-path convolutional neural network (CNN), where one path processes the input in a holistic manner, and the second captures local image characteristics from patches sampled from the input image. A novel pooling operation, called patch-relevant-information pooling, is also proposed and integrated into the COM-Ear model. The pooling operation helps to select features from the input patches that are locally important and to focus the attention of the network on image regions that are descriptive and important for representation purposes. The model is trained in an end-to-end manner using a combined cross-entropy and center loss. Extensive experiments on the recently introduced Extended Annotated Web Ears (AWEx) dataset demonstrate the competitive performance of the proposed model.
Emersic, Ziga; Krizaj, Janez; Struc, Vitomir; Peer, Peter
Deep ear recognition pipeline Incollection
In: Hassaballah, Mahmoud; Hosny, Khalid M. (Ed.): Recent Advances in Computer Vision: Theories and Applications, vol. 804, Springer, 2019, ISSN: 1860-9503.
Ear recognition has seen multiple improvements in recent years and remains a very active field today. However, it has been approached from the recognition and the detection perspective separately. Furthermore, deep-learning-based approaches that are popular in other domains have seen limited use in ear recognition, and even more so in ear detection. Moreover, to obtain a usable recognition system, a unified pipeline is needed. The input to such a system should be plain images of subjects, and the output identities based only on ear biometrics. We conduct separate analyses through detection and identification experiments on a challenging dataset and, using the best approaches, present a novel, unified pipeline. The pipeline is based on convolutional neural networks (CNNs) and presents, to the best of our knowledge, the first CNN-based ear recognition pipeline. The pipeline incorporates both the detection of ears in arbitrary images of people and recognition on the segmented ear regions. The experiments show that the presented system achieves state-of-the-art performance and is thus a good foundation for future real-world ear recognition systems.
Emeršič, Žiga; Meden, Blaž; Peer, Peter; Štruc, Vitomir
In: Neural Computing and Applications, pp. 1–16, 2018, ISSN: 0941-0643.
Ear recognition technology has long been dominated by (local) descriptor-based techniques due to their formidable recognition performance and robustness to various sources of image variability. While deep-learning-based techniques have started to appear in this field only recently, they have already shown potential for further boosting the performance of ear recognition technology and dethroning descriptor-based methods as the current state of the art. However, while recognition performance is often the key factor when selecting recognition models for biometric technology, it is equally important that the behavior of the models is understood and their sensitivity to different covariates is known and well explored. Other factors, such as the train- and test-time complexity or resource requirements, are also paramount and need to be considered when designing recognition systems. To explore these issues, we present in this paper a comprehensive analysis of several descriptor- and deep-learning-based techniques for ear recognition. Our goal is to discover weak points of contemporary techniques, study the characteristics of the existing technology and identify open problems worth exploring in the future. We conduct our analysis through identification experiments on the challenging Annotated Web Ears (AWE) dataset and report our findings. The results of our analysis show that the presence of accessories and high degrees of head movement significantly impact the identification performance of all types of recognition models, whereas mild degrees of the listed factors and other covariates, such as gender and ethnicity, impact the identification performance only to a limited extent. From a test-time-complexity point of view, the results suggest that lightweight deep models can be as fast as descriptor-based methods given appropriate computing hardware, but require significantly more resources during training, where descriptor-based methods have a clear advantage.
As an additional contribution, we also introduce a novel dataset of ear images, called AWE Extended (AWEx), which we collected from the web for the training of the deep models used in our experiments. AWEx contains 4104 images of 346 subjects and represents one of the largest and most challenging (publicly available) datasets of unconstrained ear images at the disposal of the research community.
Emeršič, Žiga; Playa, Nil Oleart; Štruc, Vitomir; Peer, Peter
Towards Accessories-Aware Ear Recognition Inproceedings
In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–8, IEEE 2018.
Automatic ear recognition is gaining popularity within the research community due to numerous desirable properties, such as high recognition performance, the possibility of capturing ear images at a distance and in a covert manner, etc. Despite this popularity and the corresponding research effort being directed towards ear recognition technology, open problems still remain. One of the most important issues stopping ear recognition systems from being widely available is ear occlusions and accessories. Ear accessories not only mask biometric features, thereby reducing the overall recognition performance, but also introduce new non-biometric features that can be exploited for spoofing purposes. Ignoring ear accessories during recognition can, therefore, present a security threat to ear recognition and also adversely affect performance. Despite the importance of this topic, there have been, to the best of our knowledge, no ear recognition studies addressing these problems. In this work we try to close this gap and study the impact of ear accessories on the recognition performance of several state-of-the-art ear recognition techniques. We consider ear accessories as a tool for spoofing attacks and show that CNN-based recognition approaches are more susceptible to spoofing attacks than traditional descriptor-based approaches. Furthermore, we demonstrate that using inpainting techniques or average coloring can mitigate the problems caused by ear accessories and slightly outperforms (standard) black-color masking of ear accessories.
Emeršič, Žiga; Štepec, Dejan; Štruc, Vitomir; Peer, Peter; George, Anjith; Ahmad, Adii; Omar, Elshibani; Boult, Terrance E.; Safdari, Reza; Zhou, Yuxiang; Zafeiriou, Stefanos; Yaman, Dogucan; Eyiokur, Fevziye I.; Ekenel, Hazim K.
The unconstrained ear recognition challenge Inproceedings
In: 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 715–724, IEEE 2017.
In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.
Emeršič, Žiga; Štepec, Dejan; Štruc, Vitomir; Peer, Peter
In: IEEE International Conference on Automatic Face and Gesture Recognition, Workshop on Biometrics in the Wild, 2017.
Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not yet benefited from advances in deep learning and convolutional neural networks (CNNs) and is still lagging behind other modalities that experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank-one recognition rate of the previous state of the art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild).
Emeršič, Žiga; Štruc, Vitomir; Peer, Peter
Ear recognition: More than a survey Journal Article
In: Neurocomputing, vol. 255, pp. 26–39, 2017.
Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review on ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community.
Ribič, Metod; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter
In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), pp. 131-134, Portorož, Slovenia, 2016.
The ear as a biometric modality presents a viable source for automatic human recognition. In recent years, local description methods have been gaining in popularity due to their invariance to illumination and occlusion. However, these methods require that images are well aligned and preprocessed as well as possible. This leads to one of the greatest challenges of ear recognition: sensitivity to pose variations. Recently, we presented the Annotated Web Ears (AWE) dataset, which opens new challenges in ear recognition. In this paper we test the influence of alignment on recognition performance and show that, although alignment improves the recognition rate, the dataset remains very challenging. We also show that more sophisticated alignment methods are needed to address the AWE dataset efficiently.