Emeršič, Žiga; Meden, Blaž; Peer, Peter; Štruc, Vitomir
In: Neural Computing and Applications, pp. 1–16, 2018, ISBN: 0941-0643.
Ear recognition technology has long been dominated by (local) descriptor-based techniques due to their formidable recognition performance and robustness to various sources of image variability. While deep-learning-based techniques have started to appear in this field only recently, they have already shown potential for further boosting the performance of ear recognition technology and dethroning descriptor-based methods as the current state of the art. However, while recognition performance is often the key factor when selecting recognition models for biometric technology, it is equally important that the behavior of the models is understood and their sensitivity to different covariates is known and well explored. Other factors, such as the train- and test-time complexity or resource requirements, are also paramount and need to be consider when designing recognition systems. To explore these issues, we present in this paper a comprehensive analysis of several descriptor- and deep-learning-based techniques for ear recognition. Our goal is to discover weak points of contemporary techniques, study the characteristics of the existing technology and identify open problems worth exploring in the future. We conduct our analysis through identification experiments on the challenging Annotated Web Ears (AWE) dataset and report our findings. The results of our analysis show that the presence of accessories and high degrees of head movement significantly impacts the identification performance of all types of recognition models, whereas mild degrees of the listed factors and other covariates such as gender and ethnicity impact the identification performance only to a limited extent. From a test-time-complexity point of view, the results suggest that lightweight deep models can be equally fast as descriptor-based methods given appropriate computing hardware, but require significantly more resources during training, where descriptor-based methods have a clear advantage. As an additional contribution, we also introduce a novel dataset of ear images, called AWE Extended (AWEx), which we collected from the web for the training of the deep models used in our experiments. AWEx contains 4104 images of 346 subjects and represents one of the largest and most challenging (publicly available) datasets of unconstrained ear images at the disposal of the research community.
Emeršič, Žiga; Gabriel, Luka; Štruc, Vitomir; Peer, Peter
In: IET Biometrics, vol. 7, no. 3, pp. 175–184, 2018.
Object detection and segmentation represents the basis for many tasks in computer and machine vision. In biometric recognition systems the detection of the region-of-interest (ROI) is one of the most crucial steps in the processing pipeline, significantly impacting the performance of the entire recognition system. Existing approaches to ear detection, are commonly susceptible to the presence of severe occlusions, ear accessories or variable illumination conditions and often deteriorate in their performance if applied on ear images captured in unconstrained settings. To address these shortcomings, we present a novel ear detection technique based on convolutional encoder-decoder networks (CEDs). We formulate the problem of ear detection as a two-class segmentation problem and design and train a CED-network architecture to distinguish between image-pixels belonging to the ear and the non-ear class. Unlike competing techniques, our approach does not simply return a bounding box around the detected ear, but provides detailed, pixel-wise information about the location of the ears in the image. Experiments on a dataset gathered from the web (a.k.a. in the wild) show that the proposed technique ensures good detection results in the presence of various covariate factors and significantly outperforms competing methods from the literature.
Emersic, Ziga; Meden, Blaz; Peer, Peter; Struc, Vitornir
In: 2017 international conference and workshop on bioinspired intelligence (IWOBI), pp. 1–9, IEEE 2017.
Dense descriptor-based feature extraction techniques represent a popular choice for implementing biometric ear recognition system and are in general considered to be the current state-of-the-art in this area. In this paper, we study the impact of various factors (i.e., head rotation, presence of occlusions, gender and ethnicity) on the performance of 8 state-of-the-art descriptor-based ear recognition techniques. Our goal is to pinpoint weak points of the existing technology and identify open problems worth exploring in the future. We conduct our covariate analysis through identification experiments on the challenging AWE (Annotated Web Ears) dataset and report our findings. The results of our study show that high degrees of head movement and presence of accessories significantly impact the identification performance, whereas mild degrees of the listed factors and other covariates such as gender and ethnicity impact the identification performance only to a limited extent.
Emeršič, Žiga; Štruc, Vitomir; Peer, Peter
Ear recognition: More than a survey Journal Article
In: Neurocomputing, vol. 255, pp. 26–39, 2017.
Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review on ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community.
Ribič, Metod; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter
In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), pp. 131-134, Portorož, Slovenia, 2016.
Ear as a biometric modality presents a viable source for automatic human recognition. In recent years local description methods have been gaining on popularity due to their invariance to illumination and occlusion. However, these methods require that images are well aligned and preprocessed as good as possible. This causes one of the greatest challenges of ear recognition: sensitivity to pose variations. Recently, we presented Annotated Web Ears dataset that opens new challenges in ear recognition. In this paper we test the influence of alignment on recognition performance and prove that even with the alignment the database is still very challenging, even-though the recognition rate is improved due to alignment. We also prove that more sophisticated alignment methods are needed to address the AWE dataset efficiently