Emeršič, Žiga; Meden, Blaž; Peer, Peter; Štruc, Vitomir
In: Neural Computing and Applications, pp. 1–16, 2018, ISSN: 0941-0643.
Ear recognition technology has long been dominated by (local) descriptor-based techniques due to their formidable recognition performance and robustness to various sources of image variability. While deep-learning-based techniques have started to appear in this field only recently, they have already shown potential for further boosting the performance of ear recognition technology and dethroning descriptor-based methods as the current state of the art. However, while recognition performance is often the key factor when selecting recognition models for biometric technology, it is equally important that the behavior of the models is understood and their sensitivity to different covariates is known and well explored. Other factors, such as the train- and test-time complexity or resource requirements, are also paramount and need to be considered when designing recognition systems. To explore these issues, we present in this paper a comprehensive analysis of several descriptor- and deep-learning-based techniques for ear recognition. Our goal is to discover weak points of contemporary techniques, study the characteristics of the existing technology and identify open problems worth exploring in the future. We conduct our analysis through identification experiments on the challenging Annotated Web Ears (AWE) dataset and report our findings. The results of our analysis show that the presence of accessories and high degrees of head movement significantly impact the identification performance of all types of recognition models, whereas mild degrees of the listed factors and other covariates such as gender and ethnicity impact the identification performance only to a limited extent. From a test-time-complexity point of view, the results suggest that lightweight deep models can be as fast as descriptor-based methods given appropriate computing hardware, but require significantly more resources during training, where descriptor-based methods have a clear advantage.
As an additional contribution, we also introduce a novel dataset of ear images, called AWE Extended (AWEx), which we collected from the web for the training of the deep models used in our experiments. AWEx contains 4104 images of 346 subjects and represents one of the largest and most challenging (publicly available) datasets of unconstrained ear images at the disposal of the research community.