Šircelj, Jaka; Oblak, Tim; Grm, Klemen; Petković, Uroš; Jaklič, Aleš; Peer, Peter; Štruc, Vitomir; Solina, Franc
In: 25th Computer Vision Winter Workshop (CVWW 2020), 2020.
In this paper we address the problem of representing 3D visual data with parameterized volumetric shape primitives. Specifically, we present a two-stage approach built around convolutional neural networks (CNNs) capable of segmenting complex depth scenes into simpler geometric structures that can be represented with superquadric models. In the first stage, our approach uses a Mask R-CNN model to identify superquadric-like structures in depth scenes; in the second stage, it fits superquadric models to the segmented structures using a specially designed CNN regressor. Using our approach we are able to describe complex structures with a small number of interpretable parameters. We evaluate the proposed approach on synthetic as well as real-world depth data and show that our solution not only achieves competitive performance in comparison to the state of the art, but also decomposes scenes into a number of superquadric models in a fraction of the time required by competing approaches. We make all data and models used in the paper available from https://lmi.fe.uni-lj.si/en/research/resources/sq-seg.
Oblak, Tim; Grm, Klemen; Jaklič, Aleš; Peer, Peter; Štruc, Vitomir; Solina, Franc
In: 2019 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 45-52, IEEE, 2019.
It has been a longstanding goal in computer vision to describe the 3D physical space in terms of parameterized volumetric models that would allow autonomous machines to understand and interact with their surroundings. Such models are typically motivated by human visual perception and aim to represent all elements of the physical world, ranging from individual objects to complex scenes, using a small set of parameters. One of the de facto standard approaches to this problem is the use of superquadrics, volumetric models that define various 3D shape primitives and can be fitted to actual 3D data (either in the form of point clouds or range images). However, existing solutions to superquadric recovery involve costly iterative fitting procedures, which limit the applicability of such techniques in practice. To alleviate this problem, we explore in this paper the possibility of recovering superquadrics from range images without time-consuming iterative parameter estimation techniques by using contemporary deep-learning models, more specifically, convolutional neural networks (CNNs). We pose the superquadric recovery problem as a regression task and develop a CNN regressor that is able to estimate the parameters of a superquadric model from a given range image. We train the regressor on a large set of synthetic range images, each containing a single (unrotated) superquadric shape, and evaluate the learned model in comparative experiments with the current state of the art. Additionally, we present a qualitative analysis involving a dataset of real-world objects. The results of our experiments show that the proposed regressor not only outperforms the existing state of the art, but also ensures a 270x faster execution time.
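To illustrate what "a small set of parameters" means for an unrotated superquadric, the following is a minimal sketch of the commonly used inside-outside function. It is not taken from the paper: the function name and the parameter names `a1`, `a2`, `a3` (axis sizes) and `eps1`, `eps2` (shape exponents) are assumptions following the usual textbook notation, and the shape is assumed to be centred at the origin and aligned with the coordinate axes.

```python
def superquadric_inside_outside(x, y, z, a1, a2, a3, eps1, eps2):
    """Evaluate the inside-outside function F of an origin-centred,
    axis-aligned superquadric (hypothetical helper, standard notation).

    F < 1: point lies inside; F == 1: on the surface; F > 1: outside.
    a1, a2, a3 are the sizes along x, y, z; eps1, eps2 control the shape
    (values near 1 give ellipsoids, values near 0 give box-like shapes).
    """
    # abs() keeps the fractional powers real for negative coordinates.
    xy_term = abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)
    return xy_term ** (eps2 / eps1) + abs(z / a3) ** (2.0 / eps1)


# The same point is outside a unit sphere but inside a box-like shape.
point = (0.9, 0.9, 0.9)
f_sphere = superquadric_inside_outside(*point, 1.0, 1.0, 1.0, 1.0, 1.0)
f_box = superquadric_inside_outside(*point, 1.0, 1.0, 1.0, 0.1, 0.1)
print(f_sphere > 1.0, f_box < 1.0)
```

Under these assumptions, five numbers (three sizes and two exponents) describe the shape, which is the kind of parameter vector a regressor would be asked to predict.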
Lozej, Juš; Meden, Blaž; Štruc, Vitomir; Peer, Peter
End-to-end iris segmentation using U-Net
In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–6, IEEE, 2018.
Iris segmentation is an important research topic that has received significant attention from the research community over the years. Traditional iris segmentation techniques have typically focused on hand-crafted procedures that, nonetheless, achieved remarkable segmentation performance even with images captured in difficult settings. With the success of deep-learning models, researchers are increasingly looking towards convolutional neural networks (CNNs) to further improve on the accuracy of existing iris segmentation techniques, and several CNN-based techniques have recently been presented in the literature. In this paper we also consider deep-learning models for iris segmentation and present an iris segmentation approach based on the popular U-Net architecture. Our model is trainable end-to-end and, hence, avoids the need for hand-designing the segmentation procedure. We evaluate the model on the CASIA dataset and report encouraging results in comparison to existing techniques used in this area.
Emeršič, Žiga; Štepec, Dejan; Štruc, Vitomir; Peer, Peter
In: IEEE International Conference on Automatic Face and Gesture Recognition, Workshop on Biometrics in the Wild 2017, 2017.
Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not yet benefited from advances in deep learning and convolutional neural networks (CNNs) and is still lagging behind other modalities that experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank-one recognition rate of the previous state of the art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild).
Grm, Klemen; Štruc, Vitomir; Artiges, Anais; Caron, Matthieu; Ekenel, Hazim K.
In: IET Biometrics, vol. 7, no. 1, pp. 81–89, 2017.
Convolutional neural network (CNN) based approaches are the state of the art in various computer vision tasks including face recognition. Considerable research effort is currently being directed toward further improving CNNs by focusing on model architectures and training techniques. However, studies systematically exploring the strengths and weaknesses of existing deep models for face recognition are still relatively scarce. In this paper, we try to fill this gap and study the effects of different covariates on the verification performance of four recent CNN models using the Labelled Faces in the Wild dataset. Specifically, we investigate the influence of covariates related to image quality and model characteristics, and analyse their impact on the face verification performance of different deep CNN models. Based on comprehensive and rigorous experimentation, we identify the strengths and weaknesses of the deep learning models, and present key areas for potential future research. Our results indicate that high levels of noise, blur, missing pixels, and brightness have a detrimental effect on the verification performance of all models, whereas the impact of contrast changes and compression artefacts is limited. We find that the descriptor-computation strategy and colour information do not have a significant influence on performance.
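The image-quality covariates named above (noise, blur, missing pixels, brightness) can be pictured with a small sketch. This is an illustrative assumption rather than the protocol used in the paper: the helper names, the 3x3 mean filter standing in for blur, and the representation of a grayscale image as a list of rows with values in 0..255 are all hypothetical choices.

```python
import random


def _clip(value):
    """Round to an integer and clamp to the valid 8-bit range."""
    return max(0, min(255, int(round(value))))


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def change_brightness(img, factor):
    """Scale every pixel by factor (> 1 brightens, < 1 darkens)."""
    return [[_clip(p * factor) for p in row] for row in img]


def drop_pixels(img, fraction, rng):
    """Set roughly the given fraction of pixels to 0 (missing data)."""
    return [[0 if rng.random() < fraction else p for p in row] for row in img]


def box_blur(img):
    """Apply a 3x3 mean filter, averaging only in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[ii][jj]
                    for ii in range(max(0, i - 1), min(h, i + 2))
                    for jj in range(max(0, j - 1), min(w, j + 2))]
            row.append(_clip(sum(vals) / len(vals)))
        out.append(row)
    return out


# Degrade a tiny synthetic image at one severity level per covariate.
rng = random.Random(0)
image = [[100, 200], [50, 150]]
degraded = {
    "noise": add_gaussian_noise(image, 10.0, rng),
    "brightness": change_brightness(image, 1.5),
    "missing": drop_pixels(image, 0.25, rng),
    "blur": box_blur(image),
}
```

In a study of this kind, each degraded copy would be passed through the same verification pipeline as the clean image, so that any drop in performance can be attributed to the single covariate being varied.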