Publications – Laboratory for Machine Intelligence

2021

Oblak, Tim; Šircelj, Jaka; Struc, Vitomir; Peer, Peter; Solina, Franc; Jaklic, Aleš

Learning to predict superquadric parameters from depth images with explicit and implicit supervision Journal Article

In: IEEE Access, pp. 1-16, 2021, ISSN: 2169-3536.

Tags: 3d, computer vision, depth images, differential renderer, recovery, superquadric

@article{Oblak2021,

title = {Learning to predict superquadric parameters from depth images with explicit and implicit supervision},

author = {Tim Oblak and Jaka Šircelj and Vitomir Struc and Peter Peer and Franc Solina and Aleš Jaklic},

url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9274424},

doi = {10.1109/ACCESS.2020.3041584},

issn = {2169-3536},

year  = {2021},

date = {2021-01-01},

journal = {IEEE Access},

pages = {1-16},

abstract = {Reconstruction of 3D space from visual data has always been a significant challenge in the field of computer vision. A popular approach to address this problem can be found in the form of bottom-up reconstruction techniques which try to model complex 3D scenes through a constellation of volumetric primitives. Such techniques are inspired by the current understanding of the human visual system and are, therefore, strongly related to the way humans process visual information, as suggested by recent visual neuroscience literature. While advances have been made in recent years in the area of 3D reconstruction, the problem remains challenging due to the many possible ways of representing 3D data, the ambiguity of determining the shape and general position in 3D space and the difficulty to train efficient models for the prediction of volumetric primitives. In this paper, we address these challenges and present a novel solution for recovering volumetric primitives from depth images. Specifically, we focus on the recovery of superquadrics, a special type of parametric models able to describe a wide array of 3D shapes using only a few parameters. We present a new learning objective that relies on the superquadric (inside-outside) function and develop two learning strategies for training convolutional neural networks (CNN) capable of predicting superquadric parameters. The first uses explicit supervision and penalizes the difference between the predicted and reference superquadric parameters. The second strategy uses implicit supervision and penalizes differences between the input depth images and depth images rendered from the predicted parameters. CNN predictors for superquadric parameters are trained with both strategies and evaluated on a large dataset of synthetic and real-world depth images. Experimental results show that both strategies compare favourably to the existing state-of-the-art and result in high quality 3D reconstructions of the modelled scenes at a much shorter processing time.},

keywords = {3d, computer vision, depth images, differential renderer, recovery, superquadric},

pubstate = {published},

tppubtype = {article}

}

2019

Krizaj, Janez; Peer, Peter; Struc, Vitomir; Dobrisek, Simon

Simultaneous multi-descent regression and feature learning for landmarking in depth image Journal Article

In: Neural Computing and Applications, 2019, ISSN: 0941-0643.

Tags: 3d, biometrics, depth data, face alignment, face analysis, landmarking

@article{Krizaj3Docalization,

title = {Simultaneous multi-descent regression and feature learning for landmarking in depth image},

author = {Janez Krizaj and Peter Peer and Vitomir Struc and Simon Dobrisek},

url = {https://link.springer.com/content/pdf/10.1007%2Fs00521-019-04529-7.pdf},

doi = {10.1007/s00521-019-04529-7},

issn = {0941-0643},

year  = {2019},

date = {2019-10-01},

journal = {Neural Computing and Applications},

abstract = {Face alignment (or facial landmarking) is an important task in many face-related applications, ranging from registration, tracking, and animation to higher-level classification problems such as face, expression, or attribute recognition. While several solutions have been presented in the literature for this task so far, reliably locating salient facial features across a wide range of poses still remains challenging. To address this issue, we propose in this paper a novel method for automatic facial landmark localization in 3D face data designed specifically to address appearance variability caused by significant pose variations. Our method builds on recent cascaded regression-based approaches to facial landmarking and uses a gating mechanism to incorporate multiple linear cascaded regression models, each trained for a limited range of poses, into a single powerful landmarking model capable of processing arbitrary-posed input data. We develop two distinct approaches around the proposed gating mechanism: (1) the first uses a gated multiple ridge descent mechanism in conjunction with established (hand-crafted) histogram of gradients features for face alignment and achieves state-of-the-art landmarking performance across a wide range of facial poses and (2) the second simultaneously learns multiple-descent directions as well as binary features that are optimal for the alignment tasks and in addition to competitive landmarking results also ensures extremely rapid processing. We evaluate both approaches in rigorous experiments on several popular datasets of 3D face images, i.e., the FRGCv2 and Bosphorus 3D face datasets and image collections F and G from the University of Notre Dame. The results of our evaluation show that both approaches compare favorably to the state-of-the-art, while exhibiting considerable robustness to pose variations.},

keywords = {3d, biometrics, depth data, face alignment, face analysis, landmarking},

pubstate = {published},

tppubtype = {article}

}