Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter
Body Segmentation Using Multi-task Learning Inproceedings
In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4.
Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.
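The multi-task objective described above — a shared backbone with three task heads whose losses are combined into one training signal — can be sketched as a weighted sum of per-task losses. All names and weights below are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of an SPD-style multi-task objective: the total
# loss is a weighted sum of the per-task losses produced by the three
# task heads. Weights here are made up for illustration.

def multi_task_loss(task_losses, weights):
    """Combine per-task losses into a single optimization objective."""
    assert task_losses.keys() == weights.keys()
    return sum(weights[t] * task_losses[t] for t in task_losses)

# Example: segmentation, keypoint (pose) and dense-pose losses.
losses = {"segmentation": 0.8, "pose": 0.5, "densepose": 1.2}
weights = {"segmentation": 1.0, "pose": 0.5, "densepose": 0.5}
total = multi_task_loss(losses, weights)  # 0.8 + 0.25 + 0.6 = 1.65
```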
Fele, Benjamin; Lampe, Ajda; Peer, Peter; Štruc, Vitomir
C-VTON: Context-Driven Image-Based Virtual Try-On Network Inproceedings
In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1–10, 2022.
Image-based virtual try-on techniques have shown great promise for enhancing the user-experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are currently still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state-of-the-art.
Ivanovska, Marija; Štruc, Vitomir
In: Proceedings of ERK 2021, pp. 1–4, 2021.
Deepfakes, or manipulated face images in which a donor's face is swapped with the face of a target person, have recently gained enormous popularity among the general public. With the advancements in artificial intelligence and generative modeling, such images can nowadays be easily generated and used to spread misinformation and harm individuals, businesses or society. As the tools for generating deepfakes are rapidly improving, it is critical for deepfake detection models to be able to recognize advanced, sophisticated data manipulations, including those that have not been seen during training. In this paper, we explore the use of one-class learning models as an alternative to discriminative methods for the detection of deepfakes. We conduct a comparative study with three popular deepfake datasets and investigate the performance of selected (discriminative and one-class) detection models in matched- and cross-dataset experiments. Our results show that discriminative models significantly outperform one-class models when training and testing data come from the same dataset, but degrade considerably when the characteristics of the testing data deviate from the training setting. In such cases, one-class models tend to generalize much better.
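The one-class idea above — fit a model on real faces only and flag inputs that deviate from them — can be illustrated with a minimal centroid-distance baseline. This is purely a sketch under hypothetical features; the paper's one-class models are far more sophisticated.

```python
# Minimal one-class detection sketch: fit on real-face feature vectors
# only, then flag inputs whose distance to the centroid exceeds a
# threshold calibrated on the training data.
import math

def fit_one_class(real_features):
    """Return the centroid of the real data and a distance threshold."""
    dim = len(real_features[0])
    n = len(real_features)
    centroid = [sum(f[i] for f in real_features) / n for i in range(dim)]
    threshold = max(math.dist(f, centroid) for f in real_features)
    return centroid, threshold

def is_fake(feature, centroid, threshold):
    """An input far from the real-data centroid is flagged as fake."""
    return math.dist(feature, centroid) > threshold

centroid, thr = fit_one_class([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]])
print(is_fake([0.9, 0.9], centroid, thr))  # far from real data -> True
```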
Batagelj, Borut; Peer, Peter; Štruc, Vitomir; Dobrišek, Simon
In: Applied Sciences, vol. 11, no. 5, pp. 1–24, 2021, ISSN: 2076-3417.
The new Coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of new coronavirus cases had already exceeded 60 million and the number of deaths 1,410,378 according to information from the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal of the techniques is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform with masked-face images? (ii) Is it possible to detect a proper (regulation-compliant) placement of facial masks? and (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions we conduct a comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement. 
Finally, we design a complete pipeline for recognizing whether face-masks are worn correctly or not and compare its performance with that of standard face-mask detection models from the literature. To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotated dataset, called the Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.
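The two-stage pipeline described above — first detect faces, then classify the mask placement of each detection — has a simple schematic shape. Both stages are stubbed with hypothetical callables here; the paper uses learned detectors and classifiers.

```python
# Schematic two-stage face-mask compliance pipeline: a face detector
# proposes boxes, and a classifier labels each box as compliant or
# non-compliant mask placement. Both stages are hypothetical stubs.

def mask_compliance_pipeline(image, detect_faces, classify_placement):
    """Return a (box, label) pair for every detected face."""
    results = []
    for box in detect_faces(image):
        label = classify_placement(image, box)  # e.g. "compliant"
        results.append((box, label))
    return results

# Toy stand-ins for the two learned stages.
faces = lambda img: [(0, 0, 10, 10)]
placement = lambda img, box: "compliant"
print(mask_compliance_pipeline(None, faces, placement))
```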
Oblak, Tim; Šircelj, Jaka; Štruc, Vitomir; Peer, Peter; Solina, Franc; Jaklič, Aleš
In: IEEE Access, pp. 1-16, 2021, ISSN: 2169-3536.
Reconstruction of 3D space from visual data has always been a significant challenge in the field of computer vision. A popular approach to address this problem can be found in the form of bottom-up reconstruction techniques, which try to model complex 3D scenes through a constellation of volumetric primitives. Such techniques are inspired by the current understanding of the human visual system and are, therefore, strongly related to the way humans process visual information, as suggested by recent visual neuroscience literature. While advances have been made in recent years in the area of 3D reconstruction, the problem remains challenging due to the many possible ways of representing 3D data, the ambiguity of determining the shape and general position in 3D space, and the difficulty of training efficient models for the prediction of volumetric primitives. In this paper, we address these challenges and present a novel solution for recovering volumetric primitives from depth images. Specifically, we focus on the recovery of superquadrics, a special type of parametric model able to describe a wide array of 3D shapes using only a few parameters. We present a new learning objective that relies on the superquadric (inside-outside) function and develop two learning strategies for training convolutional neural networks (CNNs) capable of predicting superquadric parameters. The first uses explicit supervision and penalizes the difference between the predicted and reference superquadric parameters. The second strategy uses implicit supervision and penalizes differences between the input depth images and depth images rendered from the predicted parameters. CNN predictors for superquadric parameters are trained with both strategies and evaluated on a large dataset of synthetic and real-world depth images. Experimental results show that both strategies compare favourably to the existing state-of-the-art and result in high-quality 3D reconstructions of the modelled scenes at much shorter processing times.
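The superquadric inside-outside function at the heart of the learning objective has a standard closed form: for a point inside the surface it evaluates below 1, on the surface to 1, and outside to above 1. A sketch in the canonical (untranslated, unrotated) frame:

```python
# Superquadric inside-outside function F(x, y, z) for scale parameters
# (a1, a2, a3) and shape exponents (e1, e2):
#   F = (|x/a1|^(2/e2) + |y/a2|^(2/e2))^(e2/e1) + |z/a3|^(2/e1)
# F < 1 inside the surface, F = 1 on it, F > 1 outside.

def inside_outside(x, y, z, a1, a2, a3, e1, e2):
    term_xy = (abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    term_z = abs(z / a3) ** (2 / e1)
    return term_xy + term_z

# A unit sphere corresponds to a1 = a2 = a3 = 1 and e1 = e2 = 1.
print(inside_outside(1, 0, 0, 1, 1, 1, 1, 1))  # point on surface -> 1.0
```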
Grm, Klemen; Scheirer, Walter J.; Štruc, Vitomir
Face Hallucination Using Cascaded Super-Resolution and Identity Priors Journal Article
In: IEEE Transactions on Image Processing, 2020.
In this paper, we address the problem of hallucinating high-resolution facial images from low-resolution inputs at high magnification factors. We approach this task with convolutional neural networks (CNNs) and propose a novel (deep) face hallucination model that incorporates identity priors into the learning procedure. The model consists of two main parts: i) a cascaded super-resolution network that upscales the low-resolution facial images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Different from most competing super-resolution techniques that rely on a single model for upscaling (even with large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images using steps of 2×. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and incorporate identity constraints at multiple scales. The proposed C-SRIP model (Cascaded Super Resolution with Identity Priors) is able to upscale (tiny) low-resolution images captured in unconstrained conditions and produce visually convincing results for diverse low-resolution inputs. We rigorously evaluate the proposed model on the Labeled Faces in the Wild (LFW), Helen and CelebA datasets and report superior performance compared to the existing state-of-the-art.
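The cascading idea — chaining several 2× stages instead of one large-factor upscaler, so that supervision can be applied after every stage — can be illustrated with a toy stand-in. The 2× "model" below is plain nearest-neighbour replication; in the paper each stage is a learned SR network.

```python
# Toy illustration of cascaded 2x upscaling: stages are applied
# sequentially, giving a total magnification of 2**stages, and a
# supervision signal could be attached after each stage.

def upscale_2x(image):
    """Nearest-neighbour 2x upscaling of a 2D list-of-lists image."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                     # duplicate rows
    return out

def cascade(image, stages):
    """Apply 2x stages sequentially (2**stages total magnification)."""
    for _ in range(stages):
        image = upscale_2x(image)  # per-stage supervision goes here
    return image

hr = cascade([[1]], stages=3)  # 1x1 -> 8x8
print(len(hr), len(hr[0]))     # 8 8
```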
Meden, Blaž; Malli, Refik Can; Fabijan, Sebastjan; Ekenel, Hazim Kemal; Štruc, Vitomir; Peer, Peter
Face deidentification with generative deep neural networks Journal Article
In: IET Signal Processing, vol. 11, no. 9, pp. 1046–1054, 2017.
Face deidentification is an active topic amongst privacy and security researchers. Early deidentification methods relying on image blurring or pixelisation have been replaced in recent years with techniques based on formal anonymity models that provide privacy guarantees and retain certain characteristics of the data even after deidentification. The latter aspect is important, as it allows the deidentified data to be used in applications for which identity information is irrelevant. In this work, the authors present a novel face deidentification pipeline, which ensures anonymity by synthesising artificial surrogate faces using generative neural networks (GNNs). The generated faces are used to deidentify subjects in images or videos, while preserving non-identity-related aspects of the data and consequently enabling data utilisation. Since generative networks are highly adaptive and can utilise diverse parameters (pertaining to the appearance of the generated output in terms of facial expressions, gender, race etc.), they represent a natural choice for the problem of face deidentification. To demonstrate the feasibility of the authors’ approach, they perform experiments using automated recognition tools and human annotators. Their results show that the recognition performance on deidentified images is close to chance, suggesting that the deidentification process based on GNNs is effective.
Marčetić, Darijan; Ribarić, Slobodan; Štruc, Vitomir; Pavešić, Nikola
An Experimental Tattoo De-identification System for Privacy Protection in Still Images Inproceedings
In: 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1288–1293, IEEE, 2014.
An experimental tattoo de-identification system for privacy protection in still images is described in the paper. The system consists of the following modules: skin detection, region of interest detection, feature extraction, tattoo database, matching, tattoo detection, skin swapping, and quality evaluation. Two methods for tattoo localization are presented. The first is a simple ad-hoc method based only on skin colour. The second is based on skin colour, texture and SIFT features. The appearance of each tattoo area is de-identified in such a way that its skin colour and skin texture are similar to the surrounding skin area. Experimental results for still images in which tattoo location, distance, size, illumination, and motion blur have large variability are presented. The system is subjectively evaluated based on the results of tattoo localization, the level of privacy protection and the naturalness of the de-identified still images. The level of privacy protection is estimated based on the quality of the removal of the tattoo appearance and the concealment of its location.
Pavešić, Nikola; Štruc, Vitomir; Mihelič, France
Color spaces for face recognition Inproceedings
In: Proceedings of the International Electrotechnical and Computer Science Conference (ERK'07), pp. 171-174, Portorož, Slovenia, 2007.
The paper investigates the impact that the face-image color space has on the verification performance of two popular face recognition procedures, i.e., the Fisherface approach and the Gabor-Fisher classifier - GFC. Experimental results on the XM2VTS database show that the Fisherface technique performs best when features are extracted from the Cr component of the YCbCr color space, while the performance of the Gabor-Fisher classifier is optimized when grey-scale intensity face-images are used for feature extraction. Based on these findings, a novel face recognition framework that combines the Fisherface and the GFC method is introduced in this paper and its feasibility demonstrated in a comparative study where, in addition to the proposed method, six widely used feature extraction techniques were tested for their face verification performance.
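The Cr extraction that the Fisherface result relies on amounts to converting each RGB pixel to the YCbCr space and keeping only the red-difference (Cr) component. The coefficients below follow the common ITU-R BT.601 full-range convention; the paper does not specify which variant was used, so treat this as an assumption.

```python
# Cr component of an RGB pixel under the ITU-R BT.601 full-range
# YCbCr convention (an assumed variant, for illustration only):
#   Cr = 128 + 0.5*R - 0.418688*G - 0.081312*B

def rgb_to_cr(r, g, b):
    """Return the Cr (red-difference) component, inputs in 0..255."""
    return 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

# A pure grey pixel carries no chrominance, so Cr stays at the midpoint.
print(rgb_to_cr(128, 128, 128))  # 128.0
```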