Publications – Laboratory for Machine Intelligence

Lampe, Ajda; Stopar, Julija; Jain, Deepak Kumar; Omachi, Shinichiro; Peer, Peter; Štruc, Vitomir

DiCTI: Diffusion-based Clothing Designer via Text-guided Input Proceedings Article

In: Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024), pp. 1–9, 2024.

Tags: clothing design, deepbeauty, denoising diffusion probabilistic models, diffusion, diffusion models, fashion, virtual try-on

@inproceedings{Ajda_Dicti,

title = {DiCTI: Diffusion-based Clothing Designer via Text-guided Input},

author = {Ajda Lampe and Julija Stopar and Deepak Kumar Jain and Shinichiro Omachi and Peter Peer and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2024/06/Dicti_FG2024_compressed.pdf},

year  = {2024},

date = {2024-05-27},

booktitle = {Proceedings of the 18th International Conference on Automatic Face and Gesture Recognition (FG 2024)},

pages = {1--9},

abstract = {Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and customers seeking to order new designs. To address this gap, we introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a straightforward yet highly effective approach that allows designers to quickly visualize fashion-related ideas using text inputs only. 

Given an image of a person and a description of the desired garments as input, DiCTI automatically generates multiple high-resolution, photorealistic images that capture the expressed semantics.  

By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs that viably follow the provided text descriptions, while being able to process very diverse and challenging inputs, captured in completely unconstrained settings. We evaluate DiCTI in comprehensive experiments on two different datasets (VITON-HD and Fashionpedia) and in comparison to the state-of-the-art (SoTA). The results of our experiments show that DiCTI convincingly outperforms the SoTA competitor in generating higher quality images with more elaborate garments and superior text prompt adherence, both according to standard quantitative evaluation measures and human ratings, generated as part of a user study. The source code of DiCTI will be made publicly available.},

keywords = {clothing design, deepbeauty, denoising diffusion probabilistic models, diffusion, diffusion models, fashion, virtual try-on},

pubstate = {published},

tppubtype = {inproceedings}

}

Plesh, Richard; Peer, Peter; Štruc, Vitomir

GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling Proceedings Article

In: Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

Tags: eyewear, eyewear personalization, face editing, GAN inversion, latent space editing, StyleGAN2, synthetic appearance discovery, targeted subspace modeling, virtual try-on

@inproceedings{PleshCVPR2023,

title = {GlassesGAN: Eyewear Personalization using Synthetic Appearance Discovery and Targeted Subspace Modeling},

author = {Richard Plesh and Peter Peer and Vitomir Štruc},

url = {https://arxiv.org/pdf/2210.14145.pdf

https://openaccess.thecvf.com/content/CVPR2023/html/Plesh_GlassesGAN_Eyewear_Personalization_Using_Synthetic_Appearance_Discovery_and_Targeted_Subspace_CVPR_2023_paper.html},

year  = {2023},

date = {2023-06-18},

urldate = {2023-06-18},

booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR)},

abstract = {We present GlassesGAN, a novel image editing framework for custom design of glasses that sets a new standard in terms of image quality, edit realism, and continuous multi-style edit capability. To facilitate the editing process with GlassesGAN, we propose a Targeted Subspace Modelling (TSM) procedure that, based on a novel mechanism for (synthetic) appearance discovery in the latent space of a pre-trained GAN generator, constructs an eyeglasses-specific (latent) subspace that the editing framework can utilize. Additionally, we also introduce an appearance-constrained subspace initialization (SI) technique that centers the latent representation of the given input image in the well-defined part of the constructed subspace to improve the reliability of the learned edits. We test GlassesGAN on two (diverse) high-resolution datasets (CelebA-HQ and SiblingsDB-HQf) and compare it to three state-of-the-art competitors, i.e., InterfaceGAN, GANSpace, and MaskGAN. The reported results show that GlassesGAN convincingly outperforms all competing techniques, while offering additional functionality (e.g., fine-grained multi-style editing) not available with any of the competitors. The source code will be made freely available.},

keywords = {eyewear, eyewear personalization, face editing, GAN inversion, latent space editing, StyleGAN2, synthetic appearance discovery, targeted subspace modeling, virtual try-on},

pubstate = {published},

tppubtype = {inproceedings}

}

Jug, Julijan; Lampe, Ajda; Štruc, Vitomir; Peer, Peter

Body Segmentation Using Multi-task Learning Proceedings Article

In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2022, ISBN: 978-1-6654-5818-4.

Tags: body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on

@inproceedings{JulijanJugBody,

title = {Body Segmentation Using Multi-task Learning},

author = {Julijan Jug and Ajda Lampe and Vitomir Štruc and Peter Peer},

url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2022/03/ICAIIC_paper.pdf},

doi = {10.1109/ICAIIC54071.2022.9722662},

isbn = {978-1-6654-5818-4},

year  = {2022},

date = {2022-01-20},

urldate = {2022-01-20},

booktitle = {International Conference on Artificial Intelligence in Information and Communication (ICAIIC)},

publisher = {IEEE},

abstract = {Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.},

keywords = {body segmentation, cn, CNN, computer vision, deep beauty, deep learning, multi-task learning, segmentation, virtual try-on},

pubstate = {published},

tppubtype = {inproceedings}

}

Fele, Benjamin; Lampe, Ajda; Peer, Peter; Štruc, Vitomir

C-VTON: Context-Driven Image-Based Virtual Try-On Network Proceedings Article

In: IEEE/CVF Winter Applications in Computer Vision (WACV), pp. 1–10, 2022.

Tags: computer vision, deepbeauty, fashion, generative models, image editing, try-on, virtual try-on