2023
Ivanovska, Marija; Štruc, Vitomir; Perš, Janez: TomatoDIFF: On-plant Tomato Segmentation with Denoising Diffusion Models. Proceedings Article. In: 18th International Conference on Machine Vision and Applications (MVA 2023), pp. 1-6, 2023.

Artificial intelligence applications enable farmers to optimize crop growth and production while reducing costs and environmental impact. Computer vision-based algorithms in particular are commonly used for fruit segmentation, enabling in-depth analysis of harvest quality and accurate yield estimation. In this paper, we propose TomatoDIFF, a novel diffusion-based model for semantic segmentation of on-plant tomatoes. When evaluated against other competitive methods, our model demonstrates state-of-the-art (SOTA) performance, even in challenging environments with highly occluded fruits. Additionally, we introduce Tomatopia, a new, large and challenging dataset of greenhouse tomatoes. The dataset comprises high-resolution RGB-D images and pixel-level annotations of the fruits. The source code of TomatoDIFF and the Tomatopia dataset are available at https://github.com/MIvanovska/TomatoDIFF
2020
Vitek, Matej; Rot, Peter; Štruc, Vitomir; Peer, Peter: A comprehensive investigation into sclera biometrics: a novel dataset and performance study. Journal Article. In: Neural Computing and Applications, pp. 1-15, 2020.

The area of ocular biometrics is among the most popular branches of biometric recognition technology. This area has long been dominated by iris recognition research, while other ocular modalities, such as the periocular region or the vasculature of the sclera, have received significantly less attention in the literature. Consequently, ocular modalities beyond the iris are not well studied and their characteristics are still not as well understood today. While recent needs for more secure authentication schemes have considerably increased interest in competing ocular modalities, progress in these areas is still held back by the lack of publicly available datasets that would allow for more targeted research into specific ocular characteristics next to the iris. In this paper, we aim to bridge this gap for the case of sclera biometrics and introduce a novel dataset designed for research into ocular biometrics and, most importantly, for research into the vasculature of the sclera. Our dataset, called Sclera Blood Vessels, Periocular and Iris (SBVPI), is, to the best of our knowledge, the first publicly available dataset designed specifically with research in sclera biometrics in mind. The dataset contains high-quality RGB ocular images, captured in the visible spectrum, belonging to 55 subjects. Unlike competing datasets, it comes with manual markups of various eye regions, such as the iris, pupil, canthus or eyelashes, and a detailed pixel-wise annotation of the complete sclera vasculature for a subset of the images. Additionally, the dataset ships with gender and age labels.
The unique characteristics of the dataset allow us to study aspects of sclera biometrics technology that have not been studied before in the literature (e.g. vasculature segmentation techniques) as well as issues that are of key importance for practical recognition systems. Thus, next to the SBVPI dataset, we also present in this paper a comprehensive investigation into sclera biometrics and the main covariates that affect the performance of sclera segmentation and recognition techniques, such as gender, age, gaze direction or image resolution. Our experiments not only demonstrate the usefulness of the newly introduced dataset, but also contribute to a better understanding of sclera biometrics in general.
2017
Emeršič, Žiga; Štruc, Vitomir; Peer, Peter: Ear recognition: More than a survey. Journal Article. In: Neurocomputing, vol. 255, pp. 26-39, 2017.

Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review of ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community.
2015
Justin, Tadej; Štruc, Vitomir; Žibert, Janez; Mihelič, France: Development and Evaluation of the Emotional Slovenian Speech Database EmoLUKS. Proceedings Article. In: Proceedings of the International Conference on Text, Speech, and Dialogue (TSD), pp. 351-359, Springer, 2015.

This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio-and-television station (RTV Slovenia), which placed the audio material at the university's disposal under an academic license for processing and annotation. The utterances of one male and one female speaker were transcribed, segmented and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based crowdsourcing application. The final (emotional) speech database consists of 1385 recordings of one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented. The results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments.
2009
Gajšek, Rok; Štruc, Vitomir; Mihelič, France; Podlesek, Anja; Komidar, Luka; Sočan, Gregor; Bajec, Boštjan: Multi-modal emotional database: AvID. Journal Article. In: Informatica (Ljubljana), vol. 33, no. 1, pp. 101-106, 2009.

This paper presents our work on recording a multi-modal database containing emotional audio and video recordings. In designing the recording strategies, special attention was paid to gathering data involving spontaneous emotions, thereby obtaining more realistic training and testing conditions for experiments. Different levels of arousal were induced with specially planned scenarios, including playing computer games and taking an adaptive intelligence test. This will enable us both to detect different emotional states and to experiment with speaker identification/verification of the people involved in the communication. So far, the multi-modal database has been recorded and a basic evaluation of the data has been carried out.
2008
Gajšek, Rok; Podlesek, Anja; Komidar, Luka; Sočan, Gregor; Bajec, Boštjan; Štruc, Vitomir; Bucik, Valentin; Mihelič, France: AvID: audio-video emotional database. Proceedings Article. In: Proceedings of the 11th International Multi-conference Information Society (IS'08), pp. 70-74, Ljubljana, Slovenia, 2008.