2018
Križaj, Janez; Emeršič, Žiga; Dobrišek, Simon; Peer, Peter; Štruc, Vitomir Localization of Facial Landmarks in Depth Images Using Gated Multiple Ridge Descent Inproceedings In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–8, IEEE 2018. @inproceedings{krivzaj2018localization,
title = {Localization of Facial Landmarks in Depth Images Using Gated Multiple Ridge Descent},
author = {Janez Križaj and Žiga Emeršič and Simon Dobrišek and Peter Peer and Vitomir Štruc},
url = {https://ieeexplore.ieee.org/abstract/document/8464215},
year = {2018},
date = {2018-09-01},
booktitle = {2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},
pages = {1--8},
organization = {IEEE},
abstract = {A novel method for automatic facial landmark localization is presented. The method builds on the supervised descent framework, which was shown to successfully localize landmarks in the presence of large expression variations and mild occlusions, but struggles when localizing landmarks on faces with large pose variations. We propose an extension of the supervised descent framework that trains multiple descent maps and results in increased robustness to pose variations. The performance of the proposed method is demonstrated on the Bosphorus, the FRGC and the UND data sets for the problem of facial landmark localization from 3D data. Our experimental results show that the proposed method exhibits increased robustness to pose variations, while retaining high performance in the case of expression and occlusion variations.},
keywords = {3d face, 3d landmarking, face alignment, face landmarking, gated ridge descent},
pubstate = {published},
tppubtype = {inproceedings}
}
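For readers unfamiliar with the supervised descent framework this paper extends, the sketch below shows the generic cascaded-regression update it iterates, plus a naive feature-space gate over multiple descent maps. The gating rule, the extract_features callable and the (mu, R, b) triples are illustrative assumptions, not the mechanism from the paper.

import numpy as np

def sdm_step(image, shape, extract_features, R, b):
    # One supervised-descent update: shift the landmark estimate by a
    # learned linear function of local features, shape' = shape + R phi + b.
    phi = extract_features(image, shape)
    return shape + R @ phi + b

def gated_step(image, shape, extract_features, descent_maps):
    # descent_maps: list of (mu, R, b). Pick the map whose feature-space
    # anchor mu is nearest to the current features (an assumed stand-in
    # for the paper's gating), then apply its update.
    phi = extract_features(image, shape)
    mu, R, b = min(descent_maps, key=lambda m: np.linalg.norm(phi - m[0]))
    return shape + R @ phi + b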
Kristan, Matej; Leonardis, Ales; Matas, Jiri; Felsberg, Michael; Pflugfelder, Roman; Zajc, Luka Cehovin; Vojir, Tomas; Bhat, Goutam; Lukezic, Alan; Eldesokey, Abdelrahman; Štruc, Vitomir; Grm, Klemen; others, The sixth visual object tracking VOT2018 challenge results Inproceedings In: European Conference on Computer Vision Workshops (ECCV-W 2018), 2018. @inproceedings{kristan2018sixth,
title = {The sixth visual object tracking VOT2018 challenge results},
author = {Matej Kristan and Ales Leonardis and Jiri Matas and Michael Felsberg and Roman Pflugfelder and Luka Cehovin Zajc and Tomas Vojir and Goutam Bhat and Alan Lukezic and Abdelrahman Eldesokey and Vitomir Štruc and Klemen Grm and others},
url = {http://openaccess.thecvf.com/content_ECCVW_2018/papers/11129/Kristan_The_sixth_Visual_Object_Tracking_VOT2018_challenge_results_ECCVW_2018_paper.pdf},
year = {2018},
date = {2018-09-01},
booktitle = {European Conference on Computer Vision Workshops (ECCV-W 2018)},
abstract = {The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking sub-challenge has been introduced to the set of standard VOT sub-challenges. The new sub-challenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking sub-challenges. Performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.},
keywords = {benchmark, tracking, VOT},
pubstate = {published},
tppubtype = {inproceedings}
}
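As a rough illustration of how tracker accuracy is scored in benchmarks of this kind, the snippet below averages per-frame bounding-box overlap over a sequence. This is a simplified stand-in; the actual VOT methodology additionally handles failures, re-initializations and robustness scoring.

import numpy as np

def iou(a, b):
    # Boxes as (x, y, w, h); intersection-over-union overlap.
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def average_overlap(pred_boxes, gt_boxes):
    # Per-frame overlaps averaged over a sequence.
    return float(np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]))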
Rot, Peter; Emeršič, Žiga; Struc, Vitomir; Peer, Peter Deep multi-class eye segmentation for ocular biometrics Inproceedings In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–8, IEEE 2018. @inproceedings{rot2018deep,
title = {Deep multi-class eye segmentation for ocular biometrics},
author = {Peter Rot and Žiga Emeršič and Vitomir Struc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/MultiClassReduced.pdf},
year = {2018},
date = {2018-07-01},
booktitle = {2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},
pages = {1--8},
organization = {IEEE},
abstract = {Segmentation techniques for ocular biometrics typically focus on finding a single eye region in the input image at a time. Only limited work has been done on multi-class eye segmentation despite a number of obvious advantages. In this paper we address this gap and present a deep multi-class eye segmentation model built around the SegNet architecture. We train the model on a small dataset (of 120 samples) of eye images and observe it to generalize well to unseen images and to ensure highly accurate segmentation results. We evaluate the model on the Multi-Angle Sclera Database (MASD) dataset and describe comprehensive experiments focusing on: i) segmentation performance, ii) error analysis, iii) the sensitivity of the model to changes in view direction, and iv) comparisons with competing single-class techniques. Our results show that the proposed model is a viable solution for multi-class eye segmentation suitable for recognition (multi-biometric) pipelines based on ocular characteristics.},
keywords = {biometrics, eye, ocular, sclera, segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
Lozej, Juš; Meden, Blaž; Struc, Vitomir; Peer, Peter End-to-end iris segmentation using U-Net Inproceedings In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–6, IEEE 2018. @inproceedings{lozej2018end,
title = {End-to-end iris segmentation using U-Net},
author = {Juš Lozej and Blaž Meden and Vitomir Struc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/IWOBI_2018_paper_15.pdf},
year = {2018},
date = {2018-07-01},
booktitle = {2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},
pages = {1--6},
organization = {IEEE},
abstract = {Iris segmentation is an important research topic that has received significant attention from the research community over the years. Traditional iris segmentation techniques have typically focused on hand-crafted procedures that, nonetheless, achieved remarkable segmentation performance even with images captured in difficult settings. With the success of deep-learning models, researchers are increasingly looking towards convolutional neural networks (CNNs) to further improve on the accuracy of existing iris segmentation techniques and several CNN-based techniques have already been presented recently in the literature. In this paper we also consider deep-learning models for iris segmentation and present an iris segmentation approach based on the popular U-Net architecture. Our model is trainable end-to-end and, hence, avoids the need for hand-designing the segmentation procedure. We evaluate the model on the CASIA dataset and report encouraging results in comparison to existing techniques used in this area.},
keywords = {biometrics, CNN, convolutional neural networks, iris, ocular, U-net},
pubstate = {published},
tppubtype = {inproceedings}
}
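A minimal PyTorch sketch of the U-Net idea the paper builds on: a convolutional encoder-decoder with skip connections and a per-pixel classification head. The depth, channel counts and two output classes here are illustrative assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    # Two-level encoder-decoder with skip connections; the original U-Net
    # uses four levels and wider feature maps.
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # per-pixel class logits

logits = TinyUNet()(torch.randn(1, 3, 128, 128))  # -> shape (1, 2, 128, 128)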
Meden, Blaz; Peer, Peter; Struc, Vitomir Selective Face Deidentification with End-to-End Perceptual Loss Learning Inproceedings In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–7, IEEE 2018. @inproceedings{meden2018selective,
title = {Selective Face Deidentification with End-to-End Perceptual Loss Learning},
author = {Blaz Meden and Peter Peer and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/Selective_Face_Deidentification_with_End_to_End_Perceptual_Loss_Learning.pdf},
year = {2018},
date = {2018-06-01},
booktitle = {2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},
pages = {1--7},
organization = {IEEE},
abstract = {Privacy is a highly debated topic in the modern technological era. With the advent of massive video and image data (which in a lot of cases contains personal information on the recorded subjects), there is an imminent need for efficient privacy protection mechanisms. To this end, we develop in this work a novel Face Deidentification Network (FaDeNet) that is able to alter the input faces in such a way that automated recognition systems fail to recognize the subjects in the images, while this is still possible for human observers. FaDeNet is based on an encoder-decoder architecture that is trained to auto-encode the input image, while (at the same time) minimizing the recognition performance of a secondary network that is used as a so-called identity critic in FaDeNet. We present experiments on the Radboud Faces Dataset and observe encouraging results.},
keywords = {deidentification, face, face deidentification, privacy protection},
pubstate = {published},
tppubtype = {inproceedings}
}
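Conceptually, training such a deidentification network combines a reconstruction term with an adversarial identity term from a frozen recognition network acting as the identity critic. The sketch below is a guess at this kind of objective; the L1 term, the weight alpha and the sign convention are assumptions rather than the paper's exact perceptual loss.

import torch
import torch.nn.functional as F

def deid_loss(decoded, original, critic_logits, true_id, alpha=1.0):
    # Keep the output close to the input in appearance...
    reconstruction = F.l1_loss(decoded, original)
    # ...while making the identity critic (a frozen recognition network
    # applied to the decoded image) as wrong as possible about the identity.
    identity = F.cross_entropy(critic_logits, true_id)
    return reconstruction - alpha * identity  # push the identity loss up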
Grm, Klemen; Štruc, Vitomir Deep face recognition for surveillance applications Journal Article In: IEEE Intelligent Systems, vol. 33, no. 3, pp. 46–50, 2018. @article{GrmIEEE2018,
title = {Deep face recognition for surveillance applications},
author = {Klemen Grm and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/UniversityOfLjubljana_IEEE_IS_Submission.pdf},
year = {2018},
date = {2018-05-01},
journal = {IEEE Intelligent Systems},
volume = {33},
number = {3},
pages = {46--50},
abstract = {Automated person recognition from surveillance-quality footage is an open research problem with many potential application areas. In this paper, we aim at addressing this problem by presenting a face recognition approach tailored towards surveillance applications. The presented approach is based on domain-adapted convolutional neural networks and ranked second in the International Challenge on Biometric Recognition in the Wild (ICB-RW) 2016. We evaluate the performance of the presented approach on part of the Quis-Campi dataset and compare it against several existing face recognition techniques and one (state-of-the-art) commercial system. We find that the domain-adapted convolutional network outperforms all other assessed techniques, but is still inferior to human performance.},
keywords = {biometrics, face, face recognition, performance evaluation, surveillance},
pubstate = {published},
tppubtype = {article}
}
Emeršič, Žiga; Meden, Blaž; Peer, Peter; Štruc, Vitomir Evaluation and analysis of ear recognition models: performance, complexity and resource requirements Journal Article In: Neural Computing and Applications, pp. 1–16, 2018, ISSN: 0941-0643. @article{emervsivc2018evaluation,
title = {Evaluation and analysis of ear recognition models: performance, complexity and resource requirements},
author = {Žiga Emeršič and Blaž Meden and Peter Peer and Vitomir Štruc},
url = {https://rdcu.be/Os7a},
doi = {https://doi.org/10.1007/s00521-018-3530-1},
issn = {0941-0643},
year = {2018},
date = {2018-05-01},
journal = {Neural Computing and Applications},
pages = {1--16},
publisher = {Springer},
abstract = {Ear recognition technology has long been dominated by (local) descriptor-based techniques due to their formidable recognition performance and robustness to various sources of image variability. While deep-learning-based techniques have started to appear in this field only recently, they have already shown potential for further boosting the performance of ear recognition technology and dethroning descriptor-based methods as the current state of the art. However, while recognition performance is often the key factor when selecting recognition models for biometric technology, it is equally important that the behavior of the models is understood and their sensitivity to different covariates is known and well explored. Other factors, such as the train- and test-time complexity or resource requirements, are also paramount and need to be considered when designing recognition systems. To explore these issues, we present in this paper a comprehensive analysis of several descriptor- and deep-learning-based techniques for ear recognition. Our goal is to discover weak points of contemporary techniques, study the characteristics of the existing technology and identify open problems worth exploring in the future. We conduct our analysis through identification experiments on the challenging Annotated Web Ears (AWE) dataset and report our findings. The results of our analysis show that the presence of accessories and high degrees of head movement significantly impact the identification performance of all types of recognition models, whereas mild degrees of the listed factors and other covariates such as gender and ethnicity impact the identification performance only to a limited extent. From a test-time-complexity point of view, the results suggest that lightweight deep models can be equally fast as descriptor-based methods given appropriate computing hardware, but require significantly more resources during training, where descriptor-based methods have a clear advantage. As an additional contribution, we also introduce a novel dataset of ear images, called AWE Extended (AWEx), which we collected from the web for the training of the deep models used in our experiments. AWEx contains 4104 images of 346 subjects and represents one of the largest and most challenging (publicly available) datasets of unconstrained ear images at the disposal of the research community.},
keywords = {AWE, AWEx, descriptor methods, ear recognition, extended annotated web ears dataset},
pubstate = {published},
tppubtype = {article}
}
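Identification experiments of the kind reported here are typically summarized with a cumulative match characteristic (CMC). A minimal cosine-similarity version, assuming numpy feature matrices with one row per image and that every probe identity appears in the gallery, might look as follows.

import numpy as np

def cmc_curve(gallery_feats, gallery_ids, probe_feats, probe_ids, max_rank=10):
    # Normalize rows so the dot product equals cosine similarity.
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    gallery_ids = np.asarray(gallery_ids)
    curve = np.zeros(max_rank)
    for feat, pid in zip(p, np.asarray(probe_ids)):
        order = np.argsort(-(g @ feat))                 # most similar first
        first_hit = np.nonzero(gallery_ids[order] == pid)[0][0]
        if first_hit < max_rank:
            curve[first_hit:] += 1                      # cumulative counts
    return curve / len(p)                               # curve[0] = rank-1 accuracy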
Banerjee, Sandipan; Brogan, Joel; Krizaj, Janez; Bharati, Aparna; RichardWebster, Brandon; Struc, Vitomir; Flynn, Patrick J.; Scheirer, Walter J. To frontalize or not to frontalize: Do we really need elaborate pre-processing to improve face recognition? Inproceedings In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 20–29, IEEE 2018. @inproceedings{banerjee2018frontalize,
title = {To frontalize or not to frontalize: Do we really need elaborate pre-processing to improve face recognition?},
author = {Sandipan Banerjee and Joel Brogan and Janez Krizaj and Aparna Bharati and Brandon RichardWebster and Vitomir Struc and Patrick J. Flynn and Walter J. Scheirer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/To_Frontalize_or_Not_To_Frontalize_Do_We_Really_Ne.pdf},
year = {2018},
date = {2018-05-01},
booktitle = {2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
pages = {20--29},
organization = {IEEE},
abstract = {Face recognition performance has improved remarkably in the last decade. Much of this success can be attributed to the development of deep learning techniques such as convolutional neural networks (CNNs). While CNNs have pushed the state-of-the-art forward, their training process requires a large amount of clean and correctly labelled training data. If a CNN is intended to tolerate facial pose, then we face an important question: should this training data be diverse in its pose distribution, or should face images be normalized to a single pose in a pre-processing step? To address this question, we evaluate a number of facial landmarking algorithms and a popular frontalization method to understand their effect on facial recognition performance. Additionally, we introduce a new, automatic, single-image frontalization scheme that exceeds the performance of the reference frontalization algorithm for video-to-video face matching on the Point and Shoot Challenge (PaSC) dataset. Furthermore, we investigate failure modes of each frontalization method at different facial yaw angles using the CMU Multi-PIE dataset. We assert that the subsequent recognition and verification performance serves to quantify the effectiveness of each pose correction scheme.},
keywords = {face alignment, face recognition, landmarking},
pubstate = {published},
tppubtype = {inproceedings}
}
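As a point of contrast with full 3D frontalization, the common lightweight alternative is 2D alignment: rotate and scale the face so the eye centers land on canonical positions. A sketch with OpenCV follows; the canonical eye geometry (eyes 40% of the crop apart, at 35% of its height) is an arbitrary choice, not a value from the paper.

import cv2
import numpy as np

def align_by_eyes(img, eye_left, eye_right, out_size=112):
    # Rotate around the eye midpoint so the eye line becomes horizontal,
    # scale so the eyes sit 40% of the crop apart, then translate the
    # midpoint to a canonical position.
    center = (np.asarray(eye_left) + np.asarray(eye_right)) / 2.0
    dx, dy = np.subtract(eye_right, eye_left)
    angle = np.degrees(np.arctan2(dy, dx))
    scale = (0.4 * out_size) / np.hypot(dx, dy)
    M = cv2.getRotationMatrix2D(tuple(center), angle, scale)
    M[0, 2] += 0.5 * out_size - center[0]    # move midpoint to crop center x
    M[1, 2] += 0.35 * out_size - center[1]   # eyes at 35% of crop height
    return cv2.warpAffine(img, M, (out_size, out_size))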
Emeršič, Žiga; Gabriel, Luka; Štruc, Vitomir; Peer, Peter Convolutional encoder--decoder networks for pixel-wise ear detection and segmentation Journal Article In: IET Biometrics, vol. 7, no. 3, pp. 175–184, 2018. @article{emervsivc2018convolutional,
title = {Convolutional encoder--decoder networks for pixel-wise ear detection and segmentation},
author = {Žiga Emeršič and Luka Gabriel and Vitomir Štruc and Peter Peer},
url = {https://arxiv.org/pdf/1702.00307.pdf},
year = {2018},
date = {2018-03-01},
journal = {IET Biometrics},
volume = {7},
number = {3},
pages = {175--184},
publisher = {IET},
abstract = {Object detection and segmentation represent the basis for many tasks in computer and machine vision. In biometric recognition systems, the detection of the region-of-interest (ROI) is one of the most crucial steps in the processing pipeline, significantly impacting the performance of the entire recognition system. Existing approaches to ear detection are commonly susceptible to the presence of severe occlusions, ear accessories or variable illumination conditions and often deteriorate in their performance if applied on ear images captured in unconstrained settings. To address these shortcomings, we present a novel ear detection technique based on convolutional encoder-decoder networks (CEDs). We formulate the problem of ear detection as a two-class segmentation problem and design and train a CED-network architecture to distinguish between image pixels belonging to the ear and the non-ear class. Unlike competing techniques, our approach does not simply return a bounding box around the detected ear, but provides detailed, pixel-wise information about the location of the ears in the image. Experiments on a dataset gathered from the web (a.k.a. in the wild) show that the proposed technique ensures good detection results in the presence of various covariate factors and significantly outperforms competing methods from the literature.},
keywords = {annotated web ears, AWE, biometrics, ear, ear detection, pixel-wise detection, segmentation},
pubstate = {published},
tppubtype = {article}
}
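Since the network outputs a pixel-wise mask rather than a box, comparing against classical detectors requires collapsing the mask into a bounding box; the reverse direction is not possible, which is the information advantage the paper points out. A minimal conversion:

import numpy as np

def mask_to_bbox(mask):
    # mask: binary array, nonzero where the network predicts ear pixels.
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # no ear pixels detected
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)  # (x, y, w, h)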
Emeršič, Žiga; Playa, Nil Oleart; Štruc, Vitomir; Peer, Peter Towards Accessories-Aware Ear Recognition Inproceedings In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp. 1–8, IEEE 2018. @inproceedings{emervsivc2018towards,
title = {Towards Accessories-Aware Ear Recognition},
author = {Žiga Emeršič and Nil Oleart Playa and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/iwobi-2018-inpaint-1.pdf},
doi = {10.1109/IWOBI.2018.8464138},
year = {2018},
date = {2018-03-01},
booktitle = {2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI)},
pages = {1--8},
organization = {IEEE},
abstract = {Automatic ear recognition is gaining popularity within the research community due to numerous desirable properties, such as high recognition performance, the possibility of capturing ear images at a distance and in a covert manner, etc. Despite this popularity and the corresponding research effort that is being directed towards ear recognition technology, open problems still remain. One of the most important issues stopping ear recognition systems from being widely available is the presence of ear occlusions and accessories. Ear accessories not only mask biometric features and thereby reduce the overall recognition performance, but also introduce new non-biometric features that can be exploited for spoofing purposes. Ignoring ear accessories during recognition can, therefore, present a security threat to ear recognition and also adversely affect performance. Despite the importance of this topic there have been, to the best of our knowledge, no ear recognition studies that would address these problems. In this work we try to close this gap and study the impact of ear accessories on the recognition performance of several state-of-the-art ear recognition techniques. We consider ear accessories as a tool for spoofing attacks and show that CNN-based recognition approaches are more susceptible to spoofing attacks than traditional descriptor-based approaches. Furthermore, we demonstrate that using inpainting techniques or average coloring can mitigate the problems caused by ear accessories and slightly outperforms (standard) black color to mask ear accessories.},
keywords = {accessories, biometrics, ear recognition},
pubstate = {published},
tppubtype = {inproceedings}
}
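The two masking strategies compared in the paper can be prototyped in a few lines of OpenCV, given a binary mask marking accessory pixels (obtaining that mask is the hard part and is not shown here; the inpainting radius below is an arbitrary choice):

import cv2

def mask_accessory(ear_bgr, accessory_mask):
    # accessory_mask: single-channel uint8, nonzero on accessory pixels.
    # Telea inpainting fills the marked region from surrounding texture.
    inpainted = cv2.inpaint(ear_bgr, accessory_mask, 3, cv2.INPAINT_TELEA)
    # Average coloring replaces marked pixels with the mean unmasked color.
    averaged = ear_bgr.copy()
    mean_color = ear_bgr[accessory_mask == 0].mean(axis=0)
    averaged[accessory_mask > 0] = mean_color.astype(ear_bgr.dtype)
    return inpainted, averaged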
Vidal, Rosaura G.; Banerjee, Sreya; Grm, Klemen; Struc, Vitomir; Scheirer, Walter J. UG^2: A Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition Inproceedings In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1597–1606, IEEE 2018. @inproceedings{vidal2018ug,
title = {UG^2: A Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition},
author = {Rosaura G. Vidal and Sreya Banerjee and Klemen Grm and Vitomir Struc and Walter J. Scheirer},
url = {https://arxiv.org/pdf/1710.02909.pdf},
year = {2018},
date = {2018-02-01},
booktitle = {2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
pages = {1597--1606},
organization = {IEEE},
abstract = {Advances in image restoration and enhancement techniques have led to discussion about how such algorithms can be applied as a pre-processing step to improve automatic visual recognition. In principle, techniques like deblurring and super-resolution should yield improvements by de-emphasizing noise and increasing signal in an input image. But the historically divergent goals of computational photography and visual recognition communities have created a significant need for more work in this direction. To facilitate new research, we introduce a new benchmark dataset called UG2, which contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground. Over 150,000 annotated frames for hundreds of ImageNet classes are available, which are used for baseline experiments that assess the impact of known and unknown image artifacts and other conditions on common deep learning-based object classification approaches. Further, current image restoration and enhancement techniques are evaluated by determining whether or not they improve baseline classification performance. Results show that there is plenty of room for algorithmic innovation, making this dataset a useful tool going forward.},
keywords = {benchmark, computational photography, image enhancement, image restoration, UAV, UG2, visual recognition},
pubstate = {published},
tppubtype = {inproceedings}
}
Das, Abhijit; Pal, Umapada; Ferrer, Miguel A.; Blumenstein, Michael; Štepec, Dejan; Rot, Peter; Emeršič, Žiga; Peer, Peter; Štruc, Vitomir SSBC 2018: Sclera Segmentation Benchmarking Competition Inproceedings In: 2018 International Conference on Biometrics (ICB), 2018. @inproceedings{Dasicb2018,
title = {SSBC 2018: Sclera Segmentation Benchmarking Competition},
author = {Abhijit Das and Umapada Pal and Miguel A. Ferrer and Michael Blumenstein and Dejan Štepec and Peter Rot and Žiga Emeršič and Peter Peer and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/icb2018_sserbc.pdf},
year = {2018},
date = {2018-02-01},
booktitle = {2018 International Conference on Biometrics (ICB)},
abstract = {This paper summarises the results of the Sclera Segmentation Benchmarking Competition (SSBC 2018). It was organised in the context of the 11th IAPR International Conference on Biometrics (ICB 2018). The aim of this competition was to record the developments on sclera segmentation in the cross-sensor environment (sclera trait captured using multiple acquisition sensors). Additionally, the competition also aimed to gain the attention of researchers on this subject of research. For the purpose of benchmarking, we have developed two datasets of sclera images captured using different sensors. The first dataset was collected using a DSLR camera and the second one was collected using a mobile phone camera. The first dataset is the Multi-Angle Sclera Dataset (MASD version 1), which was used in the context of the previous versions of sclera segmentation competitions. The images in the second dataset were captured using an 8-megapixel mobile phone rear camera. As a baseline, manual segmentation masks of the sclera images from both datasets were developed. Precision and recall-based statistical measures were employed to evaluate the effectiveness of the submitted segmentation techniques and to rank them. Six algorithms were submitted towards the segmentation task. This paper analyses the results produced by these algorithms/systems and defines a way forward for this subject of research. Both datasets, along with some of the accompanying ground truth/baseline masks, will be freely available for research purposes upon request to the authors by email.},
keywords = {competition, ocular, sclera, sclera segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
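For a single image, the precision- and recall-based ranking used in the competition reduces to pixel counts over the predicted and ground-truth masks; a minimal version (with F1 added for convenience):

import numpy as np

def mask_scores(pred, gt):
    # pred, gt: boolean arrays of the same shape (True = sclera pixel).
    tp = np.logical_and(pred, gt).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1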
Meden, Blaž; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter k-Same-Net: k-Anonymity with Generative Deep Neural Networks for Face Deidentification Journal Article In: Entropy, vol. 20, no. 1, pp. 60, 2018. @article{meden2018k,
title = {k-Same-Net: k-Anonymity with Generative Deep Neural Networks for Face Deidentification},
author = {Blaž Meden and Žiga Emeršič and Vitomir Štruc and Peter Peer},
url = {https://www.mdpi.com/1099-4300/20/1/60/pdf},
year = {2018},
date = {2018-01-01},
journal = {Entropy},
volume = {20},
number = {1},
pages = {60},
publisher = {Multidisciplinary Digital Publishing Institute},
abstract = {Image and video data are today being shared between government entities and other relevant stakeholders on a regular basis and require careful handling of the personal information contained therein. A popular approach to ensure privacy protection in such data is the use of deidentification techniques, which aim at concealing the identity of individuals in the imagery while still preserving certain aspects of the data after deidentification. In this work, we propose a novel approach towards face deidentification, called k-Same-Net, which combines recent Generative Neural Networks (GNNs) with the well-known k-Anonymity mechanism and provides formal guarantees regarding privacy protection on a closed set of identities. Our GNN is able to generate synthetic surrogate face images for deidentification by seamlessly combining features of identities used to train the GNN model. Furthermore, it allows us to control the image-generation process with a small set of appearance-related parameters that can be used to alter specific aspects (e.g., facial expressions, age, gender) of the synthesized surrogate images. We demonstrate the feasibility of k-Same-Net in comprehensive experiments on the XM2VTS and CK+ datasets. We evaluate the efficacy of the proposed approach through reidentification experiments with recent recognition models and compare our results with competing deidentification techniques from the literature. We also present facial expression recognition experiments to demonstrate the utility-preservation capabilities of k-Same-Net. Our experimental results suggest that k-Same-Net is a viable option for facial deidentification that exhibits several desirable characteristics when compared to existing solutions in this area.},
keywords = {deidentification, face, k-same, k-same-net, privacy protection},
pubstate = {published},
tppubtype = {article}
}
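For context, the underlying k-Anonymity mechanism is easiest to see in the original pixel-domain k-Same algorithm, which k-Same-Net replaces with a generative network fed blended identity parameters. A simplified per-face sketch follows; note the original algorithm partitions the gallery into disjoint clusters of size k, which this version does not enforce.

import numpy as np

def k_same_pixel(faces, k):
    # faces: (n, h, w) array of aligned face images. Each output is the
    # average of the face's k nearest neighbours (including itself), so a
    # surrogate cannot be traced back to fewer than ~k originals.
    faces = np.asarray(faces, dtype=np.float64)
    flat = faces.reshape(len(faces), -1)
    surrogates = np.empty_like(faces)
    for i, f in enumerate(flat):
        nearest = np.argsort(((flat - f) ** 2).sum(axis=1))[:k]
        surrogates[i] = faces[nearest].mean(axis=0)
    return surrogates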
Šket, Robert; Debevec, Tadej; Kublik, Susanne; Schloter, Michael; Schoeller, Anne; Murovec, Boštjan; Mikuš, Katarina Vogel; Makuc, Damjan; Pečnik, Klemen; Plavec, Janez; Mekjavić, Igor B; Eiken, Ola; Prevoršek, Zala; Stres, Blaž Intestinal Metagenomes and Metabolomes in Healthy Young Males: Inactivity and Hypoxia Generated Negative Physiological Symptoms Precede Microbial Dysbiosis Journal Article In: Frontiers in Physiology, vol. 9, pp. 198, 2018, ISSN: 1664-042X. @article{10.3389/fphys.2018.00198,
title = {Intestinal Metagenomes and Metabolomes in Healthy Young Males: Inactivity and Hypoxia Generated Negative Physiological Symptoms Precede Microbial Dysbiosis},
author = {Robert Šket and Tadej Debevec and Susanne Kublik and Michael Schloter and Anne Schoeller and Boštjan Murovec and Katarina Vogel Mikuš and Damjan Makuc and Klemen Pečnik and Janez Plavec and Igor B Mekjavić and Ola Eiken and Zala Prevoršek and Blaž Stres},
url = {https://www.frontiersin.org/article/10.3389/fphys.2018.00198},
doi = {10.3389/fphys.2018.00198},
issn = {1664-042X},
year = {2018},
date = {2018-01-01},
journal = {Frontiers in Physiology},
volume = {9},
pages = {198},
abstract = {We explored the metagenomic, metabolomic and trace metal makeup of intestinal microbiota and environment in healthy male participants during the run-in (5 day) and the following three 21-day interventions: normoxic bedrest (NBR), hypoxic bedrest (HBR) and hypoxic ambulation (HAmb), which were carried out within a controlled laboratory environment (circadian rhythm, fluid and dietary intakes, microbial bioburden, oxygen level, exercise). The fraction of inspired O2 (FiO2) and partial pressure of inspired O2 (PiO2) were 0.209 and 133.1 ± 0.3 mmHg for the NBR and 0.141 ± 0.004 and 90.0 ± 0.4 mmHg (~4000 m simulated altitude) for HBR and HAmb interventions, respectively. Shotgun metagenomes were analyzed at various taxonomic and functional levels, 1H- and 13C-metabolomes were processed using standard quantitative and human expert approaches, whereas metals were assessed using X-ray fluorescence spectrometry. Inactivity and hypoxia resulted in a significant increase in the genus Bacteroides in HBR, in genes coding for proteins involved in iron acquisition and metabolism, cell wall, capsule, virulence, defense and mucin degradation, such as beta-galactosidase (EC3.2.1.23), α-L-fucosidase (EC3.2.1.51), Sialidase (EC3.2.1.18) and α-N-acetylglucosaminidase (EC3.2.1.50). In contrast, the microbial metabolomes, intestinal element and metal profiles, and the diversity of bacterial, archaeal and fungal microbial communities were not significantly affected. The observed progressive decrease in defecation frequency and concomitant increase in the electrical conductivity (EC) preceded or took place in the absence of significant changes at the taxonomic, functional gene, metabolome and intestinal metal profile levels. The fact that the genus Bacteroides and proteins involved in iron acquisition and metabolism, cell wall, capsule, virulence and mucin degradation were enriched at the end of HBR suggests that both constipation and EC decreased intestinal metal availability, leading to modified expression of co-regulated genes in Bacteroides genomes. Bayesian network analysis was used to derive the first hierarchical model of initial inactivity-mediated deconditioning steps over time. The PlanHab wash-out period corresponded to a profound life-style change (i.e., reintroduction of exercise) that resulted in stepwise amelioration of the negative physiological symptoms, indicating that exercise apparently prevented the crosstalk between the microbial physiology, mucin degradation and proinflammatory immune activities in the host.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Murovec, Boštjan; Makuc, Damjan; Repinc, Sabina Kolbl; Prevoršek, Zala; Zavec, Domen; Šket, Robert; Pečnik, Klemen; Plavec, Janez; Stres, Blaž 1H NMR metabolomics of microbial metabolites in the four MW agricultural biogas plant reactors: A case study of inhibition mirroring the acute rumen acidosis symptoms Journal Article In: Journal of Environmental Management, vol. 222, pp. 428 - 435, 2018, ISSN: 0301-4797. @article{MUROVEC2018428,
title = {1H NMR metabolomics of microbial metabolites in the four MW agricultural biogas plant reactors: A case study of inhibition mirroring the acute rumen acidosis symptoms},
author = {Boštjan Murovec and Damjan Makuc and Sabina Kolbl Repinc and Zala Prevoršek and Domen Zavec and Robert Šket and Klemen Pečnik and Janez Plavec and Blaž Stres},
url = {http://www.sciencedirect.com/science/article/pii/S0301479718305991},
doi = {https://doi.org/10.1016/j.jenvman.2018.05.068},
issn = {0301-4797},
year = {2018},
date = {2018-01-01},
journal = {Journal of Environmental Management},
volume = {222},
pages = {428 - 435},
abstract = {In this study, nuclear magnetic resonance (1H NMR) spectroscopic profiling was used to provide a more comprehensive view of microbial metabolites associated with poor reactor performance in a full-scale 4 MW mesophilic agricultural biogas plant under fully operational and also under inhibited conditions. Multivariate analyses were used to assess the significance of differences between reactors whereas artificial neural networks (ANN) were used to identify the key metabolites responsible for inhibition and their network of interaction. Based on the results of nm-MDS ordination, the subsamples of each reactor were similar, but not identical, despite homogenization of the full-scale reactors before sampling. Hence, a certain extent of variability due to the size of the system under analysis was transferred into metabolome analysis. Multivariate analysis showed that fully active reactors were clustered separately from those containing inhibited reactor metabolites and were significantly different. Furthermore, the three distinct inhibited states were significantly different from each other. The inhibited metabolomes were enriched in acetate, caprylate, trimethylamine, thymine, pyruvate, alanine, xanthine and succinate. The differences in the metabolic fingerprint between inactive and fully active reactors observed in this study closely resembled the metabolites differentiating the (sub)acute rumen acidosis inflicted and healthy rumen metabolomes, creating thus favorable conditions for the growth and activity of pathogenic bacteria. The consistency of our data with those reported before for rumen ecosystems shows that 1H NMR based metabolomics is a reliable approach for the evaluation of metabolic events at full-scale biogas reactors.},
keywords = {1H NMR, Biogas plant, Chenomix, Metabolomics, Reactor inhibition},
pubstate = {published},
tppubtype = {article}
}
2017
Lavrič, Primož; Emeršič, Žiga; Meden, Blaž; Štruc, Vitomir; Peer, Peter Do it Yourself: Building a Low-Cost Iris Recognition System at Home Using Off-The-Shelf Components Inproceedings In: Electrotechnical and Computer Science Conference ERK 2017, 2017. @inproceedings{ERK2017,
title = {Do it Yourself: Building a Low-Cost Iris Recognition System at Home Using Off-The-Shelf Components},
author = {Primož Lavrič and Žiga Emeršič and Blaž Meden and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/lavricdo_it.pdf},
year = {2017},
date = {2017-09-01},
booktitle = {Electrotechnical and Computer Science Conference ERK 2017},
abstract = {Among the different biometric traits that can be used for person recognition, the human iris is generally considered to be among the most accurate. However, despite a plethora of desirable characteristics, iris recognition is not as widely used as competing biometric modalities, likely due to the high cost of existing commercial iris-recognition systems. In this paper we contribute towards the availability of low-cost iris recognition systems and present a prototype system built using off-the-shelf components. We describe the prototype device, the pipeline used for iris recognition, evaluate the performance of our solution on a small in-house dataset and discuss directions for future work. The current version of our prototype includes complete hardware and software implementations and has a combined bill-of-materials of 110 EUR.},
keywords = {biometrics, iris, sensor design},
pubstate = {published},
tppubtype = {inproceedings}
}
Among the different biometric traits that can be used for person recognition, the human iris is generally considered to be among the most accurate. However, despite a plethora of desirable characteristics, iris recognition is not as widely used as competing biometric modalities, likely due to the high cost of existing commercial iris-recognition systems. In this paper we contribute towards the availability of low-cost iris recognition systems and present a prototype system built using off-the-shelf components. We describe the prototype device and the pipeline used for iris recognition, evaluate the performance of our solution on a small in-house dataset and discuss directions for future work. The current version of our prototype includes complete hardware and software implementations and has a combined bill-of-materials cost of 110 EUR.
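The entry does not include code, so below is a hedged Python sketch of a classic Daugman-style iris matching back-end of the kind such a pipeline typically contains (rubber-sheet normalisation, Gabor phase coding, Hamming distance). Segmentation is assumed to have been done already; all function names and parameters are illustrative, not the paper's implementation.

# Illustrative iris matching back-end; not the prototype's code.
import numpy as np
import cv2

def unwrap_iris(gray, center, r_pupil, r_iris, h=32, w=256):
    """Rubber-sheet normalisation: sample the iris annulus onto an h x w strip."""
    thetas = np.linspace(0, 2 * np.pi, w, endpoint=False)
    radii = np.linspace(r_pupil, r_iris, h)
    xs = center[0] + np.outer(radii, np.cos(thetas))
    ys = center[1] + np.outer(radii, np.sin(thetas))
    return cv2.remap(gray, xs.astype(np.float32), ys.astype(np.float32),
                     interpolation=cv2.INTER_LINEAR)

def iris_code(strip):
    """Binary code from the sign of a Gabor filter response."""
    kern = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0, theta=0.0,
                              lambd=8.0, gamma=0.5)
    resp = cv2.filter2D(strip.astype(np.float32), -1, kern)
    return (resp > 0).astype(np.uint8)

def hamming(code_a, code_b):
    return np.mean(code_a != code_b)  # 0 = identical, ~0.5 = unrelated

# toy usage with a synthetic image; real input would be an eye photograph
eye = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
strip = unwrap_iris(eye, center=(160, 120), r_pupil=20, r_iris=60)
print("fractional Hamming distance:", hamming(iris_code(strip), iris_code(strip)))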
|
Grm, Klemen; Dobrišek, Simon; Štruc, Vitomir Evaluating image superresolution algorithms for cross-resolution face recognition Inproceedings In: Proceedings of the Twenty-sixth International Electrotechnical and Computer Science Conference ERK 2017, 2017. @inproceedings{ERK2017Grm,
title = {Evaluating image superresolution algorithms for cross-resolution face recognition},
author = {Klemen Grm and Simon Dobrišek and Vitomir Štruc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/review_submission.pdf},
year = {2017},
date = {2017-09-01},
booktitle = {Proceedings of the Twenty-sixth International Electrotechnical and Computer Science Conference ERK 2017},
abstract = {With recent advancements in deep learning and convolutional neural networks (CNNs), face recognition has seen significant performance improvements over the last few years. However, low-resolution images still remain challenging, with CNNs performing relatively poorly compared to humans. One possibility for improving performance in these settings, often advocated in the literature, is the use of super-resolution (SR). In this paper, we explore the usefulness of SR algorithms for cross-resolution face recognition in experiments on the Labeled Faces in the Wild (LFW) and SCface datasets using four recent deep CNN models. We conduct experiments with synthetically down-sampled images as well as real-life low-resolution imagery captured by surveillance cameras. Our experiments show that image super-resolution can improve face recognition performance considerably on very low-resolution images (of size 24 x 24 or 32 x 32 pixels) when images are artificially down-sampled, but has a lesser (or sometimes even detrimental) effect on real-life images, leaving significant room for further research in this area.},
keywords = {face, face hallucination, face recognition, performance evaluation, super-resolution},
pubstate = {published},
tppubtype = {inproceedings}
}
With recent advancements in deep learning and convolutional neural networks (CNNs), face recognition has seen significant performance improvements over the last few years. However, low-resolution images still remain challenging, with CNNs performing relatively poorly compared to humans. One possibility for improving performance in these settings, often advocated in the literature, is the use of super-resolution (SR). In this paper, we explore the usefulness of SR algorithms for cross-resolution face recognition in experiments on the Labeled Faces in the Wild (LFW) and SCface datasets using four recent deep CNN models. We conduct experiments with synthetically down-sampled images as well as real-life low-resolution imagery captured by surveillance cameras. Our experiments show that image super-resolution can improve face recognition performance considerably on very low-resolution images (of size 24 x 24 or 32 x 32 pixels) when images are artificially down-sampled, but has a lesser (or sometimes even detrimental) effect on real-life images, leaving significant room for further research in this area. |
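A minimal Python sketch of the synthetic cross-resolution protocol described above: down-sample a face image, re-upscale it (plain bicubic interpolation stands in for a learned SR model) and compare embeddings. The embed function is a placeholder, not one of the four CNN models evaluated in the paper.

# Synthetic down-sampling protocol sketch; embed() is a stand-in for a face CNN.
import numpy as np
import cv2

def embed(img):
    # placeholder embedding; a real experiment would use a pretrained face CNN
    return cv2.resize(img, (16, 16)).astype(np.float32).ravel()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

face = np.random.randint(0, 255, (128, 128), dtype=np.uint8)  # stand-in image
ref = embed(face)
for size in (24, 32, 64):
    low = cv2.resize(face, (size, size), interpolation=cv2.INTER_AREA)
    up = cv2.resize(low, (128, 128), interpolation=cv2.INTER_CUBIC)
    print(f"{size}x{size} -> cosine similarity to original: {cosine(embed(up), ref):.3f}")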
Novosel, Rok; Meden, Blaž; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter Face recognition with Raspberry Pi for IoT Environments. Inproceedings In: Proceedings of the Twenty-sixth International Electrotechnical and Computer Science Conference ERK 2017, 2017. @inproceedings{ERK2017c,
title = {Face recognition with Raspberry Pi for IoT Environments.},
author = {Rok Novosel and Blaž Meden and Žiga Emeršič and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/novoselface_recognition.pdf},
year = {2017},
date = {2017-09-01},
booktitle = {Proceedings of the Twenty-sixth International Electrotechnical and Computer Science Conference ERK 2017},
abstract = {IoT has seen steady growth over recent years – smart home appliances, smart personal gear, personal assistants and many more. The same is true for the field of biometrics, where the need for automatic and secure recognition schemes has spurred the development of fingerprint- and face-recognition mechanisms found today in most smartphones and similar hand-held devices. Devices used in the Internet of Things (IoT) are often low-powered with limited computational resources. This means that biometric recognition pipelines aimed at IoT need to be streamlined and as efficient as possible. Towards this end, we describe in this paper how image-based biometrics can be leveraged in an IoT environment using a Raspberry Pi. We present a proof-of-concept web-based information system, secured by a face-recognition procedure, that gives authorized users access to potentially sensitive information.},
keywords = {face recognition, IoT, PI, proof of concept},
pubstate = {published},
tppubtype = {inproceedings}
}
IoT has seen steady growth over recent years – smart home appliances, smart personal gear, personal assistants and many more. The same is true for the field of biometrics, where the need for automatic and secure recognition schemes has spurred the development of fingerprint- and face-recognition mechanisms found today in most smartphones and similar hand-held devices. Devices used in the Internet of Things (IoT) are often low-powered with limited computational resources. This means that biometric recognition pipelines aimed at IoT need to be streamlined and as efficient as possible. Towards this end, we describe in this paper how image-based biometrics can be leveraged in an IoT environment using a Raspberry Pi. We present a proof-of-concept web-based information system, secured by a face-recognition procedure, that gives authorized users access to potentially sensitive information. |
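A hedged Python sketch of a lightweight detection-plus-matching loop of the kind that fits a Raspberry Pi class device, using OpenCV's Haar cascade detector and the LBPH matcher from opencv-contrib. It is an illustrative stand-in, not the system described in the paper.

# Lightweight face-recognition loop; illustrative only, needs opencv-contrib-python.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create()

# enrolment: grayscale face crops with integer identity labels (toy data here)
faces = [np.random.randint(0, 255, (100, 100), dtype=np.uint8) for _ in range(4)]
labels = [0, 0, 1, 1]
recognizer.train(faces, np.array(labels))

frame = np.random.randint(0, 255, (240, 320), dtype=np.uint8)  # camera-frame stand-in
for (x, y, w, h) in detector.detectMultiScale(frame, scaleFactor=1.1, minNeighbors=5):
    crop = cv2.resize(frame[y:y + h, x:x + w], (100, 100))
    label, dist = recognizer.predict(crop)  # lower distance = better match
    print("identity:", label, "distance:", dist)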
Emeršič, Žiga; Štepec, Dejan; Štruc, Vitomir; Peer, Peter; George, Anjith; Ahmad, Adil; Omar, Elshibani; Boult, Terrance E.; Safdari, Reza; Zhou, Yuxiang; Zafeiriou, Stefanos; Yaman, Dogucan; Eyiokur, Fevziye I.; Ekenel, Hazim K. The unconstrained ear recognition challenge Inproceedings In: 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 715–724, IEEE 2017. @inproceedings{emervsivc2017unconstrained,
title = {The unconstrained ear recognition challenge},
author = {Žiga Emeršič and Dejan Štepec and Vitomir Štruc and Peter Peer and Anjith George and Adil Ahmad and Elshibani Omar and Terrance E. Boult and Reza Safdari and Yuxiang Zhou and Stefanos Zafeiriou and Dogucan Yaman and Fevziye I. Eyiokur and Hazim K. Ekenel},
url = {https://arxiv.org/pdf/1708.06997.pdf},
year = {2017},
date = {2017-09-01},
booktitle = {2017 IEEE International Joint Conference on Biometrics (IJCB)},
pages = {715--724},
organization = {IEEE},
abstract = {In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches, addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.
},
keywords = {biometrics, competition, ear recognition, IJCB, uerc, unconstrained ear recognition challenge},
pubstate = {published},
tppubtype = {inproceedings}
}
In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches, addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.
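A minimal Python sketch of the closed-set identification measures commonly used to rank entries in challenges of this kind: a CMC curve and the rank-1 recognition rate, computed from a probe-by-gallery similarity matrix. The scores below are random placeholders, not UERC results.

# CMC / rank-1 computation from a toy similarity matrix.
import numpy as np

def cmc(sim, probe_ids, gallery_ids, max_rank=5):
    """Fraction of probes whose true match appears within the top-k ranks."""
    order = np.argsort(-sim, axis=1)          # best gallery match first
    ranked = gallery_ids[order]               # gallery identities in ranked order
    hits = ranked == probe_ids[:, None]
    first_hit = hits.argmax(axis=1)           # rank index of the correct identity
    return np.array([(first_hit < k).mean() for k in range(1, max_rank + 1)])

rng = np.random.default_rng(0)
sim = rng.random((6, 10))                     # 6 probes x 10 gallery templates (toy)
probe_ids = np.arange(6)                      # every probe identity is in the gallery
gallery_ids = np.arange(10)
curve = cmc(sim, probe_ids, gallery_ids)
print("rank-1 recognition rate:", curve[0])
print("CMC (ranks 1-5):", curve)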
|
Emeršič, Žiga; Štepec, Dejan; Štruc, Vitomir; Peer, Peter Training convolutional neural networks with limited training data for ear recognition in the wild Inproceedings In: IEEE International Conference on Automatic Face and Gesture Recognition, Workshop on Biometrics in the Wild 2017, 2017. @inproceedings{emervsivc2017training,
title = {Training convolutional neural networks with limited training data for ear recognition in the wild},
author = {Žiga Emeršič and Dejan Štepec and Vitomir Štruc and Peter Peer},
url = {https://arxiv.org/pdf/1711.09952.pdf},
year = {2017},
date = {2017-05-01},
booktitle = {IEEE International Conference on Automatic Face and Gesture Recognition, Workshop on Biometrics in the Wild 2017},
journal = {arXiv preprint arXiv:1711.09952},
abstract = {Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not yet benefited from advances in deep learning and convolutional neural networks (CNNs) and still lags behind other modalities that have experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank-one recognition rate of the previous state-of-the-art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild).},
keywords = {CNN, convolutional neural networks, ear, ear recognition, limited data, model learning},
pubstate = {published},
tppubtype = {inproceedings}
}
Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not yet benefited from advances in deep learning and convolutional neural networks (CNNs) and still lags behind other modalities that have experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank-one recognition rate of the previous state-of-the-art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild). |
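A hedged PyTorch sketch of the two ingredients the abstract highlights: aggressive data augmentation and selective learning on a pre-trained model (here only a new classification head is trained). The backbone choice, augmentation values and the 100-identity head are illustrative assumptions, not the paper's exact setup.

# Selective fine-tuning with aggressive augmentation; illustrative settings only.
import torch
import torch.nn as nn
from torchvision import models, transforms

# aggressive augmentation, useful when training data is scarce
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

model = models.resnet18(weights="IMAGENET1K_V1")  # stand-in pre-trained backbone
for p in model.parameters():
    p.requires_grad = False                       # freeze the pre-trained layers...
model.fc = nn.Linear(model.fc.in_features, 100)   # ...and train a new head
                                                  # (e.g. 100 ear identities)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a dummy batch
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 100, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", float(loss))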
Emersic, Ziga; Meden, Blaz; Peer, Peter; Struc, Vitomir Covariate analysis of descriptor-based ear recognition techniques Inproceedings In: 2017 international conference and workshop on bioinspired intelligence (IWOBI), pp. 1–9, IEEE 2017. @inproceedings{emersic2017covariate,
title = {Covariate analysis of descriptor-based ear recognition techniques},
author = {Ziga Emersic and Blaz Meden and Peter Peer and Vitomir Struc},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/Covariate_Analysis_of_Descriptor_based_Ear_Recognition_Techniques.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {2017 international conference and workshop on bioinspired intelligence (IWOBI)},
pages = {1--9},
organization = {IEEE},
abstract = {Dense descriptor-based feature extraction techniques represent a popular choice for implementing biometric ear recognition systems and are in general considered to be the current state-of-the-art in this area. In this paper, we study the impact of various factors (i.e., head rotation, presence of occlusions, gender and ethnicity) on the performance of 8 state-of-the-art descriptor-based ear recognition techniques. Our goal is to pinpoint weak points of the existing technology and identify open problems worth exploring in the future. We conduct our covariate analysis through identification experiments on the challenging AWE (Annotated Web Ears) dataset and report our findings. The results of our study show that high degrees of head movement and the presence of accessories significantly impact identification performance, whereas mild degrees of the listed factors and other covariates, such as gender and ethnicity, impact identification performance only to a limited extent.},
keywords = {AWE, covariate analysis, descriptors, ear, performance evaluation},
pubstate = {published},
tppubtype = {inproceedings}
}
Dense descriptor-based feature extraction techniques represent a popular choice for implementing biometric ear recognition systems and are in general considered to be the current state-of-the-art in this area. In this paper, we study the impact of various factors (i.e., head rotation, presence of occlusions, gender and ethnicity) on the performance of 8 state-of-the-art descriptor-based ear recognition techniques. Our goal is to pinpoint weak points of the existing technology and identify open problems worth exploring in the future. We conduct our covariate analysis through identification experiments on the challenging AWE (Annotated Web Ears) dataset and report our findings. The results of our study show that high degrees of head movement and the presence of accessories significantly impact identification performance, whereas mild degrees of the listed factors and other covariates, such as gender and ethnicity, impact identification performance only to a limited extent. |
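A minimal Python sketch of one dense-descriptor pipeline of the kind compared in the study: uniform LBP histograms computed over image blocks and matched with the chi-square distance. Block layout and LBP parameters are illustrative, not the paper's configuration.

# Block-wise uniform-LBP descriptor with chi-square matching; toy inputs.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(img, P=8, R=1, grid=4):
    lbp = local_binary_pattern(img, P, R, method="uniform")
    n_bins = P + 2                                 # uniform patterns + "other" bin
    h, w = lbp.shape
    feats = []
    for by in range(grid):
        for bx in range(grid):
            block = lbp[by * h // grid:(by + 1) * h // grid,
                        bx * w // grid:(bx + 1) * w // grid]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

def chi_square(a, b):
    return 0.5 * np.sum((a - b) ** 2 / (a + b + 1e-12))

ear_a = np.random.randint(0, 255, (100, 60), dtype=np.uint8)  # toy ear crops
ear_b = np.random.randint(0, 255, (100, 60), dtype=np.uint8)
print("chi-square distance:", chi_square(lbp_descriptor(ear_a), lbp_descriptor(ear_b)))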
Emeršič, Žiga; Štruc, Vitomir; Peer, Peter Ear recognition: More than a survey Journal Article In: Neurocomputing, vol. 255, pp. 26–39, 2017. @article{emervsivc2017ear,
title = {Ear recognition: More than a survey},
author = {Žiga Emeršič and Vitomir Štruc and Peter Peer},
url = {https://arxiv.org/pdf/1611.06203.pdf},
year = {2017},
date = {2017-01-01},
journal = {Neurocomputing},
volume = {255},
pages = {26--39},
publisher = {Elsevier},
abstract = {Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review on ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community.},
keywords = {AWE, biometrics, dataset, ear, ear recognition, performance evaluation, survey, toolbox},
pubstate = {published},
tppubtype = {article}
}
Automatic identity recognition from ear images represents an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes the technology an appealing choice for surveillance and security applications as well as other application domains. Significant contributions have been made in the field over recent years, but open research problems still remain and hinder a wider (commercial) deployment of the technology. This paper presents an overview of the field of automatic ear recognition (from 2D images) and focuses specifically on the most recent, descriptor-based methods proposed in this area. Open challenges are discussed and potential research directions are outlined with the goal of providing the reader with a point of reference for issues worth examining in the future. In addition to a comprehensive review on ear recognition technology, the paper also introduces a new, fully unconstrained dataset of ear images gathered from the web and a toolbox implementing several state-of-the-art techniques for ear recognition. The dataset and toolbox are meant to address some of the open issues in the field and are made publicly available to the research community. |
Meden, Blaž; Malli, Refik Can; Fabijan, Sebastjan; Ekenel, Hazim Kemal; Štruc, Vitomir; Peer, Peter Face deidentification with generative deep neural networks Journal Article In: IET Signal Processing, vol. 11, no. 9, pp. 1046–1054, 2017. @article{meden2017face,
title = {Face deidentification with generative deep neural networks},
author = {Blaž Meden and Refik Can Malli and Sebastjan Fabijan and Hazim Kemal Ekenel and Vitomir Štruc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/Face_Deidentification_with_Generative_Deep_Neural_Networks.pdf},
year = {2017},
date = {2017-01-01},
journal = {IET Signal Processing},
volume = {11},
number = {9},
pages = {1046--1054},
publisher = {IET},
abstract = {Face deidentification is an active topic amongst privacy and security researchers. Early deidentification methods relying on image blurring or pixelisation have been replaced in recent years with techniques based on formal anonymity models that provide privacy guarantees and retain certain characteristics of the data even after deidentification. The latter aspect is important, as it allows the deidentified data to be used in applications for which identity information is irrelevant. In this work, the authors present a novel face deidentification pipeline, which ensures anonymity by synthesising artificial surrogate faces using generative neural networks (GNNs). The generated faces are used to deidentify subjects in images or videos, while preserving non-identity-related aspects of the data and consequently enabling data utilisation. Since generative networks are highly adaptive and can utilise diverse parameters (pertaining to the appearance of the generated output in terms of facial expressions, gender, race etc.), they represent a natural choice for the problem of face deidentification. To demonstrate the feasibility of the authors’ approach, they perform experiments using automated recognition tools and human annotators. Their results show that the recognition performance on deidentified images is close to chance, suggesting that the deidentification process based on GNNs is effective.},
keywords = {biometrics, computer vision, deidentification, face, privacy protection},
pubstate = {published},
tppubtype = {article}
}
Face deidentification is an active topic amongst privacy and security researchers. Early deidentification methods relying on image blurring or pixelisation have been replaced in recent years with techniques based on formal anonymity models that provide privacy guarantees and retain certain characteristics of the data even after deidentification. The latter aspect is important, as it allows the deidentified data to be used in applications for which identity information is irrelevant. In this work, the authors present a novel face deidentification pipeline, which ensures anonymity by synthesising artificial surrogate faces using generative neural networks (GNNs). The generated faces are used to deidentify subjects in images or videos, while preserving non-identity-related aspects of the data and consequently enabling data utilisation. Since generative networks are highly adaptive and can utilise diverse parameters (pertaining to the appearance of the generated output in terms of facial expressions, gender, race etc.), they represent a natural choice for the problem of face deidentification. To demonstrate the feasibility of the authors’ approach, they perform experiments using automated recognition tools and human annotators. Their results show that the recognition performance on deidentified images is close to chance, suggesting that the deidentification process based on GNNs is effective. |
Meden, Blaz; Emersic, Ziga; Struc, Vitomir; Peer, Peter k-Same-Net: Neural-Network-Based Face Deidentification Inproceedings In: 2017 International Conference and Workshop on Bioinspired Intelligence (IWOBI), pp. 1–7, IEEE 2017. @inproceedings{meden2017kappa,
title = {k-Same-Net: Neural-Network-Based Face Deidentification},
author = {Blaz Meden and Ziga Emersic and Vitomir Struc and Peter Peer},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/k-same-net.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {2017 International Conference and Workshop on Bioinspired Intelligence (IWOBI)},
pages = {1--7},
organization = {IEEE},
abstract = {An increasing amount of video and image data is being shared between government entities and other relevant stakeholders and requires careful handling of personal information. A popular approach for privacy protection in such data is the use of deidentification techniques, which aim at concealing the identity of individuals in the imagery while still preserving certain aspects of the data after deidentification. In this work, we propose a novel approach towards face deidentification, called k-Same-Net, which combines recent generative neural networks (GNNs) with the well-known k-anonymity mechanism and provides formal guarantees regarding privacy protection on a closed set of identities. Our GNN is able to generate synthetic surrogate face images for deidentification by seamlessly combining features of identities used to train the GNN model. Furthermore, it allows us to guide the image-generation process with a small set of appearance-related parameters that can be used to alter specific aspects (e.g., facial expressions, age, gender) of the synthesized surrogate images. We demonstrate the feasibility of k-Same-Net in comparative experiments with competing techniques on the XM2VTS dataset and discuss the main characteristics of our approach.},
keywords = {deidentification, face, privacy protection},
pubstate = {published},
tppubtype = {inproceedings}
}
An increasing amount of video and image data is being shared between government entities and other relevant stakeholders and requires careful handling of personal information. A popular approach for privacy protection in such data is the use of deidentification techniques, which aim at concealing the identity of individuals in the imagery while still preserving certain aspects of the data after deidentification. In this work, we propose a novel approach towards face deidentification, called k-Same-Net, which combines recent generative neural networks (GNNs) with the well-known k-anonymity mechanism and provides formal guarantees regarding privacy protection on a closed set of identities. Our GNN is able to generate synthetic surrogate face images for deidentification by seamlessly combining features of identities used to train the GNN model. Furthermore, it allows us to guide the image-generation process with a small set of appearance-related parameters that can be used to alter specific aspects (e.g., facial expressions, age, gender) of the synthesized surrogate images. We demonstrate the feasibility of k-Same-Net in comparative experiments with competing techniques on the XM2VTS dataset and discuss the main characteristics of our approach. |
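For context, a hedged Python sketch of the classic k-Same mechanism that k-Same-Net builds on: each face in a closed set is replaced by an average over the k most similar faces, so any surrogate maps back to at least k possible identities. k-Same-Net itself synthesises the surrogate with a generative network rather than pixel averaging; only the k-anonymity step is shown here, in simplified form.

# Simplified k-Same surrogate computation; toy data, not k-Same-Net itself.
import numpy as np

def k_same(faces, k=3):
    """faces: (n, h, w) aligned grayscale faces; returns surrogate faces."""
    flat = faces.reshape(len(faces), -1).astype(np.float64)
    surrogates = np.empty_like(flat)
    for i, f in enumerate(flat):
        dists = np.linalg.norm(flat - f, axis=1)
        nearest = np.argsort(dists)[:k]        # k most similar faces (incl. self)
        surrogates[i] = flat[nearest].mean(axis=0)
    return surrogates.reshape(faces.shape)

faces = np.random.rand(10, 64, 64)             # stand-ins for aligned face images
deidentified = k_same(faces, k=3)
print(deidentified.shape)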
Das, Abhijit; Pal, Umapada; Ferrer, Miguel A; Blumenstein, Michael; Štepec, Dejan; Rot, Peter; Emeršič, Žiga; Peer, Peter; Štruc, Vitomir; Kumar, SV Aruna; S, Harish B SSERBC 2017: Sclera segmentation and eye recognition benchmarking competition Inproceedings In: 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 742–747, IEEE 2017. @inproceedings{das2017sserbc,
title = {SSERBC 2017: Sclera segmentation and eye recognition benchmarking competition},
author = {Abhijit Das and Umapada Pal and Miguel A Ferrer and Michael Blumenstein and Dejan Štepec and Peter Rot and Žiga Emeršič and Peter Peer and Vitomir Štruc and SV Aruna Kumar and Harish B S},
url = {https://lmi.fe.uni-lj.si/wp-content/uploads/2019/08/SSERBC2017.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {2017 IEEE International Joint Conference on Biometrics (IJCB)},
pages = {742--747},
organization = {IEEE},
abstract = {This paper summarises the results of the Sclera Segmentation and Eye Recognition Benchmarking Competition (SSERBC 2017). It was organised in the context of the International Joint Conference on Biometrics (IJCB 2017). The aim of this competition was to record recent developments in sclera segmentation and eye recognition in the visible spectrum (using the iris, sclera and peri-ocular region, and their fusion), and also to gain the attention of researchers on this subject.
In this regard, we used the Multi-Angle Sclera Dataset (MASD version 1). It comprises 2624 images taken from both eyes of 82 identities; hence, it consists of images of 164 (82*2) eyes. Manual segmentation masks of these images were created to baseline both tasks.
Precision- and recall-based statistical measures were employed to evaluate the effectiveness of the submitted algorithms and to rank the segmentation task. A recognition accuracy measure was employed to evaluate the recognition task, in which manually segmented sclera, iris and peri-ocular regions were used. Sixteen teams registered for the competition; among them, six teams submitted algorithms or systems for the segmentation task and two submitted recognition algorithms or systems.
The results produced by these algorithms or systems reflect current developments in the literature on sclera segmentation and eye recognition, employing cutting-edge techniques. The MASD version 1 dataset, with some of the ground truth, will be freely available for research purposes. The success of the competition also demonstrates the recent interest of researchers from academia as well as industry in this subject.},
keywords = {competition, sclera, sclera segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper summarises the results of the Sclera Segmentation and Eye Recognition Benchmarking Competition (SSERBC 2017). It was organised in the context of the International Joint Conference on Biometrics (IJCB 2017). The aim of this competition was to record recent developments in sclera segmentation and eye recognition in the visible spectrum (using the iris, sclera and peri-ocular region, and their fusion), and also to gain the attention of researchers on this subject.
In this regard, we used the Multi-Angle Sclera Dataset (MASD version 1). It comprises 2624 images taken from both eyes of 82 identities; hence, it consists of images of 164 (82*2) eyes. Manual segmentation masks of these images were created to baseline both tasks.
Precision- and recall-based statistical measures were employed to evaluate the effectiveness of the submitted algorithms and to rank the segmentation task. A recognition accuracy measure was employed to evaluate the recognition task, in which manually segmented sclera, iris and peri-ocular regions were used. Sixteen teams registered for the competition; among them, six teams submitted algorithms or systems for the segmentation task and two submitted recognition algorithms or systems.
The results produced by these algorithms or systems reflect current developments in the literature on sclera segmentation and eye recognition, employing cutting-edge techniques. The MASD version 1 dataset, with some of the ground truth, will be freely available for research purposes. The success of the competition also demonstrates the recent interest of researchers from academia as well as industry in this subject. |
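A minimal Python sketch of the pixel-wise precision/recall scoring used in segmentation benchmarks of this kind, comparing a predicted binary sclera mask against a manual ground-truth mask; the masks below are toy placeholders.

# Pixel-wise precision/recall/F1 for binary segmentation masks; toy masks.
import numpy as np

def precision_recall(pred, gt):
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

gt = np.zeros((100, 100), dtype=bool); gt[30:70, 40:90] = True     # ground truth
pred = np.zeros_like(gt);              pred[35:75, 35:85] = True   # prediction
p, r, f1 = precision_recall(pred, gt)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")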
Grm, Klemen; Štruc, Vitomir; Artiges, Anais; Caron, Matthieu; Ekenel, Hazim K. Strengths and weaknesses of deep learning models for face recognition against image degradations Journal Article In: IET Biometrics, vol. 7, no. 1, pp. 81–89, 2017. @article{grm2017strengths,
title = {Strengths and weaknesses of deep learning models for face recognition against image degradations},
author = {Klemen Grm and Vitomir Štruc and Anais Artiges and Matthieu Caron and Hazim K. Ekenel},
url = {https://arxiv.org/pdf/1710.01494.pdf},
year = {2017},
date = {2017-01-01},
journal = {IET Biometrics},
volume = {7},
number = {1},
pages = {81--89},
publisher = {IET},
abstract = {Convolutional neural network (CNN) based approaches are the state of the art in various computer vision tasks including face recognition. Considerable research effort is currently being directed toward further improving CNNs by focusing on model architectures and training techniques. However, studies systematically exploring the strengths and weaknesses of existing deep models for face recognition are still relatively scarce. In this paper, we try to fill this gap and study the effects of different covariates on the verification performance of four recent CNN models using the Labelled Faces in the Wild dataset. Specifically, we investigate the influence of covariates related to image quality and model characteristics, and analyse their impact on the face verification performance of different deep CNN models. Based on comprehensive and rigorous experimentation, we identify the strengths and weaknesses of the deep learning models, and present key areas for potential future research. Our results indicate that high levels of noise, blur, missing pixels, and brightness have a detrimental effect on the verification performance of all models, whereas the impact of contrast changes and compression artefacts is limited. We find that the descriptor-computation strategy and colour information do not have a significant influence on performance.},
keywords = {CNN, convolutional neural networks, face recognition, googlenet, study, vgg},
pubstate = {published},
tppubtype = {article}
}
Convolutional neural network (CNN) based approaches are the state of the art in various computer vision tasks including face recognition. Considerable research effort is currently being directed toward further improving CNNs by focusing on model architectures and training techniques. However, studies systematically exploring the strengths and weaknesses of existing deep models for face recognition are still relatively scarce. In this paper, we try to fill this gap and study the effects of different covariates on the verification performance of four recent CNN models using the Labelled Faces in the Wild dataset. Specifically, we investigate the influence of covariates related to image quality and model characteristics, and analyse their impact on the face verification performance of different deep CNN models. Based on comprehensive and rigorous experimentation, we identify the strengths and weaknesses of the deep learning models, and present key areas for potential future research. Our results indicate that high levels of noise, blur, missing pixels, and brightness have a detrimental effect on the verification performance of all models, whereas the impact of contrast changes and compression artefacts is limited. We find that the descriptor-computation strategy and colour information do not have a significant influence on performance. |
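A hedged Python sketch of the kind of covariate protocol the study describes: generating degraded variants (noise, blur, JPEG compression) of a face image, each of which would then be embedded and verified. The degradation levels are illustrative, not the levels used in the paper.

# Generating degraded image variants for a covariate study; illustrative levels.
import numpy as np
import cv2

face = np.random.randint(0, 255, (112, 112, 3), dtype=np.uint8)  # stand-in image

# additive Gaussian noise
noisy = np.clip(face + np.random.normal(0, 25, face.shape), 0, 255).astype(np.uint8)

# Gaussian blur
blurred = cv2.GaussianBlur(face, (9, 9), sigmaX=3)

# JPEG compression at low quality
ok, buf = cv2.imencode(".jpg", face, [int(cv2.IMWRITE_JPEG_QUALITY), 10])
compressed = cv2.imdecode(buf, cv2.IMREAD_COLOR)

for name, img in [("noise", noisy), ("blur", blurred), ("jpeg", compressed)]:
    print(name, img.shape)  # each variant would be fed to the verification pipeline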
Šket, Robert; Treichel, Nicole; Kublik, Susanne; Debevec, Tadej; Eiken, Ola; Mekjavić, Igor; Schloter, Michael; Vital, Marius; Chandler, Jenna; Tiedje, James M; Murovec, Boštjan; Prevoršek, Zala; Likar, Matevž; Stres, Blaž Hypoxia and inactivity related physiological changes precede or take place in absence of significant rearrangements in bacterial community structure: The PlanHab randomized trial pilot study Journal Article In: PLOS ONE, vol. 12, no. 12, pp. 1-26, 2017. @article{10.1371/journal.pone.0188556,
title = {Hypoxia and inactivity related physiological changes precede or take place in absence of significant rearrangements in bacterial community structure: The PlanHab randomized trial pilot study},
author = {Robert Šket and Nicole Treichel and Susanne Kublik and Tadej Debevec and Ola Eiken and Igor Mekjavić and Michael Schloter and Marius Vital and Jenna Chandler and James M Tiedje and Boštjan Murovec and Zala Prevoršek and Matevž Likar and Blaž Stres},
url = {https://doi.org/10.1371/journal.pone.0188556},
doi = {10.1371/journal.pone.0188556},
year = {2017},
date = {2017-01-01},
journal = {PLOS ONE},
volume = {12},
number = {12},
pages = {1-26},
publisher = {Public Library of Science},
abstract = {We explored the assembly of the intestinal microbiota in healthy male participants during the randomized crossover design of run-in (5-day) and experimental phases (21-day normoxic bed rest (NBR), hypoxic bed rest (HBR) and hypoxic ambulation (HAmb)) in a strictly controlled laboratory environment, with balanced fluid and dietary intakes, controlled circadian rhythm, controlled ambient microbial burden and 24/7 medical surveillance. The fraction of inspired O2 (FiO2) and partial pressure of inspired O2 (PiO2) were 0.209 and 133.1 ± 0.3 mmHg for NBR and 0.141 ± 0.004 and 90.0 ± 0.4 mmHg for both hypoxic variants (HBR and HAmb; ~4000 m simulated altitude), respectively. A number of parameters linked to the intestinal environment, such as defecation frequency, intestinal electrical conductivity (IEC), sterol and polyphenol content and diversity, indole, aromaticity and spectral characteristics of dissolved organic matter (DOM), were measured (64 variables). The structure and diversity of the bacterial community were assessed using 16S rRNA amplicon sequencing. Inactivity negatively affected the frequency of defecation and, in combination with hypoxia, increased IEC (p < 0.05). In contrast, sterol and polyphenol diversity and content, various characteristics of DOM and aromatic compounds, and the structure and diversity of the bacterial community were not significantly affected over time. A new in-house PlanHab database was established to integrate all measured variables on host physiology, diet, experiment, and immune and metabolic markers (n = 231). The observed progressive decrease in defecation frequency and concomitant increase in IEC suggested that the transition from a healthy physiological state towards the developed symptoms of low-magnitude obesity-related syndromes was dose-dependent on the extent of time spent in inactivity and preceded, or took place in the absence of, significant rearrangements in the bacterial community. Species B. thetaiotaomicron, B. fragilis, B. dorei and other Bacteroides with reported relevance for dysbiotic medical conditions were significantly enriched in HBR, which was characterized by the most severe inflammation symptoms, indicating a shift towards host mucin degradation and proinflammatory immune crosstalk.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We explored the assembly of the intestinal microbiota in healthy male participants during the randomized crossover design of run-in (5-day) and experimental phases (21-day normoxic bed rest (NBR), hypoxic bed rest (HBR) and hypoxic ambulation (HAmb)) in a strictly controlled laboratory environment, with balanced fluid and dietary intakes, controlled circadian rhythm, controlled ambient microbial burden and 24/7 medical surveillance. The fraction of inspired O2 (FiO2) and partial pressure of inspired O2 (PiO2) were 0.209 and 133.1 ± 0.3 mmHg for NBR and 0.141 ± 0.004 and 90.0 ± 0.4 mmHg for both hypoxic variants (HBR and HAmb; ~4000 m simulated altitude), respectively. A number of parameters linked to the intestinal environment, such as defecation frequency, intestinal electrical conductivity (IEC), sterol and polyphenol content and diversity, indole, aromaticity and spectral characteristics of dissolved organic matter (DOM), were measured (64 variables). The structure and diversity of the bacterial community were assessed using 16S rRNA amplicon sequencing. Inactivity negatively affected the frequency of defecation and, in combination with hypoxia, increased IEC (p < 0.05). In contrast, sterol and polyphenol diversity and content, various characteristics of DOM and aromatic compounds, and the structure and diversity of the bacterial community were not significantly affected over time. A new in-house PlanHab database was established to integrate all measured variables on host physiology, diet, experiment, and immune and metabolic markers (n = 231). The observed progressive decrease in defecation frequency and concomitant increase in IEC suggested that the transition from a healthy physiological state towards the developed symptoms of low-magnitude obesity-related syndromes was dose-dependent on the extent of time spent in inactivity and preceded, or took place in the absence of, significant rearrangements in the bacterial community. Species B. thetaiotaomicron, B. fragilis, B. dorei and other Bacteroides with reported relevance for dysbiotic medical conditions were significantly enriched in HBR, which was characterized by the most severe inflammation symptoms, indicating a shift towards host mucin degradation and proinflammatory immune crosstalk. |
Šket, Robert; Treichel, Nicole; Debevec, Tadej; Eiken, Ola; Mekjavic, Igor; Schloter, Michael; Vital, Marius; Chandler, Jenna; Tiedje, James M; Murovec, Boštjan; Prevoršek, Zala; Stres, Blaž Hypoxia and Inactivity Related Physiological Changes (Constipation, Inflammation) Are Not Reflected at the Level of Gut Metabolites and Butyrate Producing Microbial Community: The PlanHab Study Journal Article In: Frontiers in Physiology, vol. 8, pp. 250, 2017, ISSN: 1664-042X. @article{10.3389/fphys.2017.00250,
title = {Hypoxia and Inactivity Related Physiological Changes (Constipation, Inflammation) Are Not Reflected at the Level of Gut Metabolites and Butyrate Producing Microbial Community: The PlanHab Study},
author = {Robert Šket and Nicole Treichel and Tadej Debevec and Ola Eiken and Igor Mekjavic and Michael Schloter and Marius Vital and Jenna Chandler and James M Tiedje and Boštjan Murovec and Zala Prevoršek and Blaž Stres},
url = {https://www.frontiersin.org/article/10.3389/fphys.2017.00250},
doi = {10.3389/fphys.2017.00250},
issn = {1664-042X},
year = {2017},
date = {2017-01-01},
journal = {Frontiers in Physiology},
volume = {8},
pages = {250},
abstract = {We explored the assembly of the intestinal microbiota in healthy male participants during the run-in (5-day) and experimental phases (21-day normoxic bed rest (NBR), hypoxic bed rest (HBR) and hypoxic ambulation (HAmb)) in a strictly controlled laboratory environment, with balanced fluid and dietary intakes, controlled circadian rhythm, controlled ambient microbial burden and 24/7 medical surveillance. The fraction of inspired O2 (FiO2) and partial pressure of inspired O2 (PiO2) were 0.209 and 133.1 ± 0.3 mmHg for NBR and 0.141 ± 0.004 and 90.0 ± 0.4 mmHg for both hypoxic variants (HBR and HAmb; ~4000 m simulated altitude), respectively. A number of parameters linked to intestinal transit, spanning the Bristol Stool Scale, defecation rates, zonulin, α1-antitrypsin, eosinophil-derived neurotoxin, bile acids, reducing sugars, short-chain fatty acids, total soluble organic carbon, water content, diet composition and food intake, were measured (167 variables). The abundance, structure and diversity of the butyrate-producing microbial community were assessed using the two primary bacterial butyrate synthesis pathways, the butyryl-CoA:acetate CoA-transferase (but) and butyrate kinase (buk) genes. Inactivity negatively affected fecal consistency and in combination with hypoxia aggravated the state of gut inflammation (p < 0.05). In contrast, gut permeability, various metabolic markers, and the structure, diversity and abundance of the butyrate-producing microbial community were not significantly affected. Rearrangements in the butyrate-producing community structure were explained by the experimental setup (13.4%), experimentally structured metabolites (12.8%) and gut metabolite-immunological markers (11.9%), with 61.9% remaining unexplained. Many of the measured parameters were found to be correlated and were hence omitted from further analyses. The observed progressive increase in two immunological intestinal markers suggested that the transition from a healthy physiological state towards the developed symptoms of low-magnitude obesity-related syndromes was primarily driven by the onset of inactivity (lack of exercise in NBR), which was exacerbated by systemic hypoxia (HBR) and significantly alleviated by exercise, despite hypoxia (HAmb). The butyrate-producing community in the colon exhibited apparent resilience towards short-term modifications in host exercise or hypoxia. Progressive constipation (decreased intestinal motility) and an increased local inflammation marker suggest that changes in microbial colonization and metabolism were taking place in the small intestine.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We explored the assembly of the intestinal microbiota in healthy male participants during the run-in (5-day) and experimental phases (21-day normoxic bed rest (NBR), hypoxic bed rest (HBR) and hypoxic ambulation (HAmb)) in a strictly controlled laboratory environment, with balanced fluid and dietary intakes, controlled circadian rhythm, controlled ambient microbial burden and 24/7 medical surveillance. The fraction of inspired O2 (FiO2) and partial pressure of inspired O2 (PiO2) were 0.209 and 133.1 ± 0.3 mmHg for NBR and 0.141 ± 0.004 and 90.0 ± 0.4 mmHg for both hypoxic variants (HBR and HAmb; ~4000 m simulated altitude), respectively. A number of parameters linked to intestinal transit, spanning the Bristol Stool Scale, defecation rates, zonulin, α1-antitrypsin, eosinophil-derived neurotoxin, bile acids, reducing sugars, short-chain fatty acids, total soluble organic carbon, water content, diet composition and food intake, were measured (167 variables). The abundance, structure and diversity of the butyrate-producing microbial community were assessed using the two primary bacterial butyrate synthesis pathways, the butyryl-CoA:acetate CoA-transferase (but) and butyrate kinase (buk) genes. Inactivity negatively affected fecal consistency and in combination with hypoxia aggravated the state of gut inflammation (p < 0.05). In contrast, gut permeability, various metabolic markers, and the structure, diversity and abundance of the butyrate-producing microbial community were not significantly affected. Rearrangements in the butyrate-producing community structure were explained by the experimental setup (13.4%), experimentally structured metabolites (12.8%) and gut metabolite-immunological markers (11.9%), with 61.9% remaining unexplained. Many of the measured parameters were found to be correlated and were hence omitted from further analyses. The observed progressive increase in two immunological intestinal markers suggested that the transition from a healthy physiological state towards the developed symptoms of low-magnitude obesity-related syndromes was primarily driven by the onset of inactivity (lack of exercise in NBR), which was exacerbated by systemic hypoxia (HBR) and significantly alleviated by exercise, despite hypoxia (HAmb). The butyrate-producing community in the colon exhibited apparent resilience towards short-term modifications in host exercise or hypoxia. Progressive constipation (decreased intestinal motility) and an increased local inflammation marker suggest that changes in microbial colonization and metabolism were taking place in the small intestine. |
2016
|
Kravanja, Jaka; Žganec, Mario; Žganec-Gros, Jerneja; Dobrišek, Simon; Štruc, Vitomir Robust Depth Image Acquisition Using Modulated Pattern Projection and Probabilistic Graphical Models Journal Article In: Sensors, vol. 16, no. 10, pp. 1740, 2016. @article{kravanja2016robust,
title = {Robust Depth Image Acquisition Using Modulated Pattern Projection and Probabilistic Graphical Models},
author = {Jaka Kravanja and Mario Žganec and Jerneja Žganec-Gros and Simon Dobrišek and Vitomir Štruc},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/11/sensors-16-01740-1.pdf},
doi = {10.3390/s16101740},
year = {2016},
date = {2016-10-20},
journal = {Sensors},
volume = {16},
number = {10},
pages = {1740},
publisher = {Multidisciplinary Digital Publishing Institute},
abstract = {Depth image acquisition with structured light approaches in outdoor environments is a challenging problem due to external factors, such as ambient sunlight, which commonly affect the acquisition procedure. This paper presents a novel structured light sensor designed specifically for operation in outdoor environments. The sensor exploits a modulated sequence of structured light projected onto the target scene to counteract environmental factors and estimate a spatial distortion map in a robust manner. The correspondence between the projected pattern and the estimated distortion map is then established using a probabilistic framework based on graphical models. Finally, the depth image of the target scene is reconstructed using a number of reference frames recorded during the calibration process. We evaluate the proposed sensor on experimental data in indoor and outdoor environments and present comparative experiments with other existing methods, as well as commercial sensors.},
keywords = {3d imaging, 3d sensor, depth imaging, depth sensor, graphical models, modulated pattern projection, outdoor deployment, robust operation, Sensors, structured light},
pubstate = {published},
tppubtype = {article}
}
Depth image acquisition with structured light approaches in outdoor environments is a challenging problem due to external factors, such as ambient sunlight, which commonly affect the acquisition procedure. This paper presents a novel structured light sensor designed specifically for operation in outdoor environments. The sensor exploits a modulated sequence of structured light projected onto the target scene to counteract environmental factors and estimate a spatial distortion map in a robust manner. The correspondence between the projected pattern and the estimated distortion map is then established using a probabilistic framework based on graphical models. Finally, the depth image of the target scene is reconstructed using a number of reference frames recorded during the calibration process. We evaluate the proposed sensor on experimental data in indoor and outdoor environments and present comparative experiments with other existing methods, as well as commercial sensors. |
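For context, a minimal Python sketch of the triangulation geometry behind structured-light depth recovery: once the correspondence between projected and observed pattern positions is known (the step the paper solves robustly with graphical models), depth follows from Z = f·B/d. The focal length and baseline are illustrative values, not the sensor's calibration.

# Pinhole triangulation from a toy disparity/distortion map; illustrative values.
import numpy as np

def depth_from_disparity(disparity_px, focal_px=600.0, baseline_m=0.1):
    """Z = f * B / d, with d in pixels and Z in metres; d <= 0 maps to infinity."""
    d = np.asarray(disparity_px, dtype=np.float64)
    return np.where(d > 0, focal_px * baseline_m / np.maximum(d, 1e-9), np.inf)

# toy distortion map: per-pixel displacement between the projected pattern
# position and the position observed by the camera
disparity = np.array([[30.0, 31.5], [28.0, 0.0]])
print(depth_from_disparity(disparity))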
Scheirer, Walter; Flynn, Patrick; Ding, Changxing; Guo, Guodong; Štruc, Vitomir; Jazaery, Mohamad Al; Dobrišek, Simon; Grm, Klemen; Tao, Dacheng; Zhu, Yu; Brogan, Joel; Banerjee, Sandipan; Bharati, Aparna; Webster, Brandon Richard Report on the BTAS 2016 Video Person Recognition Evaluation Inproceedings In: Proceedings of the IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), IEEE, 2016. @inproceedings{BTAS2016,
title = {Report on the BTAS 2016 Video Person Recognition Evaluation},
author = {Walter Scheirer and Patrick Flynn and Changxing Ding and Guodong Guo and Vitomir Štruc and Mohamad Al Jazaery and Simon Dobrišek and Klemen Grm and Dacheng Tao and Yu Zhu and Joel Brogan and Sandipan Banerjee and Aparna Bharati and Brandon Richard Webster},
year = {2016},
date = {2016-10-05},
booktitle = {Proceedings of the IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS)},
publisher = {IEEE},
abstract = {This report presents results from the Video Person Recognition Evaluation held in conjunction with the 8th IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS). Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted, high-quality video camera. The second contained videos acquired from 5 different handheld video cameras. There were 1,401 videos in each experiment of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. An additional experiment required algorithms to recognize people in videos from the Video Database of Moving Faces and People (VDMFP). There were 958 videos in this experiment of 297 subjects. Four groups from around the world participated in the evaluation. The top verification rate for PaSC from this evaluation is 0.98 at a false accept rate of 0.01 — a remarkable advancement in performance from the competition held at FG 2015.},
keywords = {biometrics, competition, face recognition, group evaluation, PaSC, performance evaluation},
pubstate = {published},
tppubtype = {inproceedings}
}
This report presents results from the Video Person Recognition Evaluation held in conjunction with the 8th IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS). Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted, high-quality video camera. The second contained videos acquired from 5 different handheld video cameras. There were 1,401 videos in each experiment of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. An additional experiment required algorithms to recognize people in videos from the Video Database of Moving Faces and People (VDMFP). There were 958 videos in this experiment of 297 subjects. Four groups from around the world participated in the evaluation. The top verification rate for PaSC from this evaluation is 0.98 at a false accept rate of 0.01 — a remarkable advancement in performance from the competition held at FG 2015. |
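A minimal Python sketch of the headline metric in the report, the verification rate at a fixed false accept rate (here FAR = 0.01), computed from genuine and impostor score sets; the scores below are random placeholders, not PaSC results.

# Verification rate at a fixed FAR from toy genuine/impostor scores.
import numpy as np

def verification_rate_at_far(genuine, impostor, far=0.01):
    # threshold chosen so that approximately `far` of impostor scores exceed it
    thr = np.quantile(impostor, 1.0 - far)
    return float((genuine >= thr).mean()), float(thr)

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)    # toy genuine-pair similarity scores
impostor = rng.normal(0.3, 0.1, 10000)  # toy impostor-pair similarity scores
vr, thr = verification_rate_at_far(genuine, impostor, far=0.01)
print(f"VR @ FAR=0.01: {vr:.3f} (threshold {thr:.3f})")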
Križaj, Janez; Dobrišek, Simon; Mihelič, France; Štruc, Vitomir Facial Landmark Localization from 3D Images Inproceedings In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), Portorož, Slovenia, 2016. @inproceedings{ERK2016Janez,
title = {Facial Landmark Localization from 3D Images},
author = {Janez Križaj and Simon Dobrišek and France Mihelič and Vitomir Štruc},
year = {2016},
date = {2016-09-20},
booktitle = {Proceedings of the Electrotechnical and Computer Science Conference (ERK)},
address = {Portorož, Slovenia},
abstract = {A novel method for automatic facial landmark localization is presented. The method builds on the supervised descent framework, which was shown to successfully localize landmarks in the presence of large expression variations and mild occlusions, but struggles when localizing landmarks on faces with large pose variations. We propose an extension of the supervised descent framework which trains multiple descent maps and results in increased robustness to pose variations. The performance of the proposed method is demonstrated on the Bosphorus database for the problem of facial landmark localization from 3D data. Our experimental results show that the proposed method exhibits increased robustness to pose variations, while retaining high performance in the case of expression and occlusion variations.},
keywords = {3D face data, 3d landmarking, Bosphorus, face alignment, face image processing, facial landmarking, SDM, supervised descent framework},
pubstate = {published},
tppubtype = {inproceedings}
}
A novel method for automatic facial landmark localization is presented. The method builds on the supervised descent framework, which was shown to successfully localize landmarks in the presence of large expression variations and mild occlusions, but struggles when localizing landmarks on faces with large pose variations. We propose an extension of the supervised descent framework which trains multiple descent maps and results in increased robustness to pose variations. The performance of the proposed method is demonstrated on the Bosphorus database for the problem of facial landmark localization from 3D data. Our experimental results show that the proposed method exhibits increased robustness to pose variations, while retaining high performance in the case of expression and occlusion variations. |
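A hedged Python sketch of one supervised-descent iteration of the kind the method extends: a ridge regressor learns to map features extracted at the current landmark estimate to a shape update, x_{k+1} = x_k + R_k φ(x_k) + b_k. The feature function is a stand-in for image descriptors, and the multiple-descent-map extension proposed in the paper is not shown.

# One supervised-descent step learned with ridge regression; toy shapes/features.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_landmarks, n_feat = 200, 10, 64
true_shapes = rng.normal(size=(n_samples, 2 * n_landmarks))      # toy ground truth
current = true_shapes + rng.normal(scale=0.5, size=true_shapes.shape)

proj = rng.normal(size=(2 * n_landmarks, n_feat))                # fixed projection
def phi(shapes):
    # stand-in for local descriptors (e.g. depth patches) sampled at the landmarks
    return np.tanh(shapes @ proj)

# learn one descent map: regress the residual shape update from the features
descent_map = Ridge(alpha=1.0).fit(phi(current), true_shapes - current)
updated = current + descent_map.predict(phi(current))

print("mean abs error before:", np.abs(true_shapes - current).mean())
print("mean abs error after :", np.abs(true_shapes - updated).mean())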
Fabijan, Sebastjan; Štruc, Vitomir Vpliv registracije obraznih področij na učinkovitost samodejnega razpoznavanja obrazov: študija z OpenBR Inproceedings In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), 2016. @inproceedings{ERK2016_Seba,
title = {Vpliv registracije obraznih področij na učinkovitost samodejnega razpoznavanja obrazov: študija z OpenBR},
author = {Sebastjan Fabijan and Vitomir Štruc},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/erk_2016_08_22.pdf},
year = {2016},
date = {2016-09-20},
booktitle = {Proceedings of the Electrotechnical and Computer Science Conference (ERK)},
abstract = {Face recognition has in recent years become one of the most successful areas of automatic, computer-based image analysis, with numerous examples of practical use. One of the key steps for successful recognition is the alignment of the faces in the images. Alignment aims to make recognition independent of the changes in viewing angle at image capture, which introduce a high degree of variability into the image data. In this paper we present three face-alignment procedures (from the literature) and study their impact on the recognition performance of the techniques implemented in the open-source framework Open Source Biometric Recognition (OpenBR). All experiments are performed on the Labeled Faces in the Wild (LFW) dataset.},
keywords = {4SF, biometrics, face alignment, face recognition, LFW, OpenBR, performance evaluation},
pubstate = {published},
tppubtype = {inproceedings}
}
Face recognition has in recent years become one of the most successful areas of automatic, computer-supported image analysis, with a range of practical applications. One of the key steps for successful recognition is the alignment of the faces in the images. Alignment aims to make recognition invariant to changes in the viewing angle at acquisition time, which introduce a high degree of variability into the image data. In this paper we present three face-alignment procedures (from the literature) and examine their influence on the recognition performance of the techniques implemented in the open-source framework Open Source Biometric Recognition (OpenBR). All experiments are performed on the Labeled Faces in the Wild (LFW) dataset. |
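As a rough illustration of what a face-alignment step involves, the following is a common similarity-transform recipe (OpenCV) that maps detected eye centers to canonical positions; it is a generic sketch, not one of the three procedures evaluated in the paper, and the canonical positions are assumptions:

```python
import cv2
import numpy as np

def align_face(img, left_eye, right_eye, size=128, eye_y=0.35, eye_gap=0.5):
    """Rotate/scale/crop so the eyes end up at canonical positions."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))        # in-plane rotation of the eye line
    scale = (eye_gap * size) / np.hypot(dx, dy)   # normalize inter-eye distance
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    M[0, 2] += size / 2.0 - center[0]             # move eye midpoint to crop center
    M[1, 2] += eye_y * size - center[1]           # and to the canonical eye height
    return cv2.warpAffine(img, M, (size, size))
```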
Stržinar, Žiga; Grm, Klemen; Štruc, Vitomir Učenje podobnosti v globokih nevronskih omrežjih za razpoznavanje obrazov Inproceedings In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), Portorož, Slovenia, 2016. @inproceedings{ERK2016_sebastjan,
title = {Učenje podobnosti v globokih nevronskih omrežjih za razpoznavanje obrazov},
author = {Žiga Stržinar and Klemen Grm and Vitomir Štruc},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/erk_ziga_Vziga.pdf},
year = {2016},
date = {2016-09-20},
booktitle = {Proceedings of the Electrotechnical and Computer Science Conference (ERK)},
address = {Portorož, Slovenia},
abstract = {Similarity learning over pairs of input images is one of the most popular approaches to recognition in the field of deep learning. In this approach, a deep neural network receives a pair of (face) images at its input and returns a measure of similarity between the two images at its output, which can then be used for recognition. The similarity computation can be implemented entirely by the deep network, or the network can be used only to compute a representation of the input image pair, with the mapping from this representation to a similarity measure carried out by a different, potentially more suitable model. In this paper we evaluate 5 different models for the mapping between the computed representation and the similarity measure, using a neural network of our own design for the experiments. The results of our face recognition experiments show the importance of choosing a suitable model, as the differences in recognition performance from model to model are considerable.},
keywords = {biometrics, CNN, deep learning, difference space, face verification, LFW, performance evaluation},
pubstate = {published},
tppubtype = {inproceedings}
}
Similarity learning over pairs of input images is one of the most popular approaches to recognition in the field of deep learning. In this approach, a deep neural network receives a pair of (face) images at its input and returns a measure of similarity between the two images at its output, which can then be used for recognition. The similarity computation can be implemented entirely by the deep network, or the network can be used only to compute a representation of the input image pair, with the mapping from this representation to a similarity measure carried out by a different, potentially more suitable model. In this paper we evaluate 5 different models for the mapping between the computed representation and the similarity measure, using a neural network of our own design for the experiments. The results of our face recognition experiments show the importance of choosing a suitable model, as the differences in recognition performance from model to model are considerable. |
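A rough sketch of the second strategy from the abstract, mapping a precomputed pair representation to a similarity score with interchangeable models; the difference-space representation, classifiers and data below are illustrative, not the five models from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def pair_representation(emb_a, emb_b):
    """One common pair representation: element-wise absolute difference."""
    return np.abs(emb_a - emb_b)

rng = np.random.default_rng(0)
X = pair_representation(rng.normal(size=(400, 128)),
                        rng.normal(size=(400, 128)))
y = rng.integers(0, 2, size=400)  # 1 = same identity (synthetic labels)

for model in (LogisticRegression(max_iter=1000), SVC()):
    model.fit(X, y)
    print(type(model).__name__, "train accuracy:", model.score(X, y))
```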
Dobrišek, Simon; Čefarin, David; Štruc, Vitomir; Mihelič, France Assessment of the Google Speech Application Programming Interface for Automatic Slovenian Speech Recognition Inproceedings In: Jezikovne Tehnologije in Digitalna Humanistika, 2016. @inproceedings{SJDT,
title = {Assessment of the Google Speech Application Programming Interface for Automatic Slovenian Speech Recognition},
author = {Simon Dobrišek and David Čefarin and Vitomir Štruc and France Mihelič},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/jtdh16-ulfe-luks-sd-final-pdfa.pdf},
year = {2016},
date = {2016-09-20},
booktitle = {Jezikovne Tehnologije in Digitalna Humanistika},
abstract = {Automatic speech recognizers are slowly maturing into technologies that enable humans to communicate more naturally and effectively with a variety of smart devices and information-communication systems. Large global companies such as Google, Microsoft, Apple, IBM and Baidu compete in developing the most reliable speech recognizers, supporting as many of the main world languages as possible. Due to the relatively small number of speakers, the support for the Slovenian spoken language is lagging behind, and among the major global companies only Google has recently supported our spoken language. The paper presents the results of our independent assessment of the Google speech-application programming interface for automatic Slovenian speech recognition. For the experiments, we used speech databases that are otherwise used for the development and assessment of Slovenian speech recognizers.},
keywords = {Google, performance evaluation, speech API, speech technologies},
pubstate = {published},
tppubtype = {inproceedings}
}
Automatic speech recognizers are slowly maturing into technologies that enable humans to communicate more naturally and effectively with a variety of smart devices and information-communication systems. Large global companies such as Google, Microsoft, Apple, IBM and Baidu compete in developing the most reliable speech recognizers, supporting as many of the main world languages as possible. Due to the relatively small number of speakers, the support for the Slovenian spoken language is lagging behind, and among the major global companies only Google has recently supported our spoken language. The paper presents the results of our independent assessment of the Google speech-application programming interface for automatic Slovenian speech recognition. For the experiments, we used speech databases that are otherwise used for the development and assessment of Slovenian speech recognizers. |
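Evaluations of this kind typically report the word error rate; a self-contained reference implementation (not the scoring code used in the paper) follows:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                       # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                       # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitution
                          d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1)                           # insertion
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("the cat sat", "the cat sat down"))  # 0.333...
```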
Ribič, Metod; Emeršič, Žiga; Štruc, Vitomir; Peer, Peter Influence of alignment on ear recognition: case study on AWE Dataset Inproceedings In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), pp. 131-134, Portorož, Slovenia, 2016. @inproceedings{RibicERK2016,
title = {Influence of alignment on ear recognition: case study on AWE Dataset},
author = {Metod Ribič and Žiga Emeršič and Vitomir Štruc and Peter Peer},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/Influence_of_Alignment_on_Ear_Recognitio.pdf},
year = {2016},
date = {2016-09-20},
booktitle = {Proceedings of the Electrotechnical and Computer Science Conference (ERK)},
pages = {131-134},
address = {Portorož, Slovenia},
abstract = {Ear as a biometric modality presents a viable source for automatic human recognition. In recent years local description methods have been gaining in popularity due to their invariance to illumination and occlusion. However, these methods require that images are well aligned and preprocessed as well as possible. This gives rise to one of the greatest challenges of ear recognition: sensitivity to pose variations. Recently, we presented the Annotated Web Ears (AWE) dataset that opens new challenges in ear recognition. In this paper we test the influence of alignment on recognition performance and show that, even though alignment improves the recognition rate, the dataset remains very challenging. We also show that more sophisticated alignment methods are needed to address the AWE dataset efficiently.},
keywords = {AWE, AWE dataset, biometrics, ear alignment, ear recognition, image alignment, Ransac, SIFT},
pubstate = {published},
tppubtype = {inproceedings}
}
Ear as a biometric modality presents a viable source for automatic human recognition. In recent years local description methods have been gaining in popularity due to their invariance to illumination and occlusion. However, these methods require that images are well aligned and preprocessed as well as possible. This gives rise to one of the greatest challenges of ear recognition: sensitivity to pose variations. Recently, we presented the Annotated Web Ears (AWE) dataset that opens new challenges in ear recognition. In this paper we test the influence of alignment on recognition performance and show that, even though alignment improves the recognition rate, the dataset remains very challenging. We also show that more sophisticated alignment methods are needed to address the AWE dataset efficiently. |
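The alignment step suggested by the keywords (SIFT correspondences filtered with RANSAC) could look roughly like this in OpenCV; thresholds and the choice of a similarity warp are assumptions, not the authors' exact pipeline:

```python
import cv2
import numpy as np

def align_ear(probe, template):
    """Warp `probe` toward `template` using SIFT matches + RANSAC."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(probe, None)
    kp2, des2 = sift.detectAndCompute(template, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)  # similarity warp
    return cv2.warpAffine(probe, M, (template.shape[1], template.shape[0]))
```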
Dobrišek, Simon; Čefarin, David; Štruc, Vitomir; Mihelič, France Preizkus Googlovega govornega programskega vmesnika pri samodejnem razpoznavanju govorjene slovenščine Inproceedings In: Jezikovne tehnologije in digitalna humanistika, pp. 47-51, 2016. @inproceedings{dobrivsekpreizkus,
title = {Preizkus Googlovega govornega programskega vmesnika pri samodejnem razpoznavanju govorjene slovenščine},
author = {Simon Dobrišek and David Čefarin and Vitomir Štruc and France Mihelič},
url = {http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Dobrisek-et-al_Preizkus-Googlovega-govornega-programskega-vmesnika.pdf},
year = {2016},
date = {2016-09-01},
booktitle = {Jezikovne tehnologije in digitalna humanistika},
pages = {47-51},
abstract = {Automatic speech recognizers are slowly maturing into technologies that enable humans to communicate more naturally and effectively with a variety of smart devices and information-communication systems. Large global companies such as Google, Microsoft, Apple, IBM and Baidu compete in developing the most reliable speech recognizers, supporting as many of the main world languages as possible. Due to the relatively small number of speakers, the support for the Slovenian spoken language is lagging behind, and among the major global companies only Google has recently supported our spoken language. The paper presents the results of our independent assessment of the Google speech-application programming interface for automatic Slovenian speech recognition. For the experiments, we used speech databases that are otherwise used for the development and assessment of Slovenian speech recognizers.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Automatic speech recognizers are slowly maturing into technologies that enable humans to communicate more naturally and effectively with a variety of smart devices and information-communication systems. Large global companies such as Google, Microsoft, Apple, IBM and Baidu compete in developing the most reliable speech recognizers, supporting as many of the main world languages as possible. Due to the relatively small number of speakers, the support for the Slovenian spoken language is lagging behind, and among the major global companies only Google has recently supported our spoken language. The paper presents the results of our independent assessment of the Google speech-application programming interface for automatic Slovenian speech recognition. For the experiments, we used speech databases that are otherwise used for the development and assessment of Slovenian speech recognizers. |
Kravanja, Jaka; Žganec, Mario; Žganec-Gros, Jerneja; Dobrišek, Simon; Štruc, Vitomir Exploiting Spatio-Temporal Information for Light-Plane Labeling in Depth-Image Sensors Using Probabilistic Graphical Models Journal Article In: Informatica, vol. 27, no. 1, pp. 67–84, 2016. @article{kravanja2016exploiting,
title = {Exploiting Spatio-Temporal Information for Light-Plane Labeling in Depth-Image Sensors Using Probabilistic Graphical Models},
author = {Jaka Kravanja and Mario Žganec and Jerneja Žganec-Gros and Simon Dobrišek and Vitomir Štruc},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/11/jaka_informatica_camera.pdf},
year = {2016},
date = {2016-03-30},
journal = {Informatica},
volume = {27},
number = {1},
pages = {67--84},
publisher = {Vilnius University Institute of Mathematics and Informatics},
abstract = {This paper proposes a novel approach to light plane labeling in depth-image sensors relying on “uncoded” structured light. The proposed approach adopts probabilistic graphical models (PGMs) to solve the correspondence problem between the projected and the detected light patterns. The procedure for solving the correspondence problem is designed to take the spatial relations between the parts of the projected pattern and prior knowledge about the structure of the pattern into account, but it also exploits temporal information to achieve reliable light-plane labeling. The procedure is assessed on a database of light patterns detected with a specially developed imaging sensor that, unlike most existing solutions on the market, was shown to work reliably in outdoor environments as well as in the presence of other identical (active) sensors directed at the same scene. The results of our experiments show that the proposed approach is able to reliably solve the correspondence problem and assign light-plane labels to the detected pattern with a high accuracy, even when large spatial discontinuities are present in the observed scene.},
keywords = {3d imaging, correspondance, depth imaging, depth sensing, depth sensor, graphical models, sensor, structured light},
pubstate = {published},
tppubtype = {article}
}
This paper proposes a novel approach to light plane labeling in depth-image sensors relying on “uncoded” structured light. The proposed approach adopts probabilistic graphical models (PGMs) to solve the correspondence problem between the projected and the detected light patterns. The procedure for solving the correspondence problem is designed to take the spatial relations between the parts of the projected pattern and prior knowledge about the structure of the pattern into account, but it also exploits temporal information to achieve reliable light-plane labeling. The procedure is assessed on a database of light patterns detected with a specially developed imaging sensor that, unlike most existing solutions on the market, was shown to work reliably in outdoor environments as well as in the presence of other identical (active) sensors directed at the same scene. The results of our experiments show that the proposed approach is able to reliably solve the correspondence problem and assign light-plane labels to the detected pattern with a high accuracy, even when large spatial discontinuities are present in the observed scene. |
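The paper's PGM is considerably richer, but a toy chain model conveys the flavor of the correspondence problem: stripes detected along a scanline receive plane labels that must respect the projected pattern's left-to-right order, traded off against per-stripe evidence; all costs and names below are illustrative assumptions:

```python
import numpy as np

def label_stripes(detected, expected):
    """Assign strictly increasing plane labels to detected stripe positions.

    Unary cost: distance between a detected stripe and a plane's expected
    position. The ordering constraint plays the role of the pairwise terms.
    Solved exactly by dynamic programming over the chain.
    """
    n, m = len(detected), len(expected)
    unary = np.abs(np.subtract.outer(detected, expected))   # (n, m) costs
    INF = float("inf")
    cost, back = unary[0].copy(), np.zeros((n, m), dtype=int)
    for i in range(1, n):
        new = np.full(m, INF)
        for j in range(i, m):                               # label must exceed the previous one
            k = int(np.argmin(cost[:j]))
            new[j] = cost[k] + unary[i, j]
            back[i, j] = k
        cost = new
    labels = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]

print(label_stripes([10.0, 22.0, 35.0], [9.0, 20.0, 24.0, 36.0]))  # [0, 1, 3]
```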
Grm, Klemen; Dobrišek, Simon; Štruc, Vitomir Deep pair-wise similarity learning for face recognition Inproceedings In: 4th International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, IEEE 2016. @inproceedings{grm2016deep,
title = {Deep pair-wise similarity learning for face recognition},
author = {Klemen Grm and Simon Dobrišek and Vitomir Štruc},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/IWBF_2016.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {4th International Workshop on Biometrics and Forensics (IWBF)},
pages = {1--6},
organization = {IEEE},
abstract = {Recent advances in deep learning made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection or tracking. For recognition tasks the most common approach when using deep models is to learn object representations (or features) directly from raw image input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model a difficult task. In this paper, we present a novel, deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its usage on the task of face recognition. Unlike existing (deep) learning strategies, DPSL operates on image pairs and tries to learn pair-wise image similarities that can be used for recognition purposes directly instead of feature representations that need to be fed to appropriate classification techniques, as with traditional deep learning pipelines. Since our DPSL strategy assumes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called the Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET and IJB-A datasets.},
keywords = {CNN, deep learning, face recognition, IJB-A, IWBF, performance evaluation, similarity learning},
pubstate = {published},
tppubtype = {inproceedings}
}
Recent advances in deep learning made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection or tracking. For recognition tasks the most common approach when using deep models is to learn object representations (or features) directly from raw image input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model a difficult task. In this paper, we present a novel, deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its usage on the task of face recognition. Unlike existing (deep) learning strategies, DPSL operates on image pairs and tries to learn pair-wise image similarities that can be used for recognition purposes directly instead of feature representations that need to be fed to appropriate classification techniques, as with traditional deep learning pipelines. Since our DPSL strategy assumes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called the Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET and IJB-A datasets. |
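The quadratic-growth argument is easy to make concrete: n labeled images yield n(n-1)/2 training pairs. A small illustrative snippet:

```python
from itertools import combinations

def make_pairs(images, labels):
    """Build all (image, image, same-identity?) training pairs.

    n images yield n*(n-1)/2 pairs, so the training set grows
    quadratically with the number of labeled images.
    """
    return [(a, b, int(la == lb))
            for (a, la), (b, lb) in combinations(zip(images, labels), 2)]

pairs = make_pairs(["im0", "im1", "im2", "im3"], [0, 0, 1, 1])
print(len(pairs))                # 6 pairs from 4 images
print(sum(p[2] for p in pairs))  # 2 genuine (same-identity) pairs
```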
Golob, Žiga; Gros, Jerneja Žganec; Štruc, Vitomir; Mihelič, France; Dobrišek, Simon A Composition Algorithm of Compact Finite-State Super Transducers for Grapheme-to-Phoneme Conversion Inproceedings In: International Conference on Text, Speech, and Dialogue, pp. 375–382, Springer 2016. @inproceedings{golob2016composition,
title = {A Composition Algorithm of Compact Finite-State Super Transducers for Grapheme-to-Phoneme Conversion},
author = {Žiga Golob and Jerneja Žganec Gros and Vitomir Štruc and France Mihelič and Simon Dobrišek},
year = {2016},
date = {2016-01-01},
booktitle = {International Conference on Text, Speech, and Dialogue},
pages = {375--382},
organization = {Springer},
abstract = {Minimal deterministic finite-state transducers (MDFSTs) are powerful models that can be used to represent pronunciation dictionaries in a compact form. Intuitively, we would assume that by increasing the size of the dictionary, the size of the MDFSTs would increase as well. However, as we show in the paper, this intuition does not hold for highly inflected languages. With such languages the size of the MDFSTs begins to decrease once the number of words in the represented dictionary reaches a certain threshold. Motivated by this observation, we have developed a new type of FST, called a finite-state super transducer (FSST), and show experimentally that the FSST is capable of representing pronunciation dictionaries with fewer states and transitions than MDFSTs. Furthermore, we show that (unlike MDFSTs) our FSSTs can also accept words that are not part of the represented dictionary. The phonetic transcriptions of these out-of-dictionary words may not always be correct, but the observed error rates are comparable to the error rates of the traditional methods for grapheme-to-phoneme conversion.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Minimal deterministic finite-state transducers (MDFSTs) are powerful models that can be used to represent pronunciation dictionaries in a compact form. Intuitively, we would assume that by increasing the size of the dictionary, the size of the MDFSTs would increase as well. However, as we show in the paper, this intuition does not hold for highly inflected languages. With such languages the size of the MDFSTs begins to decrease once the number of words in the represented dictionary reaches a certain threshold. Motivated by this observation, we have developed a new type of FST, called a finite-state super transducer (FSST), and show experimentally that the FSST is capable of representing pronunciation dictionaries with fewer states and transitions than MDFSTs. Furthermore, we show that (unlike MDFSTs) our FSSTs can also accept words that are not part of the represented dictionary. The phonetic transcriptions of these out-of-dictionary words may not always be correct, but the observed error rates are comparable to the error rates of the traditional methods for grapheme-to-phoneme conversion. |
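A minimal illustration of the transducer idea (not the paper's FSST construction): a deterministic machine emits phonemes as it walks a word's graphemes, and a default arc lets it keep producing output for out-of-dictionary words, possibly imperfectly; the toy arcs below are assumptions:

```python
# One-state toy transducer: arcs map a grapheme to (phoneme, next state).
# Real (super) transducers have many states that encode letter context.
ARCS = {0: {"c": ("ts", 0), "š": ("ʃ", 0), "v": ("ʋ", 0)}}

def transduce(word, arcs, start=0):
    """Walk the word through the transducer, collecting output phonemes."""
    state, phones = start, []
    for g in word:
        phone, state = arcs[state].get(g, (g, state))  # default arc: echo grapheme
        phones.append(phone)
    return " ".join(phones)

print(transduce("cvet", ARCS))  # ts ʋ e t -- 'e' and 't' taken by the default arc
```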
2015
|
Grm, Klemen; Dobrišek, Simon; Štruc, Vitomir The pose-invariant similarity index for face recognition Inproceedings In: Proceedings of the Electrotechnical and Computer Science Conference (ERK), Portorož, Slovenia, 2015. @inproceedings{ERK2015Klemen,
title = {The pose-invariant similarity index for face recognition},
author = {Klemen Grm and Simon Dobrišek and Vitomir Štruc},
year = {2015},
date = {2015-04-20},
booktitle = {Proceedings of the Electrotechnical and Computer Science Conference (ERK)},
address = {Portorož, Slovenia},
keywords = {biometrics, CNN, deep learning, deep models, face verification, similarity learning},
pubstate = {published},
tppubtype = {inproceedings}
}
|
Štruc, Vitomir; Križaj, Janez; Dobrišek, Simon Modest face recognition Inproceedings In: Proceedings of the International Workshop on Biometrics and Forensics (IWBF), pp. 1–6, IEEE, 2015. @inproceedings{struc2015modest,
title = {Modest face recognition},
author = {Vitomir Štruc and Janez Križaj and Simon Dobrišek},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/IWBF2015.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Workshop on Biometrics and Forensics (IWBF)},
pages = {1--6},
publisher = {IEEE},
abstract = {The facial imagery at the disposal of forensic investigations is commonly of poor quality due to the unconstrained settings in which it was acquired. The captured faces are typically non-frontal, partially occluded and of low resolution, which makes the recognition task extremely difficult. In this paper we try to address this problem by presenting a novel framework for face recognition that combines diverse feature sets (Gabor features, local binary patterns, local phase quantization features and pixel intensities), probabilistic linear discriminant analysis (PLDA) and data fusion based on linear logistic regression. With the proposed framework a matching score for the given pair of probe and target images is produced by applying PLDA on each of the four feature sets independently - producing a (partial) matching score for each of the PLDA-based feature vectors - and then combining the partial matching results at the score level to generate a single matching score for recognition. We make two main contributions in the paper: i) we introduce a novel framework for face recognition that relies on probabilistic MOdels of Diverse fEature SeTs (MODEST) to facilitate the recognition process and ii) benchmark it against the existing state-of-the-art. We demonstrate the feasibility of our MODEST framework on the FRGCv2 and PaSC databases and present comparative results with the state-of-the-art recognition techniques, which demonstrate the efficacy of our framework.},
keywords = {biometrics, face verification, Gabor features, image descriptors, LBP, multi modality, PaSC, performance evaluation},
pubstate = {published},
tppubtype = {inproceedings}
}
The facial imagery at the disposal of forensic investigations is commonly of poor quality due to the unconstrained settings in which it was acquired. The captured faces are typically non-frontal, partially occluded and of low resolution, which makes the recognition task extremely difficult. In this paper we try to address this problem by presenting a novel framework for face recognition that combines diverse feature sets (Gabor features, local binary patterns, local phase quantization features and pixel intensities), probabilistic linear discriminant analysis (PLDA) and data fusion based on linear logistic regression. With the proposed framework a matching score for the given pair of probe and target images is produced by applying PLDA on each of the four feature sets independently - producing a (partial) matching score for each of the PLDA-based feature vectors - and then combining the partial matching results at the score level to generate a single matching score for recognition. We make two main contributions in the paper: i) we introduce a novel framework for face recognition that relies on probabilistic MOdels of Diverse fEature SeTs (MODEST) to facilitate the recognition process and ii) benchmark it against the existing state-of-the-art. We demonstrate the feasibility of our MODEST framework on the FRGCv2 and PaSC databases and present comparative results with the state-of-the-art recognition techniques, which demonstrate the efficacy of our framework. |
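The fusion stage of the framework (linear logistic regression over the experts' partial scores) can be sketched as follows; the scores here are synthetic and the PLDA scoring itself is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=600)             # 1 = genuine pair
# four partial scores per comparison (one per feature set), synthetic here:
scores = labels[:, None] * 0.8 + rng.normal(size=(600, 4))

fuser = LogisticRegression().fit(scores, labels)  # learn fusion weights
fused = fuser.decision_function(scores)           # single matching score
print("fusion weights:", np.round(fuser.coef_, 2))
```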
Beveridge, Ross; Zhang, Hao; Draper, Bruce A; Flynn, Patrick J; Feng, Zhenhua; Huber, Patrik; Kittler, Josef; Huang, Zhiwu; Li, Shaoxin; Li, Yan; Štruc, Vitomir; Križaj, Janez; others, Report on the FG 2015 video person recognition evaluation Inproceedings In: 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG), pp. 1–8, IEEE 2015. @inproceedings{beveridge2015report,
title = {Report on the FG 2015 video person recognition evaluation},
author = {Ross Beveridge and Hao Zhang and Bruce A Draper and Patrick J Flynn and Zhenhua Feng and Patrik Huber and Josef Kittler and Zhiwu Huang and Shaoxin Li and Yan Li and Vitomir Štruc and Janez Križaj and others},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/fg2015videoEvalPreprint.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG)},
volume = {1},
pages = {1--8},
organization = {IEEE},
abstract = {This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted, high-quality video camera. The second contained videos acquired from 5 different handheld video cameras. Each experiment comprised 1401 videos of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition. The top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.},
keywords = {biometrics, competition, face verification, FG, group evaluation, PaSC, performance evaluation},
pubstate = {published},
tppubtype = {inproceedings}
}
This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted, high-quality video camera. The second contained videos acquired from 5 different handheld video cameras. Each experiment comprised 1401 videos of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition. The top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing. |
Justin, Tadej; Štruc, Vitomir; Dobrišek, Simon; Vesnicer, Boštjan; Ipšić, Ivo; Mihelič, France Speaker de-identification using diphone recognition and speech synthesis Inproceedings In: 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG): DeID 2015, pp. 1–7, IEEE 2015. @inproceedings{justin2015speaker,
title = {Speaker de-identification using diphone recognition and speech synthesis},
author = {Tadej Justin and Vitomir Štruc and Simon Dobrišek and Boštjan Vesnicer and Ivo Ipšić and France Mihelič},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/Deid2015.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG): DeID 2015},
volume = {4},
pages = {1--7},
organization = {IEEE},
abstract = {The paper addresses the problem of speaker (or voice) de-identification by presenting a novel approach for concealing the identity of speakers in their speech. The proposed technique first recognizes the input speech with a diphone recognition system and then transforms the obtained phonetic transcription into the speech of another speaker with a speech synthesis system. Because a Diphone RecOgnition step and a sPeech SYnthesis step are used during the de-identification, we refer to the developed technique as DROPSY. With this approach the acoustic models of the recognition and synthesis modules are completely independent of each other, which ensures the highest level of input-speaker de-identification. The proposed DROPSY-based de-identification approach is language-dependent, text-independent and capable of running in real-time due to the relatively simple computing methods used. When designing speaker de-identification technology, two requirements are typically imposed on the de-identification techniques: i) it should not be possible to establish the identity of the speakers based on the de-identified speech, and ii) the processed speech should still sound natural and be intelligible. This paper, therefore, implements the proposed DROPSY-based approach with two different speech synthesis techniques (i.e., with the HMM-based and the diphone TD-PSOLA-based technique). The obtained de-identified speech is evaluated for intelligibility as well as in speaker verification experiments with a state-of-the-art (i-vector/PLDA) speaker recognition system. The comparison of the two speech synthesis modules integrated in the proposed method reveals that both can efficiently de-identify the input speakers while still producing intelligible speech.},
keywords = {DEID, FG, speech deidentification, speech recognition, speech synthesis, speech technologies},
pubstate = {published},
tppubtype = {inproceedings}
}
The paper addresses the problem of speaker (or voice) de-identification by presenting a novel approach for concealing the identity of speakers in their speech. The proposed technique first recognizes the input speech with a diphone recognition system and then transforms the obtained phonetic transcription into the speech of another speaker with a speech synthesis system. Because a Diphone RecOgnition step and a sPeech SYnthesis step are used during the de-identification, we refer to the developed technique as DROPSY. With this approach the acoustic models of the recognition and synthesis modules are completely independent of each other, which ensures the highest level of input-speaker de-identification. The proposed DROPSY-based de-identification approach is language-dependent, text-independent and capable of running in real-time due to the relatively simple computing methods used. When designing speaker de-identification technology, two requirements are typically imposed on the de-identification techniques: i) it should not be possible to establish the identity of the speakers based on the de-identified speech, and ii) the processed speech should still sound natural and be intelligible. This paper, therefore, implements the proposed DROPSY-based approach with two different speech synthesis techniques (i.e., with the HMM-based and the diphone TD-PSOLA-based technique). The obtained de-identified speech is evaluated for intelligibility as well as in speaker verification experiments with a state-of-the-art (i-vector/PLDA) speaker recognition system. The comparison of the two speech synthesis modules integrated in the proposed method reveals that both can efficiently de-identify the input speakers while still producing intelligible speech. |
Dobrišek, Simon; Štruc, Vitomir; Križaj, Janez; Mihelič, France Face recognition in the wild with the Probabilistic Gabor-Fisher Classifier Inproceedings In: 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG): BWild 2015, pp. 1–6, IEEE 2015. @inproceedings{dobrivsek2015face,
title = {Face recognition in the wild with the Probabilistic Gabor-Fisher Classifier},
author = {Simon Dobrišek and Vitomir Štruc and Janez Križaj and France Mihelič},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/Bwild2015.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (IEEE FG): BWild 2015},
volume = {2},
pages = {1--6},
organization = {IEEE},
abstract = {The paper addresses the problem of face recognition in the wild. It introduces a novel approach to unconstrained face recognition that exploits Gabor magnitude features and a simplified version of probabilistic linear discriminant analysis (PLDA). The novel approach, named Probabilistic Gabor-Fisher Classifier (PGFC), first extracts a vector of Gabor magnitude features from the given input image using a battery of Gabor filters, then reduces the dimensionality of the extracted feature vector by projecting it into a low-dimensional subspace and finally produces a representation suitable for identity inference by applying PLDA to the projected feature vector. The proposed approach extends the popular Gabor-Fisher Classifier (GFC) to a probabilistic setting and thus improves on the generalization capabilities of the GFC method. The PGFC technique is assessed in face verification experiments on the Point and Shoot Face Recognition Challenge (PaSC) database, which features real-world videos of subjects performing everyday tasks. Experimental results on this challenging database show the feasibility of the proposed approach, which improves on the best results on this database reported in the literature at the time of writing.},
keywords = {biometrics, BWild, FG, Gabor features, PaSC, plda, probabilistic Gabor Fisher classifier, probabilistic linear discriminant analysis},
pubstate = {published},
tppubtype = {inproceedings}
}
The paper addresses the problem of face recognition in the wild. It introduces a novel approach to unconstrained face recognition that exploits Gabor magnitude features and a simplified version of probabilistic linear discriminant analysis (PLDA). The novel approach, named Probabilistic Gabor-Fisher Classifier (PGFC), first extracts a vector of Gabor magnitude features from the given input image using a battery of Gabor filters, then reduces the dimensionality of the extracted feature vector by projecting it into a low-dimensional subspace and finally produces a representation suitable for identity inference by applying PLDA to the projected feature vector. The proposed approach extends the popular Gabor-Fisher Classifier (GFC) to a probabilistic setting and thus improves on the generalization capabilities of the GFC method. The PGFC technique is assessed in face verification experiments on the Point and Shoot Face Recognition Challenge (PaSC) database, which features real-world videos of subjects performing everyday tasks. Experimental results on this challenging database show the feasibility of the proposed approach, which improves on the best results on this database reported in the literature at the time of writing. |
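A bare-bones version of such a Gabor front end, keeping only magnitude responses; filter parameters are illustrative, and the subspace projection and PLDA stages are omitted:

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_pair(size, theta, lam, sigma, gamma=1.0):
    """Real and imaginary parts of a Gabor filter."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return g * np.cos(2 * np.pi * xr / lam), g * np.sin(2 * np.pi * xr / lam)

def gabor_magnitudes(img, n_orient=8, wavelengths=(4, 8, 16)):
    """Stack of magnitude responses, one per (orientation, wavelength)."""
    out = []
    for lam in wavelengths:
        for k in range(n_orient):
            re, im = gabor_pair(31, k * np.pi / n_orient, lam, sigma=0.5 * lam)
            out.append(np.hypot(convolve(img, re), convolve(img, im)))
    return np.stack(out)

feats = gabor_magnitudes(np.random.rand(64, 64))
print(feats.shape)  # (24, 64, 64); flattened, this is the Gabor feature vector
```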
Justin, Tadej; Štruc, Vitomir; Žibert, Janez; Mihelič, France Development and Evaluation of the Emotional Slovenian Speech Database-EmoLUKS Inproceedings In: Proceedings of the International Conference on Text, Speech, and Dialogue (TSD), pp. 351–359, Springer 2015. @inproceedings{justin2015development,
title = {Development and Evaluation of the Emotional Slovenian Speech Database-EmoLUKS},
author = {Tadej Justin and Vitomir Štruc and Janez Žibert and France Mihelič},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/tsd2015.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Conference on Text, Speech, and Dialogue (TSD)},
pages = {351--359},
organization = {Springer},
abstract = {This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio-and-television station (RTV Slovenia) and were made available to the university under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based crowdsourcing application. The final (emotional) speech database consists of 1385 recordings of one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented. The results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments.},
keywords = {annotated data, dataset, dataset of emotional speech, EmoLUKS, emotional speech synthesis, speech synthesis, speech technologies, transcriptions},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio-and-television station (RTV Slovenia) and were made available to the university under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based crowdsourcing application. The final (emotional) speech database consists of 1385 recordings of one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented. The results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments. |
Camgoz, Necati Cihan; Štruc, Vitomir; Gokberk, Berk; Akarun, Lale; Kindiroglu, Ahmet Alp Facial Landmark Localization in Depth Images using Supervised Ridge Descent Inproceedings In: Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW): Chaa Learn, pp. 136–141, 2015. @inproceedings{cihan2015facial,
title = {Facial Landmark Localization in Depth Images using Supervised Ridge Descent},
author = {Necati Cihan Camgoz and Vitomir Štruc and Berk Gokberk and Lale Akarun and Ahmet Alp Kindiroglu},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/Camgoz_Facial_Landmark_Localization_ICCV_2015_paper.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW): Chaa Learn},
pages = {136--141},
abstract = {The Supervised Descent Method (SDM) has proven successful in many computer vision applications such as face alignment, tracking and camera calibration. Recent studies using SDM achieved state-of-the-art performance on facial landmark localization in depth images [4]. In this study, we propose to use ridge regression instead of least squares regression for learning the SDM, and to change feature sizes in each iteration, effectively turning the landmark search into a coarse-to-fine process. We apply the proposed method to facial landmark localization on the Bosphorus 3D Face Database, using frontal depth images with no occlusion. Experimental results confirm that both ridge regression and adaptive feature sizes improve the localization accuracy considerably.},
keywords = {3d landmarking, facial landmarking, landmark localization, landmarking, ridge regression, SDM},
pubstate = {published},
tppubtype = {inproceedings}
}
The Supervised Descent Method (SDM) has proven successful in many computer vision applications such as face alignment, tracking and camera calibration. Recent studies using SDM achieved state-of-the-art performance on facial landmark localization in depth images [4]. In this study, we propose to use ridge regression instead of least squares regression for learning the SDM, and to change feature sizes in each iteration, effectively turning the landmark search into a coarse-to-fine process. We apply the proposed method to facial landmark localization on the Bosphorus 3D Face Database, using frontal depth images with no occlusion. Experimental results confirm that both ridge regression and adaptive feature sizes improve the localization accuracy considerably. |
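Learning a descent map with ridge instead of least squares regression has a simple closed form; the sketch below is generic (synthetic shapes and features), with the coarse-to-fine behavior reduced to a comment:

```python
import numpy as np

def learn_descent_map(Phi, dX, lam=1.0):
    """Ridge solution R = dX Phi^T (Phi Phi^T + lam I)^{-1}.

    Phi: (d, n) features extracted at the current shape estimates,
    dX:  (p, n) target shape corrections for the n training samples.
    lam = 0 recovers plain least squares; lam > 0 is the ridge variant.
    In a coarse-to-fine scheme, the feature extraction window would
    shrink from one learned map to the next.
    """
    d = Phi.shape[0]
    return dX @ Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.eye(d))

Phi = np.random.randn(32, 100)  # synthetic features
dX = np.random.randn(10, 100)   # synthetic shape corrections
R = learn_descent_map(Phi, dX)
print(R.shape)                  # (10, 32)
```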
Murovec, Boštjan Job-shop local-search move evaluation without direct consideration of the criterion’s value Journal Article In: European Journal of Operational Research, vol. 241, no. 2, pp. 320 - 329, 2015, ISSN: 0377-2217. @article{MUROVEC2015320,
title = {Job-shop local-search move evaluation without direct consideration of the criterion’s value},
author = {Boštjan Murovec},
url = {http://www.sciencedirect.com/science/article/pii/S0377221714007309},
doi = {https://doi.org/10.1016/j.ejor.2014.08.044},
issn = {0377-2217},
year = {2015},
date = {2015-01-01},
journal = {European Journal of Operational Research},
volume = {241},
number = {2},
pages = {320 - 329},
abstract = {This article focuses on the evaluation of moves for the local search of the job-shop problem with the makespan criterion. We reason that the omnipresent ranking of moves according to their resulting value of a criterion function makes the local search unnecessarily myopic. Consequently, we introduce an alternative evaluation that relies on a surrogate quantity of the move’s potential, which is related to, but not strongly coupled with, the bare criterion. The approach is confirmed by empirical tests, where the proposed evaluator delivers a new upper bound on the well-known benchmark test yn2. The line of the argumentation also shows that by sacrificing accuracy the established makespan estimators unintentionally improve on the move evaluation in comparison to the exact makespan calculation, in contrast to the belief that the reliance on estimation degrades the optimization results.},
keywords = {Job-shop, Local search, Makespan, Move evaluation, Scheduling},
pubstate = {published},
tppubtype = {article}
}
This article focuses on the evaluation of moves for the local search of the job-shop problem with the makespan criterion. We reason that the omnipresent ranking of moves according to their resulting value of a criterion function makes the local search unnecessarily myopic. Consequently, we introduce an alternative evaluation that relies on a surrogate quantity of the move’s potential, which is related to, but not strongly coupled with, the bare criterion. The approach is confirmed by empirical tests, where the proposed evaluator delivers a new upper bound on the well-known benchmark test yn2. The line of the argumentation also shows that by sacrificing accuracy the established makespan estimators unintentionally improve on the move evaluation in comparison to the exact makespan calculation, in contrast to the belief that the reliance on estimation degrades the optimization results. |
Murovec, Boštjan; Kolbl, Sabina; Stres, Blaž Methane Yield Database: Online infrastructure and bioresource for methane yield data and related metadata Journal Article In: Bioresource Technology, vol. 189, pp. 217 - 223, 2015, ISSN: 0960-8524. @article{MUROVEC2015217,
title = {Methane Yield Database: Online infrastructure and bioresource for methane yield data and related metadata},
author = {Boštjan Murovec and Sabina Kolbl and Blaž Stres},
url = {http://www.sciencedirect.com/science/article/pii/S0960852415005040},
doi = {https://doi.org/10.1016/j.biortech.2015.04.021},
issn = {0960-8524},
year = {2015},
date = {2015-01-01},
journal = {Bioresource Technology},
volume = {189},
pages = {217 - 223},
abstract = {The aim of this study was to develop and validate a community-supported online infrastructure and bioresource for methane yield data and accompanying metadata collected from published literature. In total, 1164 entries described by 15,749 data points were assembled. Analysis of data collection showed little congruence in reporting of methodological approaches. The largest identifiable source of variation in reported methane yields was represented by authorship (i.e. substrate batches within a particular substrate class), within which experimental scale (volumes of 0.02–5 l), incubation temperature (34–40 °C) and % VS of substrate played an important role (p<0.0},
keywords = {Batch, Biogas, Industry, Infrastructure, Methane yield database},
pubstate = {published},
tppubtype = {article}
}
The aim of this study was to develop and validate a community-supported online infrastructure and bioresource for methane yield data and accompanying metadata collected from published literature. In total, 1164 entries described by 15,749 data points were assembled. Analysis of data collection showed little congruence in reporting of methodological approaches. The largest identifiable source of variation in reported methane yields was represented by authorship (i.e. substrate batches within a particular substrate class), within which experimental scale (volumes of 0.02–5 l), incubation temperature (34–40 °C) and % VS of substrate played an important role (p<0.0 |
Henderson, Gemma; Cox, Faith; Ganesh, Siva; Jonker, Arjan; Young, Wayne; Janssen, Peter H Rumen microbial community composition varies with diet and host, but a core microbiome is found across a wide geographical range Journal Article In: Scientific reports, vol. art 14567, no. 5, pp. 1–13, 2015, ISSN: 2045-2322. @article{Henderson_Cox_Ganesh_Jonker_Young_Janssen_2015,
title = {Rumen microbial community composition varies with diet and host, but a core microbiome is found across a wide geographical range},
author = {Gemma Henderson and Faith Cox and Siva Ganesh and Arjan Jonker and Wayne Young and Peter H Janssen},
url = {http://www.nature.com/articles/srep14567},
doi = {10.1038/srep14567},
issn = {2045-2322},
year = {2015},
date = {2015-01-01},
journal = {Scientific reports},
volume = {art 14567},
number = {5},
pages = {1–13},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
2014
|
Peer, Peter; Emeršič, Žiga; Bule, Jernej; Žganec-Gros, Jerneja; Štruc, Vitomir Strategies for exploiting independent cloud implementations of biometric experts in multibiometric scenarios Journal Article In: Mathematical problems in engineering, vol. 2014, 2014. @article{peer2014strategies,
title = {Strategies for exploiting independent cloud implementations of biometric experts in multibiometric scenarios},
author = {Peter Peer and Žiga Emeršič and Jernej Bule and Jerneja Žganec-Gros and Vitomir Štruc},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/585139-1.pdf},
doi = {http://dx.doi.org/10.1155/2014/585139},
year = {2014},
date = {2014-01-01},
journal = {Mathematical problems in engineering},
volume = {2014},
publisher = {Hindawi Publishing Corporation},
abstract = {Cloud computing represents one of the fastest growing areas of technology and offers a new computing model for various applications and services. This model is particularly interesting for the area of biometric recognition, where scalability, processing power, and storage requirements are becoming a bigger and bigger issue with each new generation of recognition technology. Besides the availability of computing resources, another important aspect of cloud computing with respect to biometrics is accessibility. Since biometric cloud services are easily accessible, it is possible to combine different existing implementations and design new multibiometric services that, in addition to almost unlimited resources, also offer superior recognition performance and, consequently, ensure improved security to their client applications. Unfortunately, the literature on the best strategies for combining existing implementations of cloud-based biometric experts into a multibiometric service is virtually nonexistent. In this paper, we try to close this gap and evaluate different strategies for combining existing biometric experts into a multibiometric cloud service. We analyze the (fusion) strategies from different perspectives such as performance gains, training complexity, or resource consumption and present results and findings important to software developers and other researchers working in the areas of biometrics and cloud computing. The analysis is conducted based on two biometric cloud services, which are also presented in the paper.},
keywords = {application, biometrics, cloud computing, face recognition, fingerprint recognition, fusion},
pubstate = {published},
tppubtype = {article}
}
Cloud computing represents one of the fastest growing areas of technology and offers a new computing model for various applications and services. This model is particularly interesting for the area of biometric recognition, where scalability, processing power, and storage requirements are becoming a bigger and bigger issue with each new generation of recognition technology. Besides the availability of computing resources, another important aspect of cloud computing with respect to biometrics is accessibility. Since biometric cloud services are easily accessible, it is possible to combine different existing implementations and design new multibiometric services that, in addition to almost unlimited resources, also offer superior recognition performance and, consequently, ensure improved security to their client applications. Unfortunately, the literature on the best strategies for combining existing implementations of cloud-based biometric experts into a multibiometric service is virtually nonexistent. In this paper, we try to close this gap and evaluate different strategies for combining existing biometric experts into a multibiometric cloud service. We analyze the (fusion) strategies from different perspectives such as performance gains, training complexity, or resource consumption and present results and findings important to software developers and other researchers working in the areas of biometrics and cloud computing. The analysis is conducted based on two biometric cloud services, which are also presented in the paper. |
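For contrast with the trained fusion sketched earlier, the simplest strategies for combining two independent experts are fixed rules over normalized scores; a generic sketch, not tied to the cloud services described in the paper:

```python
import numpy as np

def min_max(scores):
    """Map raw scores to [0, 1] so experts become comparable."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def fuse(face_scores, finger_scores, rule="sum", w=0.5):
    """Fixed-rule score-level fusion of two experts."""
    f, g = min_max(face_scores), min_max(finger_scores)
    if rule == "sum":          # weighted sum rule
        return w * f + (1 - w) * g
    if rule == "product":      # product rule
        return f * g
    return np.maximum(f, g)    # max rule

print(fuse([0.2, 0.9, 0.4], [10.0, 80.0, 30.0], rule="sum"))
```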