Publications – Laboratory for Machine Intelligence

2014

Vesnicer, Boštjan; Žganec-Gros, Jerneja; Dobrišek, Simon; Štruc, Vitomir

Incorporating Duration Information into I-Vector-Based Speaker-Recognition Systems Proceedings Article

In: Proceedings of Odyssey: The Speaker and Language Recognition Workshop, pp. 241–248, 2014.

Tags: acoustic features, biometrics, duration, duration modeling, i-vector, i-vector challenge, Odyssey, performance evaluation, speaker recognition, speech technologies

@inproceedings{vesnicer2014incorporating,

title = {Incorporating Duration Information into I-Vector-Based Speaker-Recognition Systems},

author = {Boštjan Vesnicer and Jerneja Žganec-Gros and Simon Dobrišek and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/en/incorporatingdurationinformationintoi-vector-basedspeaker-recognitionsystems/},

year = {2014},

date = {2014-01-01},

urldate = {2014-01-01},

booktitle = {Proceedings of Odyssey: The Speaker and Language Recognition Workshop},

pages = {241--248},

abstract = {Most of the existing literature on i-vector-based speaker recognition focuses on recognition problems where i-vectors are extracted from speech recordings of sufficient length. The majority of modeling/recognition techniques therefore simply ignore the fact that the i-vectors are most likely estimated unreliably when short recordings are used for their computation. Only recently have a number of solutions been proposed in the literature to address the problem of duration variability, all treating the i-vector as a random variable whose posterior distribution can be parameterized by the posterior mean and the posterior covariance. In this setting the covariance matrix serves as a measure of uncertainty that is related to the length of the available recording. In contrast to these solutions, we address the problem of duration variability through weighted statistics. We demonstrate in the paper how established feature transformation techniques regularly used in the area of speaker recognition, such as PCA or WCCN, can be modified to take duration into account. We evaluate our weighting scheme in the scope of the i-vector challenge organized as part of the Odyssey Speaker and Language Recognition Workshop 2014 and achieve a minimal DCF of 0.280, which at the time of writing puts our approach in third place among all the participating institutions.},

keywords = {acoustic features, biometrics, duration, duration modeling, i-vector, i-vector challenge, Odyssey, performance evaluation, speaker recognition, speech technologies},

pubstate = {published},

tppubtype = {inproceedings}

}

2010

Gajšek, Rok; Štruc, Vitomir; Mihelič, France

Multi-modal Emotion Recognition using Canonical Correlations and Acoustic Features Proceedings Article

In: Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 4133–4136, IAPR, Istanbul, Turkey, 2010.

Tags: acoustic features, canonical correlations, emotion recognition, facial expression recognition, multi modality, speech processing, speech technologies