2010
Gajšek, Rok; Štruc, Vitomir; Mihelič, France. Multi-modal Emotion Recognition based on the Decoupling of Emotion and Speaker Information. Proceedings Article. In: Proceedings of Text, Speech and Dialogue (TSD), pp. 275-282, Springer-Verlag, Berlin, Heidelberg, 2010.
Tags: emotion recognition, facial expression recognition, multi modality, speech processing, speech technologies, spontaneous emotions, video processing
Abstract: In addition to emotion-related information, the standard features used in emotion recognition also carry cues about the speaker. This is expected, since emotionally colored speech causes variations in the speech signal similar to those caused by different speakers. We therefore present a transformation, derived using gradient descent, for decoupling the emotion and speaker information contained in the acoustic features. The Interspeech ’09 Emotion Challenge feature set is used as the baseline for the audio part. A similar procedure is employed on the video signal, where nuisance attribute projection (NAP) is used to derive the transformation matrix, which contains information about the emotional state of the speaker. Finally, different NAP transformation matrices are compared using canonical correlations. The audio and video subsystems are combined at the matching-score level using different fusion techniques. The presented system is assessed on the publicly available eNTERFACE’05 database, where significant improvements in recognition performance are observed compared with the state-of-the-art baseline.
2009
Gajšek, Rok; Štruc, Vitomir; Dobrišek, Simon; Mihelič, France. Emotion recognition using linear transformations in combination with video. Proceedings Article. In: Speech and intelligence: proceedings of Interspeech 2009, pp. 1967-1970, Brighton, UK, 2009.
Tags: emotion recognition, facial expression recognition, interspeech, speech, speech technologies, spontaneous emotions
Abstract: The paper discusses the use of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from speech. The constrained version of the Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classifying the emotional state as normal or aroused. We present a procedure for incrementally building a set of speaker-independent acoustic models that are used to estimate the CMLLR transformations for emotion classification. An audio-video database of spontaneous emotions (AvID) is briefly presented, since it forms the basis for the evaluation of the proposed method. Emotion classification using the video part of the database is also described, and the added value of combining the visual information with the audio features is demonstrated.
Gajšek, Rok; Štruc, Vitomir; Mihelič, France; Podlesek, Anja; Komidar, Luka; Sočan, Gregor; Bajec, Boštjan. Multi-modal emotional database: AvID. Journal Article. In: Informatica (Ljubljana), vol. 33, no. 1, pp. 101-106, 2009.
Tags: avid, database, dataset, emotion recognition, facial expression recognition, speech, speech technologies, spontaneous emotions
Abstract: This paper presents our work on recording a multi-modal database containing emotional audio and video recordings. In designing the recording strategies, special attention was paid to gathering data involving spontaneous emotions, thereby obtaining more realistic training and testing conditions for experiments. Different levels of arousal were induced with specially planned scenarios, including playing computer games and conducting an adaptive intelligence test. This will enable us both to detect different emotional states and to experiment with speaker identification/verification of the people involved in the communication. So far, the multi-modal database has been recorded and a basic evaluation of the data has been carried out.
2008
Gajšek, Rok; Podlesek, Anja; Komidar, Luka; Sočan, Gregor; Bajec, Boštjan; Štruc, Vitomir; Bucik, Valentin; Mihelič, France. AvID: audio-video emotional database. Proceedings Article. In: Proceedings of the 11th International Multi-conference Information Society (IS'08), pp. 70-74, Ljubljana, Slovenia, 2008.
Tags: database, dataset, emotion recognition, facial expression recognition, multimodal database, speech technology, spontaneous emotions