Publications – Laboratory for Machine Intelligence

Mishra, Gargi; Bajpai, Supriya; Saini, Dharmender; Jain, Rachna; Jain, Deepak Kumar; Štruc, Vitomir

EmoVisioNet: A hybrid network unifying lightweight CNN and attention-based vision model for facial emotion detection Journal Article

In: Neurocomputing, vol. 665, no. 132224, pp. 1-12, 2026.

Tags: CNN, deep learning, facial expression recognition, lightweight models

@article{EmoVison2025,

title = {EmoVisioNet: A hybrid network unifying lightweight CNN and attention-based vision model for facial emotion detection},

author = {Gargi Mishra and Supriya Bajpai and Dharmender Saini and Rachna Jain and Deepak Kumar Jain and Vitomir Štruc},

doi = {10.1016/j.neucom.2025.132224},

year = {2026},

date = {2026-02-07},

urldate = {2025-11-28},

journal = {Neurocomputing},

volume = {665},

number = {132224},

pages = {1-12},

abstract = {Facial emotion detection has witnessed a surge in demand across numerous applications, including human-computer interaction, healthcare, and security. Accurate expression recognition is crucial for improving human-computer interactions and understanding human behavior. Existing facial emotion detection models face challenges in achieving both high accuracy and real-time processing due to complex architectures. Our goal is to create an efficient yet accurate solution that can work on resource-constrained devices. To address the challenge of accurately recognizing emotions from facial expressions, we propose a novel hybrid approach that combines the strengths of pretrained lightweight Convolutional Neural Networks (CNNs) and attention-based vision models. The pretrained lightweight CNN serves as a feature extractor, efficiently capturing facial features, while the attention model refines the feature representation to focus on crucial regions of the face associated with different expressions. This enables our model to achieve state-of-the-art (SOTA) accuracy with reduced computational requirements. The proposed model, EmoVisioNet, achieves superior performance across multiple datasets, attaining 99.97 % accuracy on CK+, 96.23 % on RAF-DB, 93.88 % on FER2013, and 96.91 % on FERPlus. These results surpass the current state of the art in this field, demonstrating EmoVisioNet’s superior performance in facial expression recognition.},

keywords = {CNN, deep learning, facial expression recognition, lightweight models},

pubstate = {published},

tppubtype = {article}

}


Dobrišek, Simon; Gajšek, Rok; Mihelič, France; Pavešić, Nikola; Štruc, Vitomir

Towards efficient multi-modal emotion recognition Journal Article

In: International Journal of Advanced Robotic Systems, vol. 10, no. 53, 2013.

Tags: avid database, emotion recognition, facial expression recognition, multi modality, speech technologies

@article{dobrivsek2013towards,

title = {Towards efficient multi-modal emotion recognition},

author = {Simon Dobrišek and Rok Gajšek and France Mihelič and Nikola Pavešić and Vitomir Štruc},

url = {https://lmi.fe.uni-lj.si/en/towardsefficientmulti-modalemotionrecognition/},

doi = {10.5772/54002},

year = {2013},

date = {2013-01-01},

urldate = {2013-01-01},

journal = {International Journal of Advanced Robotic Systems},

volume = {10},

number = {53},

abstract = {The paper presents a multi-modal emotion recognition system exploiting audio and video (i.e., facial expression) information. The system first processes both sources of information individually to produce corresponding matching scores and then combines the computed matching scores to obtain a classification decision. For the video part of the system, a novel approach to emotion recognition, relying on image-set matching, is developed. The proposed approach avoids the need for detecting and tracking specific facial landmarks throughout the given video sequence, which represents a common source of error in video-based emotion recognition systems, and, therefore, adds robustness to the video processing chain. The audio part of the system, on the other hand, relies on utterance-specific Gaussian Mixture Models (GMMs) adapted from a Universal Background Model (UBM) via maximum a posteriori (MAP) estimation. It improves upon the standard UBM-MAP procedure by exploiting gender information when building the utterance-specific GMMs, thus ensuring enhanced emotion recognition performance. Both the uni-modal parts and the combined system are assessed on the challenging multi-modal eNTERFACE'05 corpus with highly encouraging results. The developed system represents a feasible solution to emotion recognition that can easily be integrated into various systems, such as humanoid robots, smart surveillance systems, and the like.},

keywords = {avid database, emotion recognition, facial expression recognition, multi modality, speech technologies},

pubstate = {published},

tppubtype = {article}

}


Gajšek, Rok; Štruc, Vitomir; Mihelič, France

Multi-modal Emotion Recognition using Canonical Correlations and Acoustic Features Proceedings Article

In: Proceedings of the International Conference on Pattern Recognition (ICPR), pp. 4133-4136, IAPR, Istanbul, Turkey, 2010.

Tags: acoustic features, canonical correlations, emotion recognition, facial expression recognition, multi modality, speech processing, speech technologies

Gajšek, Rok; Štruc, Vitomir; Mihelič, France

Multi-modal Emotion Recognition based on the Decoupling of Emotion and Speaker Information Proceedings Article

In: Proceedings of Text, Speech and Dialogue (TSD), pp. 275-282, Springer-Verlag, Berlin, Heidelberg, 2010.

Tags: emotion recognition, facial expression recognition, multi modality, speech processing, speech technologies, spontaneous emotions, video processing

Gajšek, Rok; Štruc, Vitomir; Dobrišek, Simon; Mihelič, France

Emotion recognition using linear transformations in combination with video Proceedings Article

In: Speech and intelligence: proceedings of Interspeech 2009, pp. 1967-1970, Brighton, UK, 2009.

Tags: emotion recognition, facial expression recognition, interspeech, speech, speech technologies, spontaneous emotions

Gajšek, Rok; Štruc, Vitomir; Mihelič, France; Podlesek, Anja; Komidar, Luka; Sočan, Gregor; Bajec, Boštjan

Multi-modal emotional database: AvID Journal Article

In: Informatica (Ljubljana), vol. 33, no. 1, pp. 101-106, 2009.

Tags: avid, database, dataset, emotion recognition, facial expression recognition, speech, speech technologies, spontaneous emotions

Gajšek, Rok; Štruc, Vitomir; Dobrišek, Simon; Žibert, Janez; Mihelič, France; Pavešić, Nikola

Combining audio and video for detection of spontaneous emotions Proceedings Article

In: Biometric ID management and multimodal communication, pp. 114-121, Springer-Verlag, Berlin, Heidelberg, 2009.

Tags: emotion recognition, facial expression recognition, performance evaluation, speech processing, speech technologies

Gajšek, Rok; Podlesek, Anja; Komidar, Luka; Sočan, Gregor; Bajec, Boštjan; Štruc, Vitomir; Bucik, Valentin; Mihelič, France

AvID: audio-video emotional database Proceedings Article

In: Proceedings of the 11th International Multi-conference Information Society (IS'08), pp. 70-74, Ljubljana, Slovenia, 2008.

Tags: database, dataset, emotion recognition, facial expression recognition, multimodal database, speech technology, spontaneous emotions