Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Decision fusion based multimodal hierarchical method for speech emotion recognition from audio and text

Alqurashi, Nawal, Li, Yuhua ORCID: https://orcid.org/0000-0003-2913-4478, Sidorov, Kirill ORCID: https://orcid.org/0000-0001-7935-4132 and Marshall, Andrew ORCID: https://orcid.org/0000-0003-2789-1395 2024. Decision fusion based multimodal hierarchical method for speech emotion recognition from audio and text. Presented at: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 9-11 October 2024.

PDF - Accepted Post-Print Version (226kB)

Abstract

Expressing emotions is essential in human interaction. Often, individuals convey emotions through neutral speech while the underlying meaning carries emotional weight; conversely, tone can convey emotion despite neutral words. Most Speech Emotion Recognition research overlooks this. We address this gap with a multimodal emotion recognition system using hierarchical classifiers and a novel decision fusion method. Our approach analyses emotional cues from speech and text, measuring their impact on the predicted classes and considering the emotional or neutral contribution of each modality for each instance. Results on the IEMOCAP dataset show our method's effectiveness: 69.45% and 65.62% weighted accuracy in speaker-dependent and speaker-independent settings, respectively.
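To illustrate the general idea of decision-level fusion described in the abstract, the sketch below combines class probabilities from an audio classifier and a text classifier, weighting each modality by how "emotional" (non-neutral) its prediction is. This is a generic, hypothetical sketch for illustration only: the label set, weighting heuristic, and function names are assumptions, not the authors' actual method.

```python
# Illustrative decision-level (late) fusion of audio and text emotion
# classifiers. Generic sketch -- NOT the paper's exact algorithm:
# the label set and the confidence heuristic below are assumptions.

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # assumed label set

def fuse(audio_probs, text_probs):
    """Fuse two per-class probability lists, weighting each modality
    by its emotional (non-neutral) probability mass, so the modality
    carrying more emotional information dominates the final decision."""
    neutral = EMOTIONS.index("neutral")
    # Heuristic (assumption): emotional mass = 1 - P(neutral).
    w_audio = 1.0 - audio_probs[neutral]
    w_text = 1.0 - text_probs[neutral]
    total = (w_audio + w_text) or 1.0  # guard against both being zero
    fused = [(w_audio * a + w_text * t) / total
             for a, t in zip(audio_probs, text_probs)]
    label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
    return label, fused

# Example: the text reads as neutral while the audio carries the emotion,
# so the fused decision follows the audio modality.
label, probs = fuse([0.6, 0.1, 0.2, 0.1],   # audio: leans "angry"
                    [0.1, 0.1, 0.7, 0.1])   # text: leans "neutral"
```

Because the fused distribution is a convex combination of two valid probability distributions, it remains a valid distribution; the weighting simply shifts the decision toward whichever modality expresses more emotion for a given instance.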

Item Type: Conference or Workshop Item (Paper)
Status: In Press
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date of First Compliant Deposit: 30 July 2024
Date of Acceptance: 13 June 2024
Last Modified: 08 Nov 2024 08:15
URI: https://orca.cardiff.ac.uk/id/eprint/170045
