Alqurashi, Nawal, Li, Yuhua ORCID: https://orcid.org/0000-0003-2913-4478, Sidorov, Kirill ORCID: https://orcid.org/0000-0001-7935-4132 and Marshall, Andrew ORCID: https://orcid.org/0000-0003-2789-1395 2024. Decision fusion based multimodal hierarchical method for speech emotion recognition from audio and text. Presented at: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 9-11 October 2024.
PDF - Accepted Post-Print Version (226kB)
Abstract
Expressing emotions is essential in human interaction. Individuals often convey emotion through neutral speech whose underlying meaning carries emotional weight; conversely, tone of voice can convey emotion even when the words are neutral. Most Speech Emotion Recognition research overlooks this. We address this gap with a multimodal emotion recognition system that uses hierarchical classifiers and a novel decision fusion method. Our approach analyses emotional cues from speech and text and measures their impact on the predicted class, weighing the emotional or neutral contribution of each modality for every instance. Results on the IEMOCAP dataset show our method's effectiveness: 69.45% and 65.62% weighted accuracy in speaker-dependent and speaker-independent settings, respectively.
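The abstract does not include implementation details, so the sketch below is only a rough illustration of the general idea it describes: each modality (audio and text) first decides how "emotional" an utterance is, and a decision-fusion step then weights the two modalities per instance by that emotional contribution. All class names, the weighting heuristic, and the function names are assumptions for illustration, not the authors' actual method.

```python
# Illustrative sketch of hierarchical classification + per-instance decision fusion.
# Assumes a 4-class IEMOCAP-style label set; the fusion rule is a hypothetical heuristic.

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # assumed label set


def hierarchical_probs(neutral_prob, emotion_probs):
    """Combine a binary neutral-vs-emotional stage with an emotion-class stage
    into a single distribution over EMOTIONS (hypothetical two-level hierarchy)."""
    probs = {e: (1.0 - neutral_prob) * p for e, p in emotion_probs.items()}
    probs["neutral"] = neutral_prob
    total = sum(probs.values())
    return {e: p / total for e, p in probs.items()}


def fuse_decisions(audio_probs, text_probs):
    """Per-instance decision fusion: weight each modality by how strongly it
    departs from 'neutral' (an illustrative heuristic, not the paper's rule)."""
    w_audio = 1.0 - audio_probs["neutral"]
    w_text = 1.0 - text_probs["neutral"]
    if w_audio + w_text == 0.0:          # both modalities fully neutral
        w_audio = w_text = 0.5
    total = w_audio + w_text
    fused = {e: (w_audio * audio_probs[e] + w_text * text_probs[e]) / total
             for e in EMOTIONS}
    return max(fused, key=fused.get), fused


if __name__ == "__main__":
    # Example: angry tone over neutral wording -> audio gets the larger weight.
    audio = hierarchical_probs(0.10, {"angry": 0.7, "happy": 0.2, "sad": 0.1})
    text = hierarchical_probs(0.80, {"angry": 0.5, "happy": 0.3, "sad": 0.2})
    label, fused = fuse_decisions(audio, text)
    print(label, fused)
```

In this hypothetical setup the fusion weight of each modality scales with its non-neutral probability mass, so an utterance with neutral words but an emotional tone is decided mainly by the audio branch, mirroring the per-instance emotional/neutral weighting the abstract describes.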
| Field | Value |
|---|---|
| Item Type | Conference or Workshop Item (Paper) |
| Status | In Press |
| Schools | Computer Science & Informatics |
| Subjects | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Date of First Compliant Deposit | 30 July 2024 |
| Date of Acceptance | 13 June 2024 |
| Last Modified | 08 Nov 2024 08:15 |
| URI | https://orca.cardiff.ac.uk/id/eprint/170045 |