Williams, Lowri, Arribas-Ayllon, Michael ORCID: https://orcid.org/0000-0003-2669-2781, Artemiou, Andreas ORCID: https://orcid.org/0000-0002-7501-4090 and Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885 2019. Comparing the utility of different classification schemes for emotive language analysis. Journal of Classification 36 (3), pp. 619-648. 10.1007/s00357-019-9307-0
PDF - Published Version
Available under License Creative Commons Attribution.
Abstract
In this paper we investigated the utility of different classification schemes for emotive language analysis, with the aim of providing experimental justification for the choice of scheme for classifying emotions in free text. We compared six schemes: (1) Ekman's six basic emotions, (2) Plutchik's wheel of emotion, (3) Watson and Tellegen's Circumplex theory of affect, (4) the Emotion Annotation Representation Language (EARL), (5) WordNet-Affect, and (6) free text. To measure their utility, we investigated both their ease of use by human annotators and the performance of supervised machine learning. We assembled a corpus of 500 emotionally charged text documents. The corpus was annotated manually using an online crowdsourcing platform, with five independent annotators per document. Assuming that classification schemes with a better balance between completeness and complexity are easier to interpret and use, we expected such schemes to be associated with higher inter-annotator agreement. We used Krippendorff's alpha coefficient to measure inter-annotator agreement, according to which the six classification schemes were ranked as follows: (1) six basic emotions (α = 0.483), (2) wheel of emotion (α = 0.410), (3) Circumplex (α = 0.312), (4) EARL (α = 0.286), (5) free text (α = 0.205), and (6) WordNet-Affect (α = 0.202). However, correspondence analysis of annotations across the schemes highlighted that basic emotions are oversimplified representations of complex phenomena and as such are likely to lead to invalid interpretations, which are not necessarily reflected by high inter-annotator agreement. To complement the results of the quantitative analysis, we used semi-structured interviews to gain qualitative insight into how annotators interacted with and interpreted the chosen schemes. The size of the classification scheme was highlighted as a significant factor affecting annotation.
In particular, the scheme of six basic emotions was perceived as having insufficient coverage of the emotion space, often forcing annotators to resort to inferior alternatives, e.g. using happiness as a surrogate for love. At the opposite end of the spectrum, large schemes such as WordNet-Affect were linked to choice fatigue, with annotators expending significant cognitive effort in choosing the best annotation. In the second part of the study, we used the annotated corpus to create six training datasets, one for each scheme. The training data were used in cross-validation experiments to evaluate classification performance for each scheme. According to the F-measure, the classification schemes were ranked as follows: (1) six basic emotions (F = 0.410), (2) Circumplex (F = 0.341), (3) wheel of emotion (F = 0.293), (4) EARL (F = 0.254), (5) free text (F = 0.159), and (6) WordNet-Affect (F = 0.158). Not surprisingly, the smallest scheme was ranked highest on both criteria. Therefore, of the six schemes studied here, six basic emotions are best suited for emotive language analysis. However, both the quantitative and the qualitative analyses highlighted its major shortcoming: the oversimplification of positive emotions, which are all conflated into happiness. Further investigation is needed into ways of better balancing positive and negative emotions.
Keywords: annotation, crowdsourcing, text classification, sentiment analysis, supervised machine learning
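The two evaluation measures named in the abstract can be illustrated with a short sketch. The Python below computes Krippendorff's alpha, α = 1 − D_o / D_e, from a coincidence matrix, and a macro-averaged F-measure over categories. It is a minimal sketch under assumptions the abstract does not confirm: each annotator assigns exactly one label per document, labels are treated as nominal (every disagreement costs the same), and F is averaged without weighting across categories. The function names `krippendorff_alpha_nominal` and `macro_f1` are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (hypothetical helper).

    `units` has one entry per document; each entry is the list of labels
    assigned by the annotators of that document. Documents with fewer than
    two labels cannot form pairs and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels from different
    # annotators within a document contributes 1 / (m_u - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(range(m), 2):
            coincidences[(labels[a], labels[b])] += 1.0 / (m - 1)

    # Marginal totals n_c and total number of pairable values n.
    marginals = Counter()
    for (c, _k), weight in coincidences.items():
        marginals[c] += weight
    n = sum(marginals.values())

    # Nominal metric: every off-diagonal cell (c != k) counts as disagreement.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        marginals[c] * marginals[k]
        for c in marginals
        for k in marginals
        if c != k
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    # D_o = observed / n and D_e = expected / n, so D_o / D_e = observed / expected.
    return 1.0 - observed / expected


def macro_f1(gold, predicted):
    """Unweighted mean of per-category F1 over every label in either list."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    scores = []
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data (not from the study): two documents, five annotators each.
    docs = [["joy"] * 5, ["fear"] * 4 + ["joy"]]
    print(round(krippendorff_alpha_nominal(docs), 3))  # 0.625
    print(round(macro_f1(["joy", "joy", "fear", "anger"],
                         ["joy", "fear", "fear", "anger"]), 3))  # 0.778
```

On real data, alpha would be computed once per scheme over all annotated documents, and F once per scheme from the cross-validation predictions; sorting the per-scheme values gives rankings of the kind reported in the abstract. Treating labels as nominal ignores that some emotions are closer to each other than others; Krippendorff's alpha admits other distance metrics, which this sketch does not implement.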
Field | Value |
---|---|
Item Type: | Article |
Date Type: | Publication |
Status: | Published |
Schools: | Mathematics; Social Sciences (Includes Criminology and Education); Computer Science & Informatics; Data Innovation Research Institute (DIURI) |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Publisher: | Springer Verlag |
ISSN: | 0176-4268 |
Funders: | EPSRC |
Date of First Compliant Deposit: | 28 January 2019 |
Date of Acceptance: | 5 January 2019 |
Last Modified: | 05 May 2023 17:42 |
URI: | https://orca.cardiff.ac.uk/id/eprint/118835 |
Citation Data
Cited 4 times in Scopus.