ORCA
Online Research @ Cardiff

Comparing the utility of different classification schemes for emotive language analysis

Williams, Lowri, Arribas-Ayllon, Michael, Artemiou, Andreas and Spasic, Irena

2019. Comparing the utility of different classification schemes for emotive language analysis. Journal of Classification 36 (3), pp. 619-648. 10.1007/s00357-019-9307-0

PDF - Published Version
Available under License Creative Commons Attribution.

License URL: http://creativecommons.org/licenses/by/4.0

License Start date: 10 May 2019

Official URL: https://doi.org/10.1007/s00357-019-9307-0

Abstract

In this paper we investigated the utility of different classification schemes for emotive language analysis with the aim of providing experimental justification for the choice of scheme for classifying emotions in free text. We compared six schemes: (1) Ekman's six basic emotions, (2) Plutchik's wheel of emotion, (3) Watson and Tellegen's Circumplex theory of affect, (4) the Emotion Annotation Representation Language (EARL), (5) WordNet-Affect, and (6) free text. To measure their utility, we investigated their ease of use by human annotators as well as the performance of supervised machine learning. We assembled a corpus of 500 emotionally charged text documents. The corpus was annotated manually using an online crowdsourcing platform with five independent annotators per document. Assuming that classification schemes with a better balance between completeness and complexity are easier to interpret and use, we expected such schemes to be associated with higher inter-annotator agreement. We used Krippendorff's alpha coefficient to measure inter-annotator agreement, according to which the six classification schemes were ranked as follows: (1) six basic emotions (α = 0.483), (2) wheel of emotion (α = 0.410), (3) Circumplex (α = 0.312), (4) EARL (α = 0.286), (5) free text (α = 0.205), and (6) WordNet-Affect (α = 0.202). However, correspondence analysis of annotations across the schemes highlighted that basic emotions are oversimplified representations of complex phenomena and as such likely to lead to invalid interpretations, which are not necessarily reflected by high inter-annotator agreement. To complement the result of the quantitative analysis, we used semi-structured interviews to gain a qualitative insight into how annotators interacted with and interpreted the chosen schemes. The size of the classification scheme was highlighted as a significant factor affecting annotation. In particular, the scheme of six basic emotions was perceived as having insufficient coverage of the emotion space, forcing annotators to often resort to inferior alternatives, e.g. using happiness as a surrogate for love. On the opposite end of the spectrum, large schemes such as WordNet-Affect were linked to choice fatigue, which incurred significant cognitive effort in choosing the best annotation. In the second part of the study, we used the annotated corpus to create six training datasets, one for each scheme. The training data were used in cross-validation experiments to evaluate classification performance in relation to different schemes. According to the F-measure, the classification schemes were ranked as follows: (1) six basic emotions (F = 0.410), (2) Circumplex (F = 0.341), (3) wheel of emotion (F = 0.293), (4) EARL (F = 0.254), (5) free text (F = 0.159) and (6) WordNet-Affect (F = 0.158). Not surprisingly, the smallest scheme was ranked the highest in both criteria. Therefore, out of the six schemes studied here, six basic emotions are best suited for emotive language analysis. However, both quantitative and qualitative analysis highlighted its major shortcoming: oversimplification of positive emotions, which are all conflated into happiness. Further investigation is needed into ways of better balancing positive and negative emotions.

Keywords: annotation, crowdsourcing, text classification, sentiment analysis, supervised machine learning
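
For readers unfamiliar with the agreement measure named in the abstract, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is an illustration only: the function name `krippendorff_alpha_nominal` and the toy labels are hypothetical, and this record does not describe how the authors implemented the calculation.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of documents; each document is the list of labels
    assigned to it by its annotators (hypothetical input layout).
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # document by two different annotators, weighted by 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a document with a single label carries no pairing information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one document with two or more labels")

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Toy data, not taken from the study's corpus.
    toy = [["joy", "joy"], ["joy", "anger"], ["anger", "anger"]]
    print(krippendorff_alpha_nominal(toy))
```

A value of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, which is the scale on which the α values reported above should be read.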

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Mathematics; Schools > Social Sciences (Includes Criminology and Education); Schools > Computer Science & Informatics; Research Institutes & Centres > Data Innovation Research Institute (DIURI)
Subjects:	Q Science > QA Mathematics > QA76 Computer software
Uncontrolled Keywords:	annotation, crowdsourcing, text classification, sentiment analysis, supervised machine learning
Publisher:	Springer Verlag
ISSN:	0176-4268
Funders:	EPSRC
Projects:	1511905
Date of First Compliant Deposit:	28 January 2019
Date of Acceptance:	5 January 2019
Last Modified:	18 Jan 2025 22:17
URI:	https://orca.cardiff.ac.uk/id/eprint/118835

Citation Data

Cited 4 times in Scopus.
