Cardiff University | Prifysgol Caerdydd
ORCA: Online Research @ Cardiff

Pushing the envelope of sentiment analysis beyond words and polarities

Williams, Lowri 2017. Pushing the envelope of sentiment analysis beyond words and polarities. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version (5MB): Lowri Williams - thesis - original version.pdf
PDF - Supplemental Material (2MB): image.pdf. Restricted to Repository staff only.

Abstract

Idioms are multi-word expressions that hold both a literal and a figurative meaning, which is conventionally understood by native speakers. Their overall meaning often cannot be deduced from the literal meaning of their constituent words. Sentiment analysis, also referred to as opinion mining, aims to automatically extract and classify sentiments, opinions, and emotions expressed in text. The research in this thesis is motivated by the fact that idioms, which often express an affective stance towards an entity or an event, are not featured systematically in sentiment analysis. To estimate the degree to which the inclusion of idioms as features may improve the results of traditional sentiment analysis, we compared our results with those of two state-of-the-art sentiment analysis approaches. Firstly, we collected a set of idioms that are relevant to sentiment analysis, i.e. those that can be mapped to an emotion. These mappings were obtained using a crowdsourcing approach. Secondly, to evaluate the results of sentiment analysis, we assembled a corpus of sentences in which idioms are used in context. Each sentence was annotated with an emotion, which formed the basis for the gold standard used for the comparison against the baseline methods. Classification performance improved by almost 20 percentage points.

Despite the positive findings from our initial experiments, the main limitation of the approach was the significant knowledge-engineering overhead involved in hand-crafting the lexico-semantic resources used to support idiom-based features. To minimise the bottleneck associated with the acquisition of such resources, we scaled up our original approach by automating their engineering. Subsequently, these resources were used to replace the manually engineered counterparts of such features in the originally proposed method. The fully automated approach outperformed the two baseline methods by 7 and 9 percentage points. These improvements, however, were smaller than those achieved in the initial study. Nevertheless, we have demonstrated that idiom-based features can not only be engineered automatically but also improve sentiment classification results when such features are present.

Taking a long-term view of the research in this thesis, we want to address the limitations of state-of-the-art sentiment analysis approaches by focusing on a full range of emotions, rather than sentiment polarity. However, there is no consensus among researchers on a standardised framework for classifying emotions. Proposing such a framework would be a major contribution to the field of sentiment analysis, as it would stimulate its evolution into fully-fledged emotion classification and allow for systematic comparison of independent studies. With this goal in mind, we investigated the utility of different classification frameworks for sentiment analysis. A comprehensive statistical analysis of our experimental results provided explicit evidence that, in relative terms, six basic emotions are best suited for sentiment analysis. However, we also identified their major shortcoming: the oversimplification of positive emotions.
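
The idea of idiom-based features mentioned in the abstract can be illustrated with a minimal sketch. It is not the method used in the thesis: the lexicon entries, the feature names, and the functions `normalise` and `idiom_features` are all hypothetical, and the exact string matching shown here is an assumption made only to keep the example short.

```python
import re
from collections import Counter

# Hypothetical idiom-to-emotion lexicon. The abstract says such mappings were
# crowdsourced; these three entries are invented for illustration only.
IDIOM_EMOTIONS = {
    "over the moon": "joy",
    "down in the dumps": "sadness",
    "see red": "anger",
}


def normalise(text):
    """Lowercase the text and keep only word tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))


def idiom_features(sentence):
    """Count, per emotion, the lexicon idioms that occur in the sentence."""
    # Pad with spaces so that idioms only match on whole-word boundaries.
    padded = f" {normalise(sentence)} "
    features = Counter()
    for idiom, emotion in IDIOM_EMOTIONS.items():
        hits = padded.count(f" {idiom} ")
        if hits:
            features[f"idiom_{emotion}"] += hits
    return dict(features)
```

Calling `idiom_features("She was over the moon about the result.")` yields a single `idiom_joy` count, which could then be appended to whatever features a baseline classifier already uses. Exact matching misses inflected or modified forms such as "saw red", so a realistic system would need more flexible idiom recognition.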

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Funders: EPSRC
Date of First Compliant Deposit: 27 March 2018
Last Modified: 20 May 2021 14:52
URI: https://orca.cardiff.ac.uk/id/eprint/110268
