Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Sentiment analysis in health and wellbeing: A systematic review

Zunic, Anastazia, Corcoran, Padraig and Spasic, Irena 2020. Sentiment analysis in health and wellbeing: A systematic review. JMIR Medical Informatics 8 (1), e16023. 10.2196/16023

Full text not available from this repository.


Background: Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in a free text. It has found practical applications across a wide range of societal contexts including marketing, economy and politics. This review focuses specifically on applications related to health, which is defined as 'a state of complete physical, mental and social wellbeing and not merely the absence of disease or infirmity'.

Objective: The main aim of this study is to establish the state of the art in SA related to health and wellbeing by conducting a systematic review of the recent literature. To capture the perspective of those individuals whose health and wellbeing are affected, we focus specifically on spontaneously generated content and not necessarily that of healthcare professionals.

Methods: Our methodology is based on the guidelines for performing systematic reviews. In January 2019, we used PubMed, a multi-faceted interface, to perform a literature search against MEDLINE. We identified a total of 86 relevant studies and extracted data about the datasets analyzed, discourse topics, data creators, downstream applications, algorithms used and their evaluation.

Results: The majority of data was collected from social networking and online retailing platforms. The primary purpose of online conversations is to exchange information and provide social support online. These communities tend to form around health conditions with high severity and chronicity rates. Different treatments and services discussed include medications, vaccination, surgery, orthodontic services, individual physicians and healthcare services in general. We identified five roles with respect to health and wellbeing among the authors of the types of spontaneously generated narratives considered in this review: a sufferer, an addict, a patient, a carer and a suicide victim. Of the 86 studies considered, only four reported demographic characteristics. A wide range of methods have been used to perform SA. The most common choices include support vector machines, naïve Bayesian learning, decision trees, logistic regression and adaptive boosting. In contrast to general trends in SA research, only one study used deep learning. The performance lags behind the state of the art achieved in other domains when measured by F-score, which was found to be below 60% on average. In the context of SA, the domain of health and wellbeing was found to be resource poor: few domain-specific corpora and lexica are shared publicly for research purposes.

Conclusions: SA results in the area of health and wellbeing lag behind those in other domains. It is not yet clear whether this is due to the intrinsic differences between the domains and their respective sublanguages, the size of training datasets, the lack of domain-specific sentiment lexica or the choice of algorithms.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Data Innovation Research Institute (DIURI)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Publisher: JMIR Publications
Date of Acceptance: 27 October 2019
Last Modified: 06 Jan 2024 02:29

Citation Data

Cited 46 times in Scopus.
