Automatic detection of patronizing and condescending language towards vulnerable communities

Perez Almendros, Carla 2023. Automatic detection of patronizing and condescending language towards vulnerable communities. PhD Thesis, Cardiff University.

Item availability restricted.

PDF - Accepted Post-Print Version. Available under License Creative Commons Attribution No Derivatives. (1MB)
PDF (Cardiff University Electronic Publication Form) - Supplemental Material. Restricted to Repository staff only. (131kB)

Abstract

This thesis is focused on the study and analysis of Patronizing and Condescending Language (PCL) towards vulnerable communities. Someone using PCL displays a superior attitude towards others, raising a feeling of compassion and pity. PCL feeds stereotypes, causes discrimination and reinforces inequalities. In this work, we analyze how NLP can help us to detect and categorize PCL, while enhancing human understanding of such language. To achieve this, we introduce a novel task to the NLP community, namely the Detection and Categorization of PCL towards vulnerable communities. This thesis contributes valuable insights by providing annotated data, baselines, and qualitative analysis from various experiments. The work developed in this thesis started with the creation of the Don’t Patronize Me! (DPM!) dataset, with paragraphs extracted from media sources. Each paragraph was annotated to identify PCL and the specific techniques employed to express the condescension. A taxonomy of PCL categories was also introduced to classify these techniques. We analyzed the effectiveness of language models in detecting and categorizing PCL, showing that non-trivial results can be achieved, but room for improvement remains. We furthermore explored the impact of prior knowledge through transfer learning, revealing that exposure to certain types of data can benefit PCL detection models. Additionally, we share insights gained from organizing a SemEval task focused on PCL detection, which demonstrated that a judicious combination of standard models and SoTA techniques can achieve remarkable results. However, a closer look at the dataset unveiled that there are two types of PCL, namely linguistic and thematic, and that the training data significantly influences the model’s ability to detect specific PCL types. Overall, our findings confirm that language models can detect and categorize PCL to some extent, but specific approaches tailored to its unique characteristics are necessary. These findings improve our understanding of PCL and offer directions for future research.

Item Type:	Thesis (PhD)
Date Type:	Completion
Status:	Unpublished
Schools:	Schools > Computer Science & Informatics
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
	Q Science > QA Mathematics > QA76 Computer software
Date of First Compliant Deposit:	16 January 2024
Date of Acceptance:	8 January 2024
Last Modified:	17 Jan 2024 10:11
URI:	https://orca.cardiff.ac.uk/id/eprint/165503
