Perez Almendros, Carla
2023.
Automatic detection of patronizing and condescending language
towards vulnerable communities.
PhD Thesis,
Cardiff University.
Item availability restricted.
PDF - Accepted Post-Print Version (1MB). Available under License Creative Commons Attribution No Derivatives.
PDF (Cardiff University Electronic Publication Form) - Supplemental Material (131kB). Restricted to Repository staff only.
Abstract
This thesis studies Patronizing and Condescending Language (PCL) towards vulnerable communities. Someone using PCL displays a superior attitude towards others, raising a feeling of compassion and pity. PCL feeds stereotypes, causes discrimination and reinforces inequalities. In this work, we analyze how NLP can help us detect and categorize PCL while enhancing human understanding of such language. To this end, we introduce a novel task to the NLP community, namely the Detection and Categorization of PCL towards vulnerable communities. The thesis contributes annotated data, baselines, and qualitative analysis from various experiments. The work started with the creation of the Don’t Patronize Me! (DPM!) dataset, built from paragraphs extracted from media sources. Each paragraph was annotated to identify PCL and the specific techniques used to express the condescension, and a taxonomy of PCL categories was introduced to classify these techniques. We analyzed the effectiveness of language models in detecting and categorizing PCL, showing that non-trivial results can be achieved but that room for improvement remains. We furthermore explored the impact of prior knowledge through transfer learning, revealing that exposure to certain types of data can benefit PCL detection models. Additionally, we share insights gained from organizing a SemEval task focused on PCL detection, which demonstrated that a judicious combination of standard models and state-of-the-art (SoTA) techniques can achieve remarkable results. However, a closer look at the dataset revealed two types of PCL, namely linguistic and thematic, and showed that the training data significantly influences a model’s ability to detect specific PCL types. Overall, our findings confirm that language models can detect and categorize PCL to some extent, but that approaches tailored to its unique characteristics are necessary. These findings improve our understanding of PCL and offer directions for future research.
Item Type: | Thesis (PhD) |
---|---|
Date Type: | Completion |
Status: | Unpublished |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Date of First Compliant Deposit: | 16 January 2024 |
Date of Acceptance: | 8 January 2024 |
Last Modified: | 17 Jan 2024 10:11 |
URI: | https://orca.cardiff.ac.uk/id/eprint/165503 |