Perez-Almendros, Carla, Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 2022. Pre-training language models for identifying patronizing and condescending language: an analysis. Presented at: LREC 2022, 20-25 June 2022. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). European Language Resources Association, pp. 3902-3911.
PDF (Published Version), available under License Creative Commons Attribution Non-commercial.
Abstract
Patronizing and Condescending Language (PCL) is a subtle but harmful type of discourse, yet the task of recognizing PCL remains under-studied by the NLP community. Recognizing PCL is challenging because of its subtle nature, because available datasets are limited in size, and because this task often relies on some form of commonsense knowledge. In this paper, we study to what extent PCL detection models can be improved by pre-training them on other, more established NLP tasks. We find that performance gains are indeed possible in this way, in particular when pre-training on tasks focusing on sentiment, harmful language and commonsense morality. In contrast, for tasks focusing on political speech and social justice, no or only very small improvements were witnessed. These findings improve our understanding of the nature of PCL.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Status: | In Press |
Schools: | Advanced Research Computing @ Cardiff (ARCCA); Computer Science & Informatics |
Additional Information: | © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0 |
Publisher: | European Language Resources Association |
Date of First Compliant Deposit: | 25 May 2022 |
Date of Acceptance: | 4 April 2022 |
Last Modified: | 14 Jun 2024 15:21 |
URI: | https://orca.cardiff.ac.uk/id/eprint/150046 |