Boisson, Joanne, Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 and Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239 2023. Construction artifacts in metaphor identification datasets. Presented at: 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6-10 December 2023. Published in: Bouamor, Houda, Pino, Juan and Bali, Kalika eds. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 6581–6590. 10.18653/v1/2023.emnlp-main.406
PDF (Published Version), available under a Creative Commons Attribution license.
Abstract
Metaphor identification aims at understanding whether a given expression is used figuratively in context. However, in this paper we show how existing metaphor identification datasets can be gamed by fully ignoring the potential metaphorical expression or the context in which it occurs. We test this hypothesis in a variety of datasets and settings, and show that metaphor identification systems based on language models without complete information can be competitive with those using the full context. This is due to the construction procedures to build such datasets, which introduce unwanted biases for positive and negative classes. Finally, we test the same hypothesis on datasets that are carefully sampled from natural corpora and where this bias is not present, making these datasets more challenging and reliable.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Advanced Research Computing @ Cardiff (ARCCA); Computer Science & Informatics |
Publisher: | Association for Computational Linguistics |
Date of First Compliant Deposit: | 12 June 2024 |
Last Modified: | 12 Jun 2024 12:33 |
URI: | https://orca.cardiff.ac.uk/id/eprint/168939 |