Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Construction artifacts in metaphor identification datasets

Boisson, Joanne, Espinosa-Anke, Luis and Camacho Collados, Jose 2023. Construction artifacts in metaphor identification datasets. Presented at: 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6-10 December 2023. Published in: Bouamor, Houda, Pino, Juan and Bali, Kalika eds. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 6581–6590. 10.18653/v1/2023.emnlp-main.406

PDF - Published Version
Available under License Creative Commons Attribution.


Metaphor identification aims at understanding whether a given expression is used figuratively in context. However, in this paper we show how existing metaphor identification datasets can be gamed by fully ignoring the potential metaphorical expression or the context in which it occurs. We test this hypothesis in a variety of datasets and settings, and show that metaphor identification systems based on language models without complete information can be competitive with those using the full context. This is due to the construction procedures to build such datasets, which introduce unwanted biases for positive and negative classes. Finally, we test the same hypothesis on datasets that are carefully sampled from natural corpora and where this bias is not present, making these datasets more challenging and reliable.
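
The incomplete-information settings described in the abstract can be illustrated with a minimal sketch. It assumes each example is a tokenized sentence with the index of the candidate expression and a binary label; the `Example` class, the `[MASK]` placeholder, and the function names are hypothetical and are not taken from the paper. The sketch only builds the three input views (full sentence, context without the target, target without the context) that a classifier could then be trained on separately.

```python
from dataclasses import dataclass
from typing import Dict, List

MASK = "[MASK]"  # assumed placeholder; the paper may hide the target differently


@dataclass
class Example:
    tokens: List[str]   # tokenized sentence
    target_index: int   # position of the candidate metaphorical expression
    label: int          # 1 = metaphorical, 0 = literal


def full_input(ex: Example) -> str:
    """Complete information: the whole sentence, target included."""
    return " ".join(ex.tokens)


def context_only(ex: Example, mask: str = MASK) -> str:
    """Hide the target expression so only the context remains."""
    hidden = list(ex.tokens)
    hidden[ex.target_index] = mask
    return " ".join(hidden)


def target_only(ex: Example) -> str:
    """Drop the context so only the target expression remains."""
    return ex.tokens[ex.target_index]


def build_views(ex: Example) -> Dict[str, str]:
    """Return the three input views for a single example."""
    return {
        "full": full_input(ex),
        "context_only": context_only(ex),
        "target_only": target_only(ex),
    }


example = Example(tokens=["She", "devoured", "the", "novel"], target_index=1, label=1)
views = build_views(example)
```

If a model trained on the `context_only` or `target_only` view scores close to one trained on the `full` view, that gap (or lack of one) is the kind of signal the abstract attributes to dataset construction bias.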

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Advanced Research Computing @ Cardiff (ARCCA)
Computer Science & Informatics
Publisher: Association for Computational Linguistics
Date of First Compliant Deposit: 12 June 2024
Last Modified: 12 Jun 2024 12:33
