Almeman, Fatemh, Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 and Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 2024. WordNet under scrutiny: Dictionary examples in the era of large language models. Presented at: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 20-24 May 2024.
PDF - Accepted Post-Print Version
Abstract
Dictionary definitions play a prominent role in a wide range of NLP tasks, for instance by providing additional context about the meaning of rare and emerging terms. Many dictionaries also provide examples to illustrate the prototypical usage of words, which brings further opportunities for training or enriching NLP models. The intrinsic qualities of dictionaries, and related lexical resources such as glossaries and encyclopedias, are, however, still not well understood. While there has been significant work on developing best practices, such guidance has been aimed at traditional usages of dictionaries (e.g. supporting language learners), and it is currently unclear how different quality aspects affect the NLP systems that rely on them. To address this issue, we compare WordNet, the most commonly used lexical resource in NLP, with a variety of dictionaries, as well as with examples that were generated by ChatGPT. Our analysis involves human judgments as well as automatic metrics. We furthermore study the quality of word embeddings derived from dictionary examples, as a proxy for downstream performance. We find that WordNet’s examples lead to lower-quality embeddings than those from the Oxford dictionary. Surprisingly, however, the ChatGPT-generated examples were found to be the most effective overall.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Status: | In Press |
Schools: | Computer Science & Informatics |
Date of First Compliant Deposit: | 8 May 2024 |
Date of Acceptance: | 20 February 2024 |
Last Modified: | 08 May 2024 14:30 |
URI: | https://orca.cardiff.ac.uk/id/eprint/168188 |