Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

WordNet under scrutiny: Dictionary examples in the era of large language models

Almeman, Fatemh, Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 and Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 2024. WordNet under scrutiny: Dictionary examples in the era of large language models. Presented at: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 20-24 May 2024.

PDF - Accepted Post-Print Version

Abstract

Dictionary definitions play a prominent role in a wide range of NLP tasks, for instance by providing additional context about the meaning of rare and emerging terms. Many dictionaries also provide examples to illustrate the prototypical usage of words, which brings further opportunities for training or enriching NLP models. The intrinsic qualities of dictionaries, and of related lexical resources such as glossaries and encyclopedias, are, however, still not well understood. While there has been significant work on developing best practices, such guidance has been aimed at traditional uses of dictionaries (e.g. supporting language learners), and it is currently unclear how different quality aspects affect the NLP systems that rely on them. To address this issue, we compare WordNet, the most commonly used lexical resource in NLP, with a variety of dictionaries, as well as with examples that were generated by ChatGPT. Our analysis involves human judgments as well as automatic metrics. We furthermore study the quality of word embeddings derived from dictionary examples, as a proxy for downstream performance. We find that WordNet’s examples lead to lower-quality embeddings than those from the Oxford dictionary. Surprisingly, however, the ChatGPT-generated examples were found to be the most effective overall.

Item Type: Conference or Workshop Item (Paper)
Status: In Press
Schools: Computer Science & Informatics
Date of First Compliant Deposit: 8 May 2024
Date of Acceptance: 20 February 2024
Last Modified: 08 May 2024 14:30
URI: https://orca.cardiff.ac.uk/id/eprint/168188
