Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

3D-EX: A unified dataset of definitions and dictionary examples

Almeman, Fatemah, Sheikhi, Hadi and Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 2023. 3D-EX: A unified dataset of definitions and dictionary examples. Presented at: RANLP 2023 International conference recent advances in natural language processing, 4-6 September 2023. Proceedings of Recent Advances in Natural Language Processing. Shoumen, Bulgaria: INCOMA Ltd, pp. 69-79. 10.26615/978-954-452-092-2_008

PDF - Published Version

Abstract

Definitions are a fundamental building block in lexicography, linguistics and computational semantics. In NLP, they have been used for retrofitting word embeddings or augmenting contextual representations in language models. However, lexical resources containing definitions exhibit a wide range of properties, which has implications for the behaviour of models trained and evaluated on them. In this paper, we introduce 3D-EX, a dataset that aims to fill this gap by combining well-known English resources into one centralized knowledge repository in the form of triples. 3D-EX is a unified evaluation framework with carefully pre-computed train/validation/test splits to prevent memorization. We report experimental results that suggest that this dataset could be effectively leveraged in downstream NLP tasks. Code and data are available at https://github.com/F-Almeman/3D-EX.

Item Type: Conference or Workshop Item (Paper)
Status: Published
Schools: Computer Science & Informatics
Publisher: INCOMA Ltd
ISBN: 978-954-452-092-2
Date of First Compliant Deposit: 11 March 2024
Date of Acceptance: 30 June 2023
Last Modified: 22 Apr 2024 01:30
URI: https://orca.cardiff.ac.uk/id/eprint/167099
