Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

WIKITIDE: A Wikipedia-based timestamped definition pairs dataset

Borkakoty, Hsuvas and Espinosa-Anke, Luis (ORCID: https://orcid.org/0000-0001-6830-9176). 2023. WIKITIDE: A Wikipedia-based timestamped definition pairs dataset. Presented at: RANLP 2023, International Conference on Recent Advances in Natural Language Processing, 4-6 September 2023. Proceedings of Recent Advances in Natural Language Processing. Shoumen, Bulgaria: INCOMA Ltd, pp. 207-216. 10.26615/978-954-452-092-2_023

Abstract

A fundamental challenge in the current NLP landscape, which is dominated by language models, is that current architectures cannot easily "learn" new information. While model-centric solutions such as continual learning or parameter-efficient fine-tuning are available, the question remains of how to reliably identify changes in language or in the world. In this paper, we propose WikiTiDe, a dataset derived from pairs of timestamped definitions extracted from Wikipedia. We argue that such a resource can help accelerate diachronic NLP, specifically by training models that scan knowledge resources for core updates concerning a concept, an event, or a named entity. Our proposed end-to-end method is fully automatic and uses a bootstrapping algorithm to gradually build a high-quality dataset. Our results suggest that bootstrapping the seed version of WikiTiDe leads to better fine-tuned models. We also apply the fine-tuned models to a number of downstream tasks, with promising results relative to competitive baselines.

Item Type: Conference or Workshop Item (Paper)
Status: Published
Schools: Computer Science & Informatics
Publisher: INCOMA Ltd
ISBN: 978-954-452-092-2
Date of First Compliant Deposit: 11 March 2024
Date of Acceptance: 30 June 2023
Last Modified: 22 Apr 2024 01:30
URI: https://orca.cardiff.ac.uk/id/eprint/167098