Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Deriving word vectors from contextualized language models using topic-aware mention selection

Wang, Yixiao, Bouraoui, Zied, Espinosa-Anke, Luis (ORCID: https://orcid.org/0000-0001-6830-9176) and Schockaert, Steven (ORCID: https://orcid.org/0000-0002-9256-2881) 2021. Deriving word vectors from contextualized language models using topic-aware mention selection. Presented at: 6th Workshop on Representation Learning for NLP (RepL4NLP 2021), Virtual / Bangkok, Thailand, 05 August 2021. RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop. Association for Computational Linguistics, pp. 185-194.

REP4NLP_Yixiao.pdf: PDF - Accepted Post-Print Version (274kB)

Abstract

One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows this basic strategy, but differs from standard word embeddings in two important ways. First, we take advantage of contextualized language models (CLMs) rather than bags of word vectors to encode contexts. Second, rather than learning a word vector directly, we use a topic model to partition the contexts in which words appear, and then learn different topic-specific vectors for each word. Finally, we use a task-specific supervision signal to make a soft selection of the resulting vectors. We show that this simple strategy leads to high-quality word vectors, which are more predictive of semantic properties than word embeddings and existing CLM-based strategies. © 2021 Association for Computational Linguistics.
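
The pipeline outlined in the abstract (encode mentions, group them by topic, build one vector per topic, then softly select among those vectors) can be illustrated with a minimal sketch. This is not the authors' implementation, since the abstract does not give the details. Everything below is an assumption made for illustration: random vectors stand in for CLM mention encodings, the hard-coded `mention_topics` list stands in for a topic model's output, per-topic averaging is one plausible way to build topic-specific vectors, and `topic_scores` stands in for weights that a task-specific training signal would learn.

```python
import math
import random


def topic_specific_vectors(mention_vecs, mention_topics, num_topics):
    """Average a word's mention vectors separately for each topic."""
    dim = len(mention_vecs[0])
    sums = [[0.0] * dim for _ in range(num_topics)]
    counts = [0] * num_topics
    for vec, topic in zip(mention_vecs, mention_topics):
        counts[topic] += 1
        for i, x in enumerate(vec):
            sums[topic][i] += x
    # Topics without mentions keep a zero vector (an arbitrary choice here).
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(topic_vecs, topic_scores):
    """Combine topic-specific vectors using softmax weights over topics."""
    weights = softmax(topic_scores)
    dim = len(topic_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, topic_vecs))
            for i in range(dim)]


# Toy data: 6 mentions of one word, 4-dimensional stand-in "CLM" vectors.
rng = random.Random(0)
mention_vecs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
mention_topics = [0, 0, 1, 2, 2, 2]   # hypothetical topic model output
topic_scores = [0.5, -1.0, 2.0]       # hypothetical learned, task-specific scores

topic_vecs = topic_specific_vectors(mention_vecs, mention_topics, num_topics=3)
word_vec = soft_select(topic_vecs, topic_scores)
```

In this sketch the selection is "soft" because every topic-specific vector contributes to `word_vec` in proportion to its softmax weight, rather than a single topic being chosen outright.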

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Professional Services > Advanced Research Computing @ Cardiff (ARCCA)
Schools > Computer Science & Informatics
Publisher: Association for Computational Linguistics
ISBN: 978-195408572-5
Date of First Compliant Deposit: 2 July 2021
Date of Acceptance: 2 June 2021
Last Modified: 30 Jul 2025 13:41
URI: https://orca.cardiff.ac.uk/id/eprint/142267
