Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Learning cross-lingual word embeddings from Twitter via distant supervision

Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239, Doval, Yerai, Martínez-Cámara, Eugenio, Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176, Barbieri, Francesco and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 2020. Learning cross-lingual word embeddings from Twitter via distant supervision. Proceedings of the International AAAI Conference on Web and Social Media 14 (1) , pp. 72-82.

PDF (Accepted to ICWSM 2020) - Accepted Post-Print Version

Abstract

Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding spaces, and that accurate alignments can be obtained even without external bilingual data. In this paper we explore a research direction that has been surprisingly neglected in the literature: leveraging noisy user-generated text to learn cross-lingual embeddings particularly tailored towards social media applications. While the noisiness and informal nature of the social media genre pose additional challenges to cross-lingual embedding methods, we find that they also provide key opportunities due to the abundance of code-switching and the existence of a shared vocabulary of emoji and named entities. Our contribution consists of a very simple post-processing step that exploits these phenomena to significantly improve the performance of state-of-the-art alignment methods.
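
As a rough illustration of what aligning two independently learned monolingual spaces can look like, the sketch below uses tokens spelled identically in both vocabularies (such as emoji or named entities) as anchor pairs and fits an orthogonal map with the standard orthogonal Procrustes solution. This is a generic textbook technique, not necessarily the post-processing step proposed in the paper; the function names `shared_token_anchors` and `procrustes_align` and the toy vocabularies are assumptions made for this example.

```python
import numpy as np


def shared_token_anchors(vocab_src, vocab_tgt):
    """Return (source_index, target_index) pairs for tokens that are
    spelled identically in both vocabularies (hypothetical helper)."""
    tgt_index = {tok: j for j, tok in enumerate(vocab_tgt)}
    return [(i, tgt_index[tok]) for i, tok in enumerate(vocab_src)
            if tok in tgt_index]


def procrustes_align(X_src, Y_tgt, pairs):
    """Fit an orthogonal matrix W minimising ||A W - B||_F, where A and B
    hold the source and target vectors of the anchor pairs.
    Standard closed form: W = U V^T with U S V^T = SVD(A^T B)."""
    src_idx, tgt_idx = zip(*pairs)
    A = X_src[list(src_idx)]
    B = Y_tgt[list(tgt_idx)]
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


# Toy vocabularies (assumed for illustration): emoji and names are shared.
vocab_src = ["😀", "Madrid", "hola", "casa", "🔥", "Messi"]
vocab_tgt = ["Madrid", "😀", "hello", "🔥", "house", "Messi"]
pairs = shared_token_anchors(vocab_src, vocab_tgt)

# Synthetic embeddings: shared tokens in the source space are an exact
# rotation of their target vectors, so the fitted map should recover it.
rng = np.random.default_rng(0)
dim = 3
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # ground-truth orthogonal map
Y_tgt = rng.normal(size=(len(vocab_tgt), dim))
X_src = rng.normal(size=(len(vocab_src), dim))
for i, j in pairs:
    X_src[i] = Y_tgt[j] @ Q.T  # so that X_src[i] @ Q == Y_tgt[j]

W = procrustes_align(X_src, Y_tgt, pairs)
X_mapped = X_src @ W  # source vectors expressed in the target space
```

With real embeddings the anchor pairs would be noisy rather than exact rotations, so the fitted map is only a least-squares compromise over the shared tokens.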

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Association for the Advancement of Artificial Intelligence
ISSN: 2162-3449
Date of First Compliant Deposit: 9 December 2019
Date of Acceptance: 15 November 2019
Last Modified: 17 Nov 2024 13:30
URI: https://orca.cardiff.ac.uk/id/eprint/127435
