Lee, Jae Hee (ORCID: https://orcid.org/0000-0001-9840-780X), Camacho Collados, Jose (ORCID: https://orcid.org/0000-0003-1618-7239), Espinosa-Anke, Luis (ORCID: https://orcid.org/0000-0001-6830-9176) and Schockaert, Steven (ORCID: https://orcid.org/0000-0002-9256-2881)
2020.
Capturing word order in averaging based sentence embeddings.
Presented at: European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain, 29 August - 2 September 2020.
Published in: De Giacomo, Giuseppe, Catala, Alejandro, Dilkina, Bistra, Milano, Michela, Barro, Senen, Bugarin, Alberto and Lang, Jerome (eds.), 24th European Conference on Artificial Intelligence, vol. 325. IOS Press, pp. 2062-2069.
DOI: 10.3233/FAIA200328
PDF - Published Version. Available under License Creative Commons Attribution Non-commercial.
Abstract
One of the most remarkable findings in the literature on sentence embeddings has been that simple word vector averaging can compete with state-of-the-art models in many tasks. While counter-intuitive, a convincing explanation has been provided by Arora et al., who showed that the bag-of-words representation of a sentence can be recovered from its word vector average with almost perfect accuracy. Beyond word vector averaging, however, most sentence embedding models are essentially black boxes: while there is abundant empirical evidence about their strengths and weaknesses, it is not clear why and how different embedding strategies are able to capture particular properties of sentences. In this paper, we focus in particular on how sentence embedding models are able to capture word order. For instance, it seems intuitively puzzling that simple LSTM autoencoders are able to learn sentence vectors from which the original sentence can be reconstructed almost perfectly. With the aim of elucidating this phenomenon, we show that to capture word order, it is in fact sufficient to supplement standard word vector averages with averages of bigram and trigram vectors. To this end, we first study the problem of reconstructing bags-of-bigrams, focusing in particular on how suitable bigram vectors should be encoded. We then show that LSTMs are capable, in principle, of learning our proposed sentence embeddings. Empirically, we find that our embeddings outperform those learned by LSTM autoencoders on the task of sentence reconstruction, while needing almost no training data.
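The core idea of the abstract — that averaging bigram (and trigram) vectors alongside word vectors injects order information that plain word averaging lacks — can be illustrated with a minimal sketch. Everything below is a hypothetical toy: the random vocabulary, the `embed` function, and the element-wise product used to compose a bigram vector from its two word vectors are assumptions for illustration, not the encoding the paper proposes (how bigram vectors should best be encoded is precisely what the paper studies).

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# Toy vocabulary of random word vectors (illustrative only).
vocab = {w: rng.standard_normal(DIM) for w in "the cat sat on mat".split()}

def embed(sentence, vocab):
    """Concatenate the word-vector average with a bigram-vector average."""
    words = sentence.split()
    word_avg = np.mean([vocab[w] for w in words], axis=0)
    # Compose each bigram vector from its two word vectors; the
    # element-wise product used here is an arbitrary stand-in.
    bigrams = list(zip(words, words[1:]))
    bigram_avg = np.mean([vocab[a] * vocab[b] for a, b in bigrams], axis=0)
    return np.concatenate([word_avg, bigram_avg])

v1 = embed("the cat sat on the mat", vocab)
v2 = embed("the mat sat on the cat", vocab)

# Both sentences contain the same bag of words, so the word-average
# halves coincide; the bags of bigrams differ, so the bigram halves
# do not — word order leaves a trace in the embedding.
assert np.allclose(v1[:DIM], v2[:DIM])
assert not np.allclose(v1[DIM:], v2[DIM:])
```

Under this toy composition two sentences that are permutations of each other receive identical word-average components but distinct bigram-average components, which is the mechanism the abstract credits for capturing word order.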
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Date Type: | Publication |
| Status: | Published |
| Schools: | Schools > Computer Science & Informatics |
| Publisher: | IOS Press |
| ISBN: | 9781643681009 |
| Date of First Compliant Deposit: | 14 April 2020 |
| Date of Acceptance: | 14 January 2020 |
| Last Modified: | 13 Nov 2025 12:20 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/130910 |