Lee, Jae Hee (ORCID: https://orcid.org/0000-0001-9840-780X), Camacho Collados, Jose (ORCID: https://orcid.org/0000-0003-1618-7239), Espinosa-Anke, Luis (ORCID: https://orcid.org/0000-0001-6830-9176) and Schockaert, Steven (ORCID: https://orcid.org/0000-0002-9256-2881)
2020.
Capturing word order in averaging based sentence embeddings.
Presented at: European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain, 29 August - 2 September 2020.
Published in: De Giacomo, Giuseppe, Catala, Alejandro, Dilkina, Bistra, Milano, Michela, Barro, Senen, Bugarin, Alberto and Lang, Jerome (eds.), 24th European Conference on Artificial Intelligence, vol. 325. IOS Press, pp. 2062-2069.
DOI: 10.3233/FAIA200328
PDF - Published Version. Available under License Creative Commons Attribution Non-commercial.
Abstract
One of the most remarkable findings in the literature on sentence embeddings has been that simple word vector averaging can compete with state-of-the-art models in many tasks. While counter-intuitive, a convincing explanation has been provided by Arora et al., who showed that the bag-of-words representation of a sentence can be recovered from its word vector average with almost perfect accuracy. Beyond word vector averaging, however, most sentence embedding models are essentially black boxes: while there is abundant empirical evidence about their strengths and weaknesses, it is not clear why and how different embedding strategies are able to capture particular properties of sentences. In this paper, we focus in particular on how sentence embedding models are able to capture word order. For instance, it seems intuitively puzzling that simple LSTM autoencoders are able to learn sentence vectors from which the original sentence can be reconstructed almost perfectly. With the aim of elucidating this phenomenon, we show that to capture word order, it is in fact sufficient to supplement standard word vector averages with averages of bigram and trigram vectors. To this end, we first study the problem of reconstructing bags-of-bigrams, focusing in particular on how suitable bigram vectors should be encoded. We then show that LSTMs are capable, in principle, of learning our proposed sentence embeddings. Empirically, we find that our embeddings outperform those learned by LSTM autoencoders on the task of sentence reconstruction, while needing almost no training data.
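The core idea of the abstract — that averaging bigram (and trigram) vectors alongside word vectors injects order information that plain word averaging lacks — can be illustrated with a minimal sketch. Everything below is a hypothetical toy: the random vocabulary, the `embed` function, and the element-wise product used to compose a bigram vector from its two word vectors are assumptions for illustration, not the encoding the paper proposes (how bigram vectors should best be encoded is precisely what the paper studies).

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# Toy vocabulary of random word vectors (illustrative only).
vocab = {w: rng.standard_normal(DIM) for w in "the cat sat on mat".split()}

def embed(sentence, vocab):
    """Concatenate the word-vector average with a bigram-vector average."""
    words = sentence.split()
    word_avg = np.mean([vocab[w] for w in words], axis=0)
    # Compose each bigram vector from its two word vectors; the
    # element-wise product used here is an arbitrary stand-in.
    bigrams = list(zip(words, words[1:]))
    bigram_avg = np.mean([vocab[a] * vocab[b] for a, b in bigrams], axis=0)
    return np.concatenate([word_avg, bigram_avg])

v1 = embed("the cat sat on the mat", vocab)
v2 = embed("the mat sat on the cat", vocab)

# Both sentences contain the same bag of words, so the word-average
# halves coincide; the bags of bigrams differ, so the bigram halves
# do not — word order leaves a trace in the embedding.
assert np.allclose(v1[:DIM], v2[:DIM])
assert not np.allclose(v1[DIM:], v2[DIM:])
```

Under this toy composition two sentences that are permutations of each other receive identical word-average components but distinct bigram-average components, which is the mechanism the abstract credits for capturing word order.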
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Date Type: | Publication |
| Status: | Published |
| Schools: | Schools > Computer Science & Informatics |
| Publisher: | IOS Press |
| ISBN: | 9781643681009 |
| Date of First Compliant Deposit: | 14 April 2020 |
| Date of Acceptance: | 14 January 2020 |
| Last Modified: | 13 Nov 2025 12:20 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/130910 |