Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Word and document embedding with vMF-mixture priors on context word vectors

Jameel, Shoaib and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 2019. Word and document embedding with vMF-mixture priors on context word vectors. Presented at: 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 28 July - 2 August 2019. Published in: Korhonen, Anna, Traum, David and Màrquez, Lluís eds. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, pp. 3319-3328. 10.18653/v1/P19-1321

Abstract

Word embedding models typically learn two types of vectors: target word vectors and context word vectors. These vectors are normally learned such that they are predictive of some word co-occurrence statistic, but they are otherwise unconstrained. However, the words from a given language can be organized in various natural groupings, such as syntactic word classes (e.g. nouns, adjectives, verbs) and semantic themes (e.g. sports, politics, sentiment). Our hypothesis in this paper is that embedding models can be improved by explicitly imposing a cluster structure on the set of context word vectors. To this end, our model relies on the assumption that context word vectors are drawn from a mixture of von Mises-Fisher (vMF) distributions, where the parameters of this mixture distribution are jointly optimized with the word vectors. We show that this results in word vectors which are qualitatively different from those obtained with existing word embedding models. We furthermore show that our embedding model can also be used to learn high-quality document representations.
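
To make the mixture assumption in the abstract concrete, the following is a minimal sketch of the standard von Mises-Fisher log-density on the unit sphere and of a mixture of such components. It is not the authors' implementation: the abstract does not give the training objective or how the mixture parameters are optimized, so none of that is shown. The function names (log_bessel_i, vmf_log_density, mixture_log_density) and the series-based Bessel evaluation are illustrative assumptions, chosen so the example needs only the Python standard library.

```python
import math

def log_bessel_i(order, x, terms=200):
    """Log of the modified Bessel function I_order(x), x > 0.

    Uses the power series I_v(x) = sum_m (x/2)^(2m+v) / (m! * Gamma(m+v+1)),
    summed in log space. Assumption: 200 terms suffice for moderate x.
    """
    logs = [(2 * m + order) * math.log(x / 2.0)
            - math.lgamma(m + 1) - math.lgamma(m + order + 1)
            for m in range(terms)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

def vmf_log_density(x, mu, kappa):
    """Log-density of a vMF distribution at unit vector x.

    mu is the unit-length mean direction, kappa > 0 the concentration.
    """
    d = len(x)
    # Normaliser: kappa^(d/2 - 1) / ((2*pi)^(d/2) * I_{d/2 - 1}(kappa))
    log_norm = ((d / 2 - 1) * math.log(kappa)
                - (d / 2) * math.log(2 * math.pi)
                - log_bessel_i(d / 2 - 1, kappa))
    return log_norm + kappa * sum(a * b for a, b in zip(mu, x))

def mixture_log_density(x, weights, mus, kappas):
    """Log-density of a vMF mixture; weights are assumed to sum to 1."""
    comps = [math.log(w) + vmf_log_density(x, mu, k)
             for w, mu, k in zip(weights, mus, kappas)]
    peak = max(comps)
    return peak + math.log(sum(math.exp(c - peak) for c in comps))

# Hypothetical toy example: two clusters of 3-dimensional "context vectors".
weights = [0.5, 0.5]
mus = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
kappas = [5.0, 5.0]
near_cluster = mixture_log_density([0.0, 0.0, 1.0], weights, mus, kappas)
far_from_both = mixture_log_density([0.0, -1.0, 0.0], weights, mus, kappas)
```

In this toy setting a vector aligned with one of the mean directions receives a higher mixture density than one far from both. That is the sense in which a prior of this form favors clustered context vectors.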

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: Association for Computational Linguistics
Date of First Compliant Deposit: 14 August 2019
Last Modified: 24 Sep 2025 11:45
URI: https://orca.cardiff.ac.uk/id/eprint/124035
