Wang, Yixiao. 2023. Distilling word vectors from contextualised language models. PhD Thesis, Cardiff University.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Although contextualised language models (CLMs) have reduced the need for static word embeddings in many NLP tasks, static representations of word meaning remain crucial in tasks where words have to be encoded without context. Such tasks arise in domains such as information retrieval. Compared to learning static word embeddings from scratch, distilling such representations from CLMs has advantages in downstream tasks [68], [2]. Typically, the embedding of a word w is distilled by feeding random sentences that mention w to a CLM and extracting the corresponding contextualised vectors. In this research, we assume that distilling word embeddings from CLMs can be improved by feeding more informative mentions to the CLM. As the first contribution of this thesis, we therefore propose a sentence selection strategy based on a topic model. Since distilling high-quality word embeddings from CLMs normally requires many mentions of each word, we investigate whether decent word embeddings can be obtained from only a few, carefully selected mentions of each word. As our second contribution, we explore a range of sentence selection strategies and test the resulting word embeddings on various evaluation tasks. We find that 20 informative sentences per word are sufficient to obtain competitive word embeddings, especially when the sentences are selected by our proposed strategies. Beyond improving sentence selection, as our third contribution, we also study other strategies for obtaining word embeddings. We find that SBERT embeddings capture an aspect of word meaning that is highly complementary to the mention embeddings we previously focused on. We therefore propose combining the vectors produced by these two methods through a contrastive learning model. The results confirm that combining these vectors leads to more informative word embeddings.
In conclusion, this thesis shows that better static word embeddings can be efficiently distilled from CLMs by strategically selecting sentences and by combining complementary methods.
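The distillation step described in the abstract can be sketched roughly as follows: encode each selected mention of a word with a contextual encoder and pool (here, average) the resulting vectors into a single static embedding. This is a minimal illustration, not the thesis code; `toy_clm` is a hypothetical deterministic stand-in for a real CLM such as BERT, and averaging is only one possible pooling choice.

```python
# Hedged sketch of mention-based distillation of a static word vector.
# toy_clm is a hypothetical stand-in for a real contextualised encoder:
# it returns a pseudo-vector that depends on both the target word and
# its sentence context, which is the property the method relies on.
import hashlib

DIM = 8  # toy embedding dimensionality


def toy_clm(sentence: str, target: str) -> list[float]:
    """Return a deterministic pseudo-contextual vector for `target` in `sentence`."""
    vec = []
    for i in range(DIM):
        digest = hashlib.md5(f"{target}|{sentence}|{i}".encode()).digest()
        vec.append(digest[0] / 255.0)  # component in [0, 1]
    return vec


def distill(word: str, mentions: list[str]) -> list[float]:
    """Distil a static embedding for `word` by averaging its mention vectors."""
    vecs = [toy_clm(s, word) for s in mentions if word in s.split()]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(DIM)]


mentions = [
    "the bank approved the loan",
    "she deposited cash at the bank",
    "the bank raised interest rates",
]
embedding = distill("bank", mentions)
```

With a real CLM, `toy_clm` would be replaced by the encoder's hidden state for the target token, and the thesis's contribution lies in choosing which mentions enter `mentions` rather than sampling them at random.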
| Item Type: | Thesis (PhD) |
|---|---|
| Date Type: | Completion |
| Status: | Unpublished |
| Schools: | Computer Science & Informatics |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Funders: | School of Computer Science and Informatics 2 and 1/4 year stipend |
| Date of First Compliant Deposit: | 25 October 2023 |
| Last Modified: | 25 Oct 2023 09:18 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/163139 |