A framework for the construction of monolingual and cross-lingual word similarity datasets

Camacho-Collados, José, Pilehvar, Mohammad Taher and Navigli, Roberto 2015. A framework for the construction of monolingual and cross-lingual word similarity datasets. Presented at: ACL-IJCNLP 2015: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26-31 July 2015. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, pp. 1-7.

PDF - Published Version (766kB)

Official URL: http://aclweb.org/anthology/P/P15/P15-2001.pdf

Abstract

Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even those that are widely spoken such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, both at monolingual and cross-lingual levels. We propose an automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation, demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.
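The count of fifteen cross-lingual datasets for six languages equals the number of unordered language pairs: 6 choose 2 = (6 × 5) / 2 = 15. The short Python sketch below enumerates those pairs. It assumes, as the count suggests, one cross-lingual dataset per unordered pair of distinct languages; the function name `cross_lingual_pairs` is hypothetical and not part of the released resources.

```python
from itertools import combinations

# The six languages named in the abstract.
LANGUAGES = ["English", "Spanish", "French", "German", "Portuguese", "Farsi"]


def cross_lingual_pairs(languages):
    """Return every unordered pair of distinct languages.

    Hypothetical helper. Assumption: one cross-lingual dataset is built
    per unordered language pair, so (A, B) and (B, A) count once.
    """
    return list(combinations(languages, 2))


pairs = cross_lingual_pairs(LANGUAGES)
for first, second in pairs:
    print(f"{first}-{second}")
print(len(pairs))  # 15
```

Using `itertools.combinations` rather than nested loops guarantees that each pair appears exactly once and that no language is paired with itself.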

Item Type:	Conference or Workshop Item - published (Paper)
Date Type:	Completion
Status:	Published
Schools:	Schools > Computer Science & Informatics
Publisher:	Association for Computational Linguistics
Date of First Compliant Deposit:	18 July 2018
Last Modified:	23 Oct 2022 14:13
URI:	https://orca.cardiff.ac.uk/id/eprint/113085
