Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities

Camacho Collados, Jose ORCID:, Pilehvar, Mohammad Taher and Navigli, Roberto 2016. NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240 , pp. 36-64. 10.1016/j.artint.2016.07.005

[thumbnail of NASARI_AIJ.pdf]
PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Download (903kB) | Preview


Owing to the need for a deep understanding of linguistic items, semantic representation is considered to be one of the fundamental components of several applications in Natural Language Processing and Artificial Intelligence. As a result, semantic representation has been one of the prominent research areas in lexical semantics over the past decades. However, due mainly to the lack of large sense-annotated corpora, most existing representation techniques are limited to the lexical level and thus cannot be effectively applied to individual word senses. In this paper we put forward a novel multilingual vector representation, called Nasari, which not only enables accurate representation of word senses in different languages, but it also provides two main advantages over existing approaches: (1) high coverage, including both concepts and named entities, (2) comparability across languages and linguistic levels (i.e., words, senses and concepts), thanks to the representation of linguistic items in a single unified semantic space and in a joint embedded space, respectively. Moreover, our representations are flexible, can be applied to multiple applications and are freely available at As evaluation benchmark, we opted for four different tasks, namely, word similarity, sense clustering, domain labeling, and Word Sense Disambiguation, for each of which we report state-of-the-art performance on several standard datasets across different languages.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Uncontrolled Keywords: Semantic representation, Lexical semantics, Word Sense, Disambiguation,Semantic similarity, Sense clustering Domain labeling
Publisher: Elsevier
ISSN: 0004-3702
Date of First Compliant Deposit: 11 July 2018
Date of Acceptance: 25 July 2016
Last Modified: 06 Nov 2023 21:25

Citation Data

Cited 132 times in Scopus. View in Scopus. Powered By Scopus® Data

Actions (repository staff only)

Edit Item Edit Item


Downloads per month over past year

View more statistics