NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities

, Pilehvar, Mohammad Taher and Navigli, Roberto 2016. NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence 240, pp. 36-64. 10.1016/j.artint.2016.07.005

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://doi.org/10.1016/j.artint.2016.07.005

Abstract

Owing to the need for a deep understanding of linguistic items, semantic representation is considered to be one of the fundamental components of several applications in Natural Language Processing and Artificial Intelligence. As a result, semantic representation has been one of the prominent research areas in lexical semantics over the past decades. However, due mainly to the lack of large sense-annotated corpora, most existing representation techniques are limited to the lexical level and thus cannot be effectively applied to individual word senses. In this paper we put forward a novel multilingual vector representation, called Nasari, which not only enables accurate representation of word senses in different languages, but also provides two main advantages over existing approaches: (1) high coverage, including both concepts and named entities, and (2) comparability across languages and linguistic levels (i.e., words, senses and concepts), thanks to the representation of linguistic items in a single unified semantic space and in a joint embedded space, respectively. Moreover, our representations are flexible, can be applied to multiple applications and are freely available at http://lcl.uniroma1.it/nasari/. As evaluation benchmarks, we opted for four different tasks, namely word similarity, sense clustering, domain labeling, and Word Sense Disambiguation, for each of which we report state-of-the-art performance on several standard datasets across different languages.
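As an illustration of the comparability claim, the following is a minimal sketch under stated assumptions. If every sense vector, whatever its language, uses the same concept identifiers as dimensions, then an English sense, an Italian sense and a word can all be scored against each other with one similarity function. The toy vectors, the identifiers (such as "c:money"), the cosine measure and the rule of scoring a word by its closest sense are hypothetical choices made for this example. They are not the released Nasari vectors and not necessarily the exact similarity measure used in the paper.

```python
import math

# Hypothetical toy sense vectors. Each dimension is a concept identifier shared
# across languages, so English and Italian senses live in one space.
# All values are illustrative, not taken from the Nasari release.
SENSE_VECTORS = {
    "en:bank#finance": {"c:money": 0.9, "c:loan": 0.7, "c:river": 0.0},
    "en:bank#river": {"c:money": 0.0, "c:loan": 0.1, "c:river": 0.9},
    "it:banca#finance": {"c:money": 0.8, "c:loan": 0.8, "c:river": 0.1},
}

# Hypothetical word-to-senses inventory.
SENSES = {
    "en:bank": ["en:bank#finance", "en:bank#river"],
    "it:banca": ["it:banca#finance"],
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def word_similarity(w1, w2):
    """Score two words (possibly in different languages) by their closest sense pair."""
    return max(
        cosine(SENSE_VECTORS[s1], SENSE_VECTORS[s2])
        for s1 in SENSES[w1]
        for s2 in SENSES[w2]
    )


def word_sense_similarity(word, sense):
    """Compare a word with a single sense: a cross-level comparison in the same space."""
    return max(cosine(SENSE_VECTORS[s], SENSE_VECTORS[sense]) for s in SENSES[word])


print(round(word_similarity("en:bank", "it:banca"), 3))
print(round(word_sense_similarity("it:banca", "en:bank#river"), 3))
```

In this toy setup, English "bank" and Italian "banca" score highly because their financial senses point at the same shared dimensions. The Italian word scores low against the river sense of "bank".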

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Computer Science & Informatics
Uncontrolled Keywords:	Semantic representation, Lexical semantics, Word Sense Disambiguation, Semantic similarity, Sense clustering, Domain labeling
Publisher:	Elsevier
ISSN:	0004-3702
Date of First Compliant Deposit:	11 July 2018
Date of Acceptance:	25 July 2016
Last Modified:	02 Dec 2024 13:00
URI:	https://orca.cardiff.ac.uk/id/eprint/113132

Citation Data

Cited 132 times in Scopus.
