Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Head to head: Semantic similarity of multi-word terms

Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885, Corcoran, Padraig ORCID: https://orcid.org/0000-0001-9731-3385, Gagarin, Andrei ORCID: https://orcid.org/0000-0001-9749-9706 and Buerki, Andreas ORCID: https://orcid.org/0000-0003-2151-3246 2018. Head to head: Semantic similarity of multi-word terms. IEEE Access 6, pp. 20545-20557. 10.1109/ACCESS.2018.2826224

PDF - Published Version
Available under License Creative Commons Attribution.

Abstract

Terms are linguistic signifiers of domain-specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this study, we use lexico-syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi-word term, each word is weighted in terms of its defining properties. In this context, the head noun is given the highest weight. Other words are weighted depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a directed acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Words are then assigned different weights according to their positions in this ordering. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag-of-words representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi-automate taxonomy construction.
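The idea of head-weighted term vectors compared by cosine similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the actual weighting scheme or ordering heuristic, so the code below assumes (hypothetically) that the head noun is the last word of the term and that weights decay geometrically with distance from the head. The function names `weighted_vector`, `bag_of_words` and `cosine`, and the `decay` parameter, are illustrative only.

```python
import math
from collections import Counter


def weighted_vector(term, decay=0.5):
    """Map a multi-word term to a sparse {word: weight} vector.

    Assumption (not from the paper): the head noun is the last word and
    each step away from the head multiplies the weight by `decay`.
    """
    words = term.lower().split()
    vec = Counter()
    for distance, word in enumerate(reversed(words)):
        vec[word] += decay ** distance  # head noun has distance 0, weight 1
    return vec


def bag_of_words(term):
    """Conventional unweighted representation, used as a baseline."""
    return Counter(term.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a, b, c = "knee injury", "shoulder injury", "knee pain"

    # Bag of words cannot tell a shared head from a shared modifier.
    print(round(cosine(bag_of_words(a), bag_of_words(b)), 2))  # 0.5
    print(round(cosine(bag_of_words(a), bag_of_words(c)), 2))  # 0.5

    # Head-weighted vectors favour terms that share the head noun.
    print(round(cosine(weighted_vector(a), weighted_vector(b)), 2))  # 0.8
    print(round(cosine(weighted_vector(a), weighted_vector(c)), 2))  # 0.2
```

In this toy example the unweighted representation scores both pairs equally, whereas the head-weighted one ranks "shoulder injury" (same head) well above "knee pain" (same modifier) as a neighbour of "knee injury". The last-word rule is only a stand-in for the ordering derived from constituency and dependency relations described in the abstract.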

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Mathematics; Computer Science & Informatics; English, Communication and Philosophy
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Additional Information: This work is licensed under a Creative Commons Attribution 3.0 License.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
ISSN: 2169-3536
Date of First Compliant Deposit: 11 April 2018
Date of Acceptance: 9 April 2018
Last Modified: 06 May 2023 00:11
URI: https://orca.cardiff.ac.uk/id/eprint/110619

Citation Data

Cited 5 times in Scopus.
