Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885, Corcoran, Padraig ORCID: https://orcid.org/0000-0001-9731-3385, Gagarin, Andrei ORCID: https://orcid.org/0000-0001-9749-9706 and Buerki, Andreas ORCID: https://orcid.org/0000-0003-2151-3246 2018. Head to head: Semantic similarity of multi-word terms. IEEE Access 6, pp. 20545-20557. 10.1109/ACCESS.2018.2826224
PDF - Published Version (8MB)
Available under License Creative Commons Attribution.
Abstract
Terms are linguistic signifiers of domain-specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this study, we use lexico-syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi-word term, each word is weighted according to its defining properties. In this context, the head noun is given the highest weight. Other words are weighted depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a directed acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Words are then assigned different weights based on their positions. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag-of-words representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi-automate taxonomy construction.
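The general idea of head-weighted term vectors compared by cosine similarity can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that the head noun is the last word of the term and that weights shrink by a fixed factor (`decay`, a hypothetical parameter) for each step away from the head. The paper instead derives word positions from constituency and dependency relations. The function names and example terms are likewise invented for this sketch.

```python
import math
from collections import defaultdict


def weighted_vector(term, decay=0.5):
    """Map a multi-word term to a sparse {word: weight} vector.

    Assumption (not from the paper): the head noun is the last word,
    and each word further to the left is down-weighted by `decay`.
    """
    words = term.lower().split()
    vec = defaultdict(float)
    for distance, word in enumerate(reversed(words)):
        vec[word] += decay ** distance  # head noun has distance 0, weight 1
    return dict(vec)


def bag_of_words(term):
    """Conventional baseline: every word gets the same weight."""
    return {word: 1.0 for word in term.lower().split()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a, b, c = "knee injury", "shoulder injury", "knee pain"
    for name, embed in (("bag of words", bag_of_words),
                        ("head weighted", weighted_vector)):
        same_head = cosine(embed(a), embed(b))      # shared head noun
        same_modifier = cosine(embed(a), embed(c))  # shared modifier only
        print(f"{name}: same head {same_head:.2f}, "
              f"same modifier {same_modifier:.2f}")
```

Under the plain bag-of-words baseline, "knee injury" is equally similar to "shoulder injury" and to "knee pain". With the head noun weighted highest, the pair sharing a head noun scores higher than the pair sharing only a modifier, which is the behaviour the abstract attributes to the weighted representation.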
Item Type: | Article |
---|---|
Date Type: | Published Online |
Status: | Published |
Schools: | Mathematics; Computer Science & Informatics; English, Communication and Philosophy |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Additional Information: | This work is licensed under a Creative Commons Attribution 3.0 License. |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
ISSN: | 2169-3536 |
Date of First Compliant Deposit: | 11 April 2018 |
Date of Acceptance: | 9 April 2018 |
Last Modified: | 06 May 2023 00:11 |
URI: | https://orca.cardiff.ac.uk/id/eprint/110619 |
Citation Data
Cited 5 times in Scopus.