Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Knowledge-enhanced document embeddings for text classification

Sinoara, Roberta, Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239, Rossi, Rafael, Navigli, Roberto and Rezende, Solange 2019. Knowledge-enhanced document embeddings for text classification. Knowledge-Based Systems 163, pp. 955-971. 10.1016/j.knosys.2018.10.026

PDF - Submitted Pre-Print Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

Accurate semantic representation models are essential in text mining applications. For the text mining process to be applied successfully, the adopted text representation must preserve the interesting patterns to be discovered. Although competitive results for automatic text classification may be achieved with the traditional bag-of-words model, this representation cannot provide satisfactory classification performance in hard settings where richer text representations are required. In this paper, we present an approach to represent document collections based on embedded representations of words and word senses. We bring together the power of word sense disambiguation and the semantic richness of word and word-sense embedded vectors to construct embedded representations of document collections. Our approach results in semantically enhanced, low-dimensional representations. We use word-sense embedded vectors to overcome the lack of interpretability that is a drawback of embedded representations. Moreover, the experimental evaluation indicates that the proposed representations provide stable classifiers with strong quantitative results, especially in semantically complex classification scenarios.
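
To make the idea of building a document representation from word and word-sense vectors concrete, the following is a minimal sketch only. The abstract does not say how the vectors are combined, so simple averaging is an assumption here, not the authors' stated method. The vectors, the `token#sense` labels, and the names `TOY_EMBEDDINGS`, `document_embedding` and `cosine` are all hypothetical and chosen purely for illustration.

```python
import math
from typing import Dict, List, Sequence

# Made-up 3-dimensional vectors (assumption); real embeddings would be
# pre-trained. Keys with a "#..." suffix stand for hypothetical sense
# identifiers that a word sense disambiguation step would assign.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "bank#finance": [1.0, 0.0, 0.0],
    "bank#river": [0.0, 1.0, 0.0],
    "loan": [0.8, 0.0, 0.2],
    "water": [0.0, 0.9, 0.1],
}


def document_embedding(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of the tokens that have an embedding.

    Averaging is an assumed combination rule for this sketch.
    Tokens without a vector are skipped; an empty result is all zeros.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Two short "documents" after a hypothetical disambiguation step.
doc_finance = document_embedding(["bank#finance", "loan"], TOY_EMBEDDINGS, 3)
doc_river = document_embedding(["bank#river", "water"], TOY_EMBEDDINGS, 3)
print(cosine(doc_finance, doc_river))
```

In this toy setup the two documents share the surface word "bank" but receive different sense labels, so their averaged vectors point in different directions. A bag-of-words model would instead count "bank" as the same feature in both documents.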

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Elsevier
ISSN: 0950-7051
Date of First Compliant Deposit: 2 April 2020
Date of Acceptance: 14 October 2018
Last Modified: 05 Dec 2024 05:30
URI: https://orca.cardiff.ac.uk/id/eprint/130670

Citation Data

Cited 85 times in Scopus.
