Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Using semi-structured data for assessing research paper similarity

Martin, German Hurtado, Schockaert, Steven, Cornelis, Chris and Naessens, Helga 2013. Using semi-structured data for assessing research paper similarity. Information Sciences 221, pp. 245-261. 10.1016/j.ins.2012.09.044

Full text not available from this repository.


The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers’ abstract, and some additional features such as their authors, keywords, and the journals in which they were published. Our work explores several methods to exploit this information, first by using methods based on the vector space model and then by adapting language modeling techniques to this end. In the first case, in addition to a number of standard approaches we experiment with the use of a form of explicit semantic analysis. In the second case, the basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords and journal of the paper. This strategy is then extended by revealing the latent topic structure of the collection using an adaptation of Latent Dirichlet Allocation, in which the keywords that were provided by the authors are used to guide the process. Experimental analysis shows that a well-considered use of these techniques significantly improves the results of the standard vector space model approach.
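
The interpolation strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unigram estimation, the interpolation weights, the smoothing floor, and the use of KL divergence as the comparison measure are all assumptions made for illustration, and every function name below is hypothetical.

```python
from collections import Counter
import math


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> probability."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def interpolate(models, weights):
    """Linear interpolation of language models; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    mixed = {}
    for model, weight in zip(models, weights):
        for word, prob in model.items():
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed


def kl_divergence(p, q, floor=1e-9):
    """KL(p || q); words missing from q get a small floor probability (assumed smoothing)."""
    return sum(
        p_w * math.log(p_w / max(q.get(word, 0.0), floor))
        for word, p_w in p.items()
        if p_w > 0
    )


def paper_model(abstract, authors_text, keywords, journal_text,
                weights=(0.6, 0.1, 0.2, 0.1)):
    """Augment the abstract model with author, keyword and journal models.

    Each argument is a list of tokens. The weights are illustrative only,
    not values reported in the paper.
    """
    components = [unigram_lm(abstract), unigram_lm(authors_text),
                  unigram_lm(keywords), unigram_lm(journal_text)]
    return interpolate(components, list(weights))
```

Under these assumptions, two papers would be compared by building a `paper_model` for each and computing `kl_divergence` between them, with a lower value indicating more similar papers. How the weights are chosen, and how the author and journal models are estimated, is left open here.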

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
ISSN: 0020-0255
Last Modified: 25 Oct 2022 09:42

Citation Data

Cited 20 times in Scopus.
