Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Back to the basics: a quantitative analysis of statistical and graph-based term weighting schemes for keyword extraction

Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239, Liberatore, Federico ORCID: https://orcid.org/0000-0001-9900-5108 and Ushio, Asahi 2021. Back to the basics: a quantitative analysis of statistical and graph-based term weighting schemes for keyword extraction. Presented at: EMNLP 2021 Conference, online and at Punta Cana, Dominican Republic, 7-11 November 2021. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 8089-8103.

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution.

Abstract

Term weighting schemes are widely used in Natural Language Processing and Information Retrieval. In particular, term weighting is the basis for keyword extraction. However, there are relatively few evaluation studies that shed light on the strengths and shortcomings of each weighting scheme. In fact, in most cases researchers and practitioners resort to the well-known tf-idf as the default, despite the existence of other suitable alternatives, including graph-based models. In this paper, we perform an exhaustive and large-scale empirical comparison of both statistical and graph-based term weighting methods in the context of keyword extraction. Our analysis reveals some interesting findings, such as the advantages of the lesser-known lexical specificity with respect to tf-idf, and the qualitative differences between statistical and graph-based methods. Finally, based on our findings, we discuss and devise some suggestions for practitioners.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Association for Computational Linguistics
Date of First Compliant Deposit: 27 September 2021
Date of Acceptance: 26 August 2021
Last Modified: 29 Nov 2022 09:45
URI: https://orca.cardiff.ac.uk/id/eprint/144472
