Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

FlexiTerm: a flexible term recognition method

Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885, Greenwood, Mark, Preece, Alun David ORCID: https://orcid.org/0000-0003-0349-9057, Francis, Nicholas Andrew ORCID: https://orcid.org/0000-0001-8939-7312 and Elwyn, Glyn ORCID: https://orcid.org/0000-0002-0917-6286 2013. FlexiTerm: a flexible term recognition method. Journal of Biomedical Semantics 4, 27. doi:10.1186/2041-1480-4-27

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

Background: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. The dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or the newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and high degree of term variation.

Results: In this paper, we describe FlexiTerm, a method for automatic term recognition from a domain-specific corpus, and evaluate its performance against five manually annotated corpora. FlexiTerm performs term recognition in two steps: linguistic filtering is used to select term candidates followed by calculation of termhood, a frequency-based measure used as evidence to qualify a candidate as a term. In order to improve the quality of termhood calculation, which may be affected by the term variation phenomena, FlexiTerm uses a range of methods to neutralise the main sources of variation in biomedical terms. It manages syntactic variation by processing candidates using a bag-of-words approach. Orthographic and morphological variations are dealt with using stemming in combination with lexical and phonetic similarity measures. The method was evaluated on five biomedical corpora. The highest values for precision (94.56%), recall (71.31%) and F-measure (81.31%) were achieved on a corpus of clinical notes.

Conclusions: FlexiTerm is an open-source software tool for automatic term recognition. It incorporates a simple term variant normalisation method. The method proved to be more robust than the baseline against less formally structured texts, such as those found in patient blogs or medical notes. The software can be downloaded freely at http://www.cs.cf.ac.uk/flexiterm.
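
The bag-of-words idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not FlexiTerm's implementation: the stop-word list, the suffix rules and the function names are hypothetical, a crude suffix stripper stands in for a real stemmer, the lexical and phonetic similarity matching is omitted, and a raw frequency count stands in for the termhood measure.

```python
from collections import Counter

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"of", "the", "in", "for", "and"}
SUFFIXES = ("ies", "s")


def crude_stem(token: str) -> str:
    """Strip one plural suffix; a stand-in for a real stemmer."""
    if token.endswith("ss"):
        return token  # leave words such as "illness" untouched
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            return stem + "y" if suffix == "ies" else stem
    return token


def normalise(candidate: str) -> frozenset:
    """Map a term candidate to an order-independent bag of stemmed words."""
    tokens = candidate.lower().replace("-", " ").split()
    return frozenset(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def count_variants(candidates) -> Counter:
    """Count candidates per normalised form (a proxy for termhood, not the real measure)."""
    return Counter(normalise(c) for c in candidates)


candidates = [
    "infection of the ear",
    "ear infections",
    "Ear infection",
    "blood pressure",
]
counts = count_variants(candidates)
```

Because word order and stop words are discarded, the three ear-infection variants above collapse to the same key, so their frequencies are pooled rather than counted separately.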

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics; Medicine
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Publisher: Springer
ISSN: 2041-1480
Date of First Compliant Deposit: 30 March 2016
Last Modified: 07 May 2023 12:39
URI: https://orca.cardiff.ac.uk/id/eprint/51976

Citation Data

Cited 33 times in Scopus.
