Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

On automatic text segmentation

Dadachev, Boris, Balinsky, Alexander ORCID: https://orcid.org/0000-0002-8151-4462 and Balinsky, Helen ORCID: https://orcid.org/0000-0002-8151-4462 2014. On automatic text segmentation. Presented at: DocEng 14: 2014 ACM Symposium on Document Engineering, Denver, CO, USA, 16-19 September 2014. Proceedings of the 2014 ACM Symposium on Document Engineering. Association for Computing Machinery, pp. 73-80. 10.1145/2644866.2644874

Full text not available from this repository.

Abstract

Automatic text segmentation, the task of breaking a text into topically consistent segments, is a fundamental problem in Natural Language Processing, Document Classification and Information Retrieval. Text segmentation can significantly improve the performance of various text mining algorithms by splitting heterogeneous documents into homogeneous fragments, thus facilitating subsequent processing. Applications range from screening of radio communication transcripts to document summarization, from automatic document classification to information visualization, and from automatic filtering to security policy enforcement; all rely on, or can largely benefit from, automatic document segmentation. In this article, a novel approach for automatic text and data stream segmentation is presented and studied. The proposed automatic segmentation algorithm takes advantage of the feature extraction and unusual behaviour detection algorithms developed in [4, 5]. It is entirely unsupervised and flexible enough to allow segmentation at different scales, such as short paragraphs and large sections. We also briefly review the most popular and important algorithms for automatic text segmentation and present detailed comparisons of our approach with several of those state-of-the-art algorithms.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Mathematics
Subjects: Q Science > QA Mathematics
Publisher: Association for Computing Machinery
ISBN: 9781450329491
Last Modified: 31 Oct 2022 10:50
URI: https://orca.cardiff.ac.uk/id/eprint/86400

Citation Data

Cited 4 times in Scopus.
