Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Rapid change detection and text mining

Balinsky, Alexander, Balinsky, Helen and Simske, Steven 2011. Rapid change detection and text mining. Presented at: 2nd IMA Conference on Mathematics in Defence, Defence Academy of the United Kingdom, Swindon, 20 October 2011.

Full text not available from this repository.


In this presentation we review and present a novel approach to text data mining and automatic text summarization. The modeling proceeds in several steps. First, we apply the rapid change detection algorithm for data streams and documents introduced in [1, 2]. It is based on ideas from image processing, and especially on the Helmholtz principle from the Gestalt theory of human perception. Applied to keyword extraction, it yields fast, effective, parameter-free tools for identifying meaningful words. We also define levels of meaningfulness for the words of a document, which allow the size of the selected keyword set to be controlled to suit different application needs. Next, based on this level of meaningfulness, we model a document as a one-parameter family of graphs whose vertex set is given by the document's sentences or paragraphs and whose edges are defined by the Helmholtz principle. We demonstrate that, for some range of the parameter, the resulting graph becomes a small-world network [3]. This remarkable structure makes it possible to apply many measures and tools from the theory of social networks to the problem of extracting the most important sentences and structures from text documents [4]. We also present our new software for document analysis and automatic text summarization.
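
The abstract gives no formulas, so the following is only a minimal sketch of how the two steps might fit together, under stated assumptions. It assumes the number-of-false-alarms (NFA) form often associated with the Helmholtz principle: a word occurring m times in one unit (sentence or paragraph) and K times across all N units has NFA = C(K, m) / N^(m-1), and is treated as meaningful at level epsilon when NFA < epsilon. The edge rule (two units are linked when both contain a keyword) and all function names are hypothetical choices made for illustration, not the authors' definitions.

```python
import math
from collections import Counter
from itertools import combinations


def log_nfa(m, K, N):
    """Log of the assumed NFA, C(K, m) / N**(m - 1), computed in log space."""
    return math.log(math.comb(K, m)) - (m - 1) * math.log(N)


def keywords(units, epsilon=1.0):
    """Words whose assumed NFA is below epsilon in at least one unit.

    `units` is a list of token lists (one per sentence or paragraph).
    """
    N = len(units)
    totals = Counter(w for unit in units for w in unit)
    log_eps = math.log(epsilon)
    found = set()
    for unit in units:
        for word, m in Counter(unit).items():
            if log_nfa(m, totals[word], N) < log_eps:
                found.add(word)
    return found


def unit_graph(units, epsilon=1.0):
    """Edges (i, j), i < j, between units that both contain a keyword.

    Assumption: this edge rule is illustrative only. Varying epsilon
    gives a one-parameter family of graphs on the same vertex set.
    """
    keys = keywords(units, epsilon)
    holders = [i for i, unit in enumerate(units) if keys & set(unit)]
    return set(combinations(holders, 2))
```

Raising epsilon admits more keywords and therefore more edges, so the graph moves from sparse to dense as the parameter grows. Graph measures such as clustering and path length could then be computed on each member of the family.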

Item Type: Conference or Workshop Item (Poster)
Date Type: Publication
Status: Published
Schools: Mathematics
Subjects: Q Science > QA Mathematics
Last Modified: 19 Oct 2022 10:38

