Balinsky, Alexander (ORCID: https://orcid.org/0000-0002-8151-4462), Balinsky, Helen and Simske, Steven. 2011. Rapid change detection and text mining. Presented at: 2nd IMA Conference on Mathematics in Defence, Defence Academy of the United Kingdom, Swindon, 20 October 2011.
Abstract
In this presentation we review and present a novel approach to text data mining and automatic text summarization. The approach involves several steps. First, we apply a rapid change detection algorithm for data streams and documents, introduced in [1, 2]. It is based on ideas from image processing, in particular on the Helmholtz Principle from the Gestalt theory of human perception. Applied to keyword extraction, it yields fast and effective parameter-free tools for identifying meaningful words. We also define levels of meaningfulness for document words. These levels make it possible to control the size of the selected keyword set to suit different applications. Next, using this level of meaningfulness, we model a document as a one-parameter family of graphs whose vertices are its sentences or paragraphs and whose edges are defined by the Helmholtz principle. We have shown that, for a certain range of the parameter, the resulting graph becomes a small-world network [3]. This structure makes it possible to apply many measures and tools from the theory of social networks to the problem of extracting the most important sentences and structures from text documents [4]. We also present our new software for document analysis and automatic text summarization.
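The abstract gives no formulas. The Python sketch below illustrates the pipeline under several assumptions.

The word score uses an assumed number-of-false-alarms formulation in the spirit of Helmholtz-principle keyword extraction. A document is split into N parts, a word occurs K times in the whole document and m times in a given part, and:

- NFA = C(K, m) / N^(m-1)
- meaning = -(1/m) log NFA

A word whose meaning reaches a threshold eps counts as a keyword at level eps. The edge rule is a hypothetical stand-in for the Helmholtz-based edges described above: two sentences are joined when they share at least one keyword. The function names `meaning_scores`, `keywords`, `sentence_graph`, `avg_clustering` and `avg_path_length` are all illustrative, not the authors' software.

```python
import math
import re
from collections import Counter, deque
from itertools import combinations


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def meaning_scores(parts):
    """Score each (part index, word) pair.

    Assumed formulation: NFA = C(K, m) / N**(m - 1), meaning = -log(NFA) / m,
    where K = count of the word in the whole document, m = count in the part,
    N = number of parts. Words seen once in a part never score above 0.
    """
    n = len(parts)
    total = Counter(w for p in parts for w in p)
    scores = {}
    for i, part in enumerate(parts):
        for w, m in Counter(part).items():
            log_nfa = math.log(math.comb(total[w], m)) - (m - 1) * math.log(n)
            scores[(i, w)] = -log_nfa / m
    return scores


def keywords(scores, eps):
    """Words that are eps-meaningful in at least one part."""
    return {w for (_, w), v in scores.items() if v >= eps}


def sentence_graph(sentences, kw):
    """Hypothetical edge rule: link two sentences sharing a keyword."""
    adj = {i: set() for i in range(len(sentences))}
    kw_sets = [set(s) & kw for s in sentences]
    for i, j in combinations(range(len(sentences)), 2):
        if kw_sets[i] & kw_sets[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def avg_clustering(adj):
    """Mean local clustering coefficient (0 for vertices of degree < 2)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def avg_path_length(adj):
    """Mean BFS shortest-path length over reachable ordered vertex pairs."""
    dist_sum, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else float("inf")


if __name__ == "__main__":
    doc = ("Small world graphs have short paths. Graphs of sentences can be small world graphs.\n\n"
           "Keyword extraction finds meaningful words. Meaningful words repeat: meaningful, meaningful.\n\n"
           "Summaries pick central sentences.")
    # Paragraphs serve as the parts for scoring; sentences are graph vertices.
    paragraphs = [p for p in doc.split("\n\n") if p.strip()]
    scores = meaning_scores([tokenize(p) for p in paragraphs])
    sentences = [tokenize(s) for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    for eps in (0.5, 1.0):  # eps is the single parameter of the graph family
        g = sentence_graph(sentences, keywords(scores, eps))
        print(eps, avg_clustering(g), avg_path_length(g))
```

Sweeping `eps` produces the one-parameter family of graphs. A high average clustering combined with a short average path length is the usual small-world signature. A real comparison would also check these values against a random graph of the same size and density, which this sketch omits.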
Item Type: Conference or Workshop Item (Poster)
Date Type: Publication
Status: Published
Schools: Mathematics
Subjects: Q Science > QA Mathematics
Last Modified: 19 Oct 2022 10:38
URI: https://orca.cardiff.ac.uk/id/eprint/25040