Balinsky, Helen, Balinsky, Alexander ORCID: https://orcid.org/0000-0002-8151-4462 and Simske, Steven 2011. Automatic text summarization and small-world networks. Presented at: 11th ACM symposium on Document engineering, Mountain View, CA, USA, 19-22 September 2011. Proceedings of the 11th ACM symposium on Document engineering. New York, NY: Association for Computing Machinery, pp. 175-184. 10.1145/2034691.2034731
Abstract
Automatic text summarization is an important and challenging problem. Over the years, the amount of text available electronically has grown exponentially. This growth has created a huge demand for automatic methods and tools for text summarization. We can think of automatic summarization as a type of information compression. To achieve such compression, better modelling and understanding of document structures and internal relations is required. In this article, we develop a novel approach to extractive text summarization by modelling texts and documents as small-world networks. Based on our recent work on the detection of unusual behavior in text, we model a document as a one-parameter family of graphs with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle. We demonstrate that for some range of the parameters, the resulting graph becomes a small-world network. Such a remarkable structure opens the possibility of applying many measures and tools from social network theory to the problem of extracting the most important sentences and structures from text documents. We hope that documents will also be a new and rich source of examples of complex networks.
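The general idea of treating sentences as graph vertices can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify how Helmholtz's principle defines edges, so the sketch substitutes a simple stand-in rule (two sentences are linked when they share at least `min_shared` words), with `min_shared` playing the role of the single parameter. All function names are hypothetical. The sketch computes the two quantities usually inspected when checking for small-world behaviour, the average clustering coefficient and the average shortest path length, and ranks sentences by degree as one simple network measure.

```python
from collections import deque
from itertools import combinations
import re


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_graph(sentences, min_shared=2):
    # ASSUMPTION: shared-word threshold stands in for the Helmholtz-based
    # edge rule, which the abstract does not describe in detail.
    words = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= min_shared:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    # Mean local clustering coefficient; vertices of degree < 2 count as 0.
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def average_path_length(adj):
    # Mean shortest-path length over all reachable ordered pairs (BFS).
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def rank_by_degree(adj):
    # Highest-degree sentences first; ties broken by document order.
    return sorted(adj, key=lambda v: (-len(adj[v]), v))
```

Sweeping `min_shared` over a range of values and watching for high clustering combined with a short average path length would mimic, in a rough way, the search for a parameter range in which the graph behaves like a small-world network.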
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Mathematics |
Subjects: | Q Science > QA Mathematics |
Uncontrolled Keywords: | Computing Methodologies; Document and text processing; Artificial Intelligence; Natural language processing; Pattern recognition; I.5.4 Applications |
Publisher: | Association for Computing Machinery |
ISBN: | 9781450308632 |
Last Modified: | 19 Oct 2022 10:38 |
URI: | https://orca.cardiff.ac.uk/id/eprint/25039 |