Topic detection using MFSs

Yap, Ivan, Loh, Han Tong, Shen, Lixiang and Liu, Ying

2006. Topic detection using MFSs. Presented at: 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2006), Annecy, France, 27-30 June 2006. Published in: Ali, Moonis and Dapoigny, Richard eds. Advances in Applied Artificial Intelligence: 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006, Annecy, France, June 27-30, 2006. Proceedings. Lecture Notes in Computer Science, vol. 4031. Berlin, Heidelberg: Springer, pp. 342-352. doi:10.1007/11779568_38

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/11779568_38

Abstract

When analyzing a document collection, a key piece of information is the number of distinct topics it contains. Document clustering has been used as a tool to help extract this information. However, existing clustering methods do not take the order of words in the documents into account, and they usually have no means of describing the contents of each topic cluster. In this paper, we report our investigation and results on using Maximal Frequent word Sequences (MFSs) as building blocks for identifying distinct topics. The supporting documents of the MFSs are grouped into equivalence classes, each of which is then linked to a topic cluster, and the MFSs serve as the identifiers of the document clusters. We describe the original method for extracting the set of MFSs and how it can be adapted to identify topics in a textual dataset. We also demonstrate how the MFSs themselves can act as topic descriptors for the clusters. Finally, a benchmarking study against existing clustering methods, namely k-means and the EM algorithm, shows the effectiveness of our approach for topic detection.
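
The idea of using MFSs as cluster identifiers can be illustrated with a small Python sketch. It is not the authors' algorithm, which the abstract does not specify. It is a brute-force illustration under several assumptions: sequences are treated as contiguous word n-grams, a sequence counts as frequent when at least min_support documents contain it, and MFSs that share exactly the same supporting documents are placed in one group. All function names, the max_len cap, and the toy documents are hypothetical.

```python
from collections import defaultdict


def ngrams(words, n):
    """Return the set of contiguous word n-grams in one tokenized document."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_sequences(docs, min_support, max_len=5):
    """Map each frequent n-gram to the ids of the documents that contain it.

    Assumption: "frequent" means present in at least min_support documents.
    """
    frequent = {}
    for n in range(1, max_len + 1):
        support = defaultdict(set)
        for doc_id, words in enumerate(docs):
            for gram in ngrams(words, n):
                support[gram].add(doc_id)
        level = {g: ids for g, ids in support.items() if len(ids) >= min_support}
        if not level:
            # No frequent n-gram implies no frequent (n+1)-gram.
            break
        frequent.update(level)
    return frequent


def contains(long_seq, short_seq):
    """True if short_seq occurs as a contiguous run inside long_seq."""
    k = len(short_seq)
    return any(long_seq[i:i + k] == short_seq
               for i in range(len(long_seq) - k + 1))


def maximal_frequent_sequences(frequent):
    """Keep frequent sequences not contained in any longer frequent sequence."""
    return {
        s: ids for s, ids in frequent.items()
        if not any(len(t) > len(s) and contains(t, s) for t in frequent)
    }


def group_by_supporting_docs(mfss):
    """Group MFSs by their supporting document set (an assumed grouping rule)."""
    groups = defaultdict(list)
    for seq, ids in mfss.items():
        groups[frozenset(ids)].append(seq)
    return dict(groups)


# Toy collection, invented for illustration only.
docs = [
    "maximal frequent sequences describe topics".split(),
    "we mine maximal frequent sequences".split(),
    "k means clustering of documents".split(),
    "k means clustering is a baseline".split(),
]

mfss = maximal_frequent_sequences(frequent_sequences(docs, min_support=2))
clusters = group_by_supporting_docs(mfss)
for doc_ids, sequences in clusters.items():
    print(sorted(doc_ids), [" ".join(s) for s in sequences])
```

In this sketch each group of supporting documents plays the role of a candidate topic cluster, and the word sequences attached to it act as a readable descriptor of that cluster. A practical implementation would need a more efficient mining strategy and some rule for merging overlapping groups.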

Item Type:	Conference or Workshop Item - published (Paper)
Date Type:	Publication
Status:	Published
Schools:	Schools > Engineering
Subjects:	T Technology > TA Engineering (General). Civil engineering (General)
Publisher:	Springer
ISBN:	9783540354536
ISSN:	0302-9743
Last Modified:	25 Oct 2022 08:05
URI:	https://orca.cardiff.ac.uk/id/eprint/51339

Citation Data

Cited 16 times in Scopus.