Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Web classification of conceptual entities using co-training

Sun, Aixin, Liu, Ying ORCID: https://orcid.org/0000-0001-9319-5940 and Lim, Ee-Peng 2011. Web classification of conceptual entities using co-training. Expert Systems with Applications 38 (12), pp. 14367-14375. 10.1016/j.eswa.2011.03.010

Full text not available from this repository.

Abstract

Social networking websites, which profile objects with predefined attributes and their relationships, often rely heavily on their users to contribute the required information. We have observed, however, that many web pages are actually created collectively according to the composition of some physical or abstract entity, e.g., a company, a person, or an event. Furthermore, users often like to organize pages into conceptual categories for better search and retrieval, which makes it feasible to extract relevant attributes and relationships from the web. Given a set of entities, each consisting of a set of web pages, we call the task of assigning pages to their corresponding conceptual categories conceptual web classification. To address this task, we propose an entity-based co-training (EcT) algorithm, which learns from unlabeled examples to boost its performance. Unlike existing co-training algorithms, EcT takes into account the entity semantics hidden in web pages and requires no prior knowledge of the underlying class distribution, knowledge that is crucial in the standard co-training algorithms used in web classification. In our experiments, we evaluated EcT, standard co-training, and three other non-co-training learning methods on the Conf-425 dataset. Both EcT and co-training performed well compared with the baseline methods that required a large number of training examples.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Centre for Advanced Manufacturing Systems At Cardiff (CAMSAC); Engineering
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > TA Engineering (General). Civil engineering (General)
Uncontrolled Keywords: Conceptual web classification; Co-training; Web classification
Publisher: Elsevier
ISSN: 0957-4174
Last Modified: 25 Oct 2022 07:59
URI: https://orca.cardiff.ac.uk/id/eprint/51040

Citation Data

Cited 11 times in Scopus.
