Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Extracting attributes for online communities

Alsulami, Ashwaq 2023. Extracting attributes for online communities. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

PDF (Cardiff University Electronic Publication Form) - Supplemental Material
Restricted to Repository staff only


Abstract

Many organisations need insight into social media discussions, such as which topics are trending and what characterises the people taking part. Numerous methods have been proposed to extract attributes that characterise a group engaged in a conversation. Some rely on supervised learning, which requires a substantial volume of labelled data. Others are bespoke techniques that apply only to certain attributes, for example, using language models to infer the age of a tweet's author. These methods do not scale to a broader range of attributes because they either require prohibitively expensive data labelling or handle only specific attributes. In this thesis, we propose an unsupervised learning approach to extracting attributes from user profiles, aiming to address the scalability issue of existing methods. Our approach consists of two stages. In the first stage, lexical sources and semantic analysis are used to determine whether a user's profile description suggests a particular attribute. In the second stage, the results of the first stage serve as training data for a classification model, which determines the attribute for users whose attribute could not be identified in the first stage. Our findings demonstrate that this approach can capture attributes from user profiles in discussion groups without the need for data labelling. We applied the methodology to a set of attributes and obtained an average accuracy of 78% in attribute extraction. We also applied the method to determine the percentage of users within a given hashtag community who exhibit a specific attribute, an analysis that provided valuable insights into the characteristics of the group.
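
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the lexicon, the attribute values, and all function names below are hypothetical, stage one is reduced to plain cue-word matching (the thesis also uses semantic analysis), and a small multinomial naive Bayes model stands in for whatever classifier the thesis actually trains.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lexicon (an assumption, not taken from the thesis):
# attribute value -> cue words that suggest it in a profile description.
LEXICON = {
    "student": {"student", "undergrad", "phd"},
    "parent": {"mum", "dad", "mother", "father", "parent"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def stage_one(profile):
    """Stage 1: return a label if exactly one lexicon entry matches, else None."""
    tokens = set(tokenize(profile))
    hits = [label for label, cues in LEXICON.items() if tokens & cues]
    return hits[0] if len(hits) == 1 else None

def train_naive_bayes(labelled):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labelled:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the most probable label under the model (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def label_community(profiles):
    """Stage 2: train on stage-1 labels, then label the remaining profiles."""
    first = [(p, stage_one(p)) for p in profiles]
    seeds = [(p, lab) for p, lab in first if lab is not None]
    model = train_naive_bayes(seeds)
    return [lab if lab is not None else predict(model, p) for p, lab in first]

def attribute_share(labels, value):
    """Percentage of users in the community assigned the given attribute value."""
    return 100.0 * sum(1 for lab in labels if lab == value) / len(labels)
```

Under these assumptions, `label_community` would be called on the profile descriptions of users posting under one hashtag, and `attribute_share` on the resulting labels gives the kind of per-community percentage the abstract mentions. No manually labelled data enters the pipeline; the only hand-written input is the lexicon.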

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date of First Compliant Deposit: 3 January 2024
Date of Acceptance: 20 December 2023
Last Modified: 16 Jan 2024 10:58
URI: https://orca.cardiff.ac.uk/id/eprint/165165
