Alsulami, Ashwaq
2023.
Extracting attributes for online communities.
PhD Thesis,
Cardiff University.
Item availability restricted.
PDF - Accepted Post-Print Version (1MB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
PDF (Cardiff University Electronic Publication Form) - Supplemental Material (2MB). Restricted to Repository staff only.
Abstract
Organisations frequently need insight into social media discussions, including identifying trending topics and understanding the characteristics of the individuals taking part. Many methods have been proposed for extracting attributes that characterise a group engaged in a conversation. Some rely on supervised learning, which requires a substantial volume of labelled data. Others are bespoke techniques that apply only to certain attributes, for example, using language models to infer the age of a tweet's author. These methods do not scale to a broader range of attributes, because they either require prohibitively expensive data labelling or handle only a few specific attributes. In this thesis, we propose an unsupervised learning approach to extracting attributes from user profiles, aiming to address the scalability issue of the existing methods. Our approach consists of two stages. In the first stage, lexical sources and semantic analysis are used to determine whether a user's profile description suggests a particular attribute. In the second stage, the results of the first stage serve as training data for a classification model that determines the attribute for users whose attribute could not be identified in the first stage. Our findings demonstrate that this approach can capture attributes from user profiles in discussion groups without the need for data labelling. We applied the methodology to a set of attributes and obtained an average accuracy of 78% in attribute extraction. We also applied the developed method to determine the percentage of users within a given hashtag community who exhibit a specific attribute, an analysis that provided valuable insights into the characteristics of the group.
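The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the `LEXICON` cue words, the exact-keyword matching used for the first stage, the hand-written naive Bayes classifier used for the second stage, and all function names are assumptions chosen only to keep the example self-contained. The thesis itself refers to lexical sources and semantic analysis for the first stage and does not name the classifier in this abstract.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: attribute value -> cue words (illustrative only).
LEXICON = {
    "student": {"student", "undergrad", "phd"},
    "parent": {"mum", "dad", "mother", "father", "parent"},
}

def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(".,!?#@|").lower() for w in text.split())
    return [w for w in stripped if w]

def stage_one(profile):
    """Stage 1 stand-in: label a profile only if exactly one lexicon entry matches."""
    words = set(tokens(profile))
    hits = [label for label, cues in LEXICON.items() if words & cues]
    return hits[0] if len(hits) == 1 else None

def train_naive_bayes(labelled):
    """Collect per-label word counts from (profile, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labelled:
        label_counts[label] += 1
        for w in tokens(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict(model, profile):
    """Multinomial naive Bayes with add-one smoothing; None if no training data."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(profile):
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

def label_community(profiles):
    """Stage 1 labels what it can; stage 2 trains on those labels and fills the rest."""
    first = {p: stage_one(p) for p in profiles}
    seed = [(p, label) for p, label in first.items() if label is not None]
    model = train_naive_bayes(seed)
    return {p: (label if label is not None else predict(model, p))
            for p, label in first.items()}

def attribute_share(labels, value):
    """Percentage of users in the community assigned the given attribute value."""
    return 100.0 * sum(1 for v in labels.values() if v == value) / len(labels)

# Made-up profile descriptions standing in for one hashtag community.
profiles = [
    "PhD student in chemistry, coffee and lectures",
    "Proud dad of two, weekend football with the kids",
    "Undergrad, lectures by day and coffee by night",
    "Mum to three kids, school run champion",
    "Surviving lectures on coffee",
    "Football with the kids every weekend",
]
labels = label_community(profiles)
print(attribute_share(labels, "student"))
```

In this sketch the last two profiles contain no cue word, so the first stage leaves them unlabelled and the classifier trained on the first four assigns them a value. No manual labelling is involved, which is the property the abstract emphasises; the quality of the result depends entirely on the lexicon and on how representative the first-stage labels are.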
Item Type: | Thesis (PhD) |
---|---|
Date Type: | Completion |
Status: | Unpublished |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Date of First Compliant Deposit: | 3 January 2024 |
Date of Acceptance: | 20 December 2023 |
Last Modified: | 16 Jan 2024 10:58 |
URI: | https://orca.cardiff.ac.uk/id/eprint/165165 |