Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Sensitive content classification in social media: A holistic resource and evaluation

Antypas, Dimosthenis, Sen, Indira, Perez Almendros, Carla ORCID: https://orcid.org/0000-0001-9360-4011, Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239 and Barbieri, Francesco 2024. Sensitive content classification in social media: A holistic resource and evaluation. [Online]. arXiv. Available at: https://doi.org/10.48550/arXiv.2411.19832

PDF - Submitted Pre-Print Version

Abstract

The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, offer limited customisation and accuracy across diverse sensitive categories, and raise privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in detecting other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous, narrowly focused research. Our analysis demonstrates that fine-tuning large language models (LLMs) on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models such as LLaMA, and even proprietary OpenAI models, which underperform by 10-15% overall. This performance gap is even more pronounced for popular moderation APIs, which, among other drawbacks, cannot be easily tailored to specific sensitive content categories.

Item Type: Website Content
Date Type: Published Online
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: arXiv
ISSN: 2331-8422
Last Modified: 26 Feb 2025 11:01
URI: https://orca.cardiff.ac.uk/id/eprint/175712
