Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Spanning the spectrum of hatred detection: a Persian multi-label hate speech dataset with annotator rationales

Delbari, Zahra, Moosavi, Nafise Sadat and Pilehvar, Mohammad Taher 2024. Spanning the spectrum of hatred detection: a Persian multi-label hate speech dataset with annotator rationales. Presented at: Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, Canada, 20-27 February 2024. Published in: Wooldridge, M., Dy, J. and Natarajan, S. eds. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38 (16). Washington, DC, USA: Association for the Advancement of Artificial Intelligence, pp. 17889-17897. 10.1609/aaai.v38i16.29743
Item availability restricted.

PDF - Accepted Post-Print Version
Restricted to Repository staff only until 11 October 2024 due to copyright restrictions.


Abstract

With the alarming rise of hate speech in online communities, the demand for effective NLP models to identify instances of offensive language has reached a critical point. However, the development of such models relies heavily on the availability of annotated datasets, which are scarce, particularly for less-studied languages. To bridge this gap for the Persian language, we present a novel dataset specifically tailored to multi-label hate speech detection. Our dataset, called Phate, consists of over seven thousand manually annotated Persian tweets, offering a rich resource for training and evaluating hate speech detection models for this language. Notably, each annotation in our dataset specifies the targeted group of the hate speech and includes a span of the tweet that explains the rationale behind the assigned label. Incorporating this information expands the potential applications of our dataset, facilitating the detection of targeted online harm and allowing the benchmark to serve research on the interpretability of hate speech detection models. The dataset, annotation guidelines, and all associated code are accessible at https://github.com/Zahra-D/Phate.

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Association for the Advancement of Artificial Intelligence
ISBN: 9781577358879
ISSN: 2374-3468
Date of First Compliant Deposit: 11 September 2024
Date of Acceptance: 9 December 2023
Last Modified: 12 Sep 2024 03:52
URI: https://orca.cardiff.ac.uk/id/eprint/168944
