JL-Hate: An Annotated Dataset for Joint Learning of Hate Speech and Target Detection

Buyukdemirci K., Kucukkaya I. E., Olmez E., Toraman Ç.

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, 20 - 25 May 2024, pp. 9543-9553 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
City of Publication: Hybrid, Torino
Country of Publication: Italy
Pages: pp. 9543-9553
Keywords: Hate speech detection, Joint learning, Target detection
Affiliated with Middle East Technical University: Yes

Abstract

The detection of hate speech has been extensively studied, and machine learning algorithms play a crucial role in this domain. Existing resources mostly focus on text sequence classification for hate speech detection. However, the target of hateful content is another dimension that has not been studied in detail due to the lack of data resources. In this study, we address this gap by introducing JL-Hate, a novel tweet dataset for joint learning of hate speech detection and target detection, framed as text sequence classification and token classification, respectively. The JL-Hate dataset consists of 1,530 tweets, split equally between English and Turkish. Leveraging this dataset, we conduct a series of benchmark experiments. We use a joint learning model to perform the sequence and token classification tasks concurrently on our data. Our experimental results are consistent with those of prior studies on both the sequence and token classification tasks.
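
To make the joint setup concrete, here is a minimal sketch of how one annotated tweet and a combined training objective might look. It is not the authors' implementation. The `JointExample` record, the binary tag scheme (1 = token belongs to the target, 0 = otherwise), and the `alpha` weighting are all assumptions for illustration; the abstract does not specify the label scheme or the loss.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointExample:
    """Hypothetical record for one tweet (field names are assumptions)."""
    tokens: List[str]       # tweet split into tokens
    hate_label: int         # sequence-level label: 1 = hateful, 0 = not hateful
    target_tags: List[int]  # token-level labels: 1 = part of the target, 0 = otherwise


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], gold: int) -> float:
    """Negative log-probability of the gold class."""
    return -math.log(softmax(logits)[gold])


def joint_loss(seq_logits: List[float],
               tok_logits: List[List[float]],
               example: JointExample,
               alpha: float = 0.5) -> float:
    """Weighted sum of the sequence loss and the mean token loss.

    alpha is an assumed mixing weight, not a value taken from the paper.
    """
    seq_loss = cross_entropy(seq_logits, example.hate_label)
    tok_losses = [cross_entropy(logits, tag)
                  for logits, tag in zip(tok_logits, example.target_tags)]
    tok_loss = sum(tok_losses) / len(tok_losses)
    return alpha * seq_loss + (1 - alpha) * tok_loss


# Placeholder tokens stand in for a real tweet.
example = JointExample(tokens=["tok1", "tok2", "tok3"],
                       hate_label=1,
                       target_tags=[0, 1, 1])

# An untrained model that outputs uniform logits scores ln 2 on both tasks.
uniform_loss = joint_loss([0.0, 0.0], [[0.0, 0.0]] * 3, example)
print(uniform_loss)
```

In a setup like this, a shared encoder would produce both `seq_logits` and `tok_logits`, so a single backward pass through the combined loss updates the encoder for both tasks.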