MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection


TORAMAN Ç., Ozcelik O., Şahinuç F., Can F.

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, 20-25 May 2024, pp. 11283-11295

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Hybrid, Torino
  • Country of Publication: Italy
  • Pages: pp. 11283-11295
  • Keywords: Human-annotation, Misinformation detection, Multi-event dataset, Tweet
  • Affiliated with Middle East Technical University: Yes

Abstract

The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, COVID-19 pandemic, and Refugees. The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We also provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection.