Early detection of fake news on emerging topics through weak supervision

Akdag, Serhat; ÇİÇEKLİ, FEHİME

doi:10.1007/s10844-024-00852-1

Akdag S. H., ÇİÇEKLİ F. N.

Journal of Intelligent Information Systems, vol. 62, no. 5, pp. 1263-1284, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 62 Issue: 5
Publication Date: 2024
DOI: 10.1007/s10844-024-00852-1
Journal Name: Journal of Intelligent Information Systems
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, IBZ Online, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
Pages: pp. 1263-1284
Keywords: Fake news detection, Language models, Text classification, Weakly supervised learning
Affiliated with Middle East Technical University: Yes

Abstract

In this paper, we present a methodology for the early detection of fake news on emerging topics through the innovative application of weak supervision. Traditional techniques for fake news detection often rely on fact-checkers or supervised learning with labeled data, which is not readily available for emerging topics. To address this, we introduce the Weakly Supervised Text Classification framework (WeSTeC), an end-to-end solution designed to programmatically label large-scale text datasets within specific domains and train supervised text classifiers using the assigned labels. The proposed framework automatically generates labeling functions through multiple weak labeling strategies and eliminates underperforming ones. Labels assigned through the generated labeling functions are then used to fine-tune a pre-trained RoBERTa classifier for fake news detection. By using a weakly labeled dataset, which contains fake news related to the emerging topic, the trained fake news detection model becomes specialized for the topic under consideration. We explore both semi-supervision and domain adaptation setups, utilizing small amounts of labeled data and labeled data from other domains, respectively. The fake news classification model generated by the proposed framework outperforms all baselines in both setups. In addition, our fake news detection model trained through weak labels achieves accuracy within 1% of its fully supervised counterpart, emphasizing the robustness of the proposed framework’s weak labeling capabilities.
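The weak labeling pipeline described in the abstract (generate labeling functions, eliminate underperforming ones, assign labels) can be illustrated with a minimal sketch. This is not the WeSTeC implementation: the abstract does not say how labeling functions are generated, scored, or combined, so the keyword rules, the `min_accuracy` threshold, the majority vote, and all names below (`keyword_lf`, `prune_lfs`, `weak_label`) are assumptions made for illustration. Fine-tuning RoBERTa on the resulting labels is omitted.

```python
from collections import Counter

# Label convention (an assumption for this sketch).
ABSTAIN, REAL, FAKE = -1, 0, 1


def keyword_lf(keywords, label):
    """Build a labeling function that votes `label` if any keyword occurs."""
    def lf(text):
        lowered = text.lower()
        return label if any(k in lowered for k in keywords) else ABSTAIN
    return lf


def lf_accuracy(lf, dev_set):
    """Accuracy of one labeling function on the dev examples where it fires."""
    fired = [(lf(text), gold) for text, gold in dev_set if lf(text) != ABSTAIN]
    if not fired:
        return 0.0
    return sum(vote == gold for vote, gold in fired) / len(fired)


def prune_lfs(lfs, dev_set, min_accuracy=0.6):
    """Drop labeling functions whose dev accuracy is below the threshold."""
    return [lf for lf in lfs if lf_accuracy(lf, dev_set) >= min_accuracy]


def weak_label(text, lfs):
    """Combine non-abstaining votes by majority; abstain on ties or no votes."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]


# Hypothetical labeling functions; the third is deliberately unreliable.
lfs = [
    keyword_lf(["miracle cure", "they don't want you to know"], FAKE),
    keyword_lf(["according to", "peer-reviewed"], REAL),
    keyword_lf(["the"], FAKE),
]

# A small labeled set, as in the semi-supervised setup (invented examples).
dev_set = [
    ("Miracle cure ends the pandemic overnight", FAKE),
    ("According to the health ministry, cases fell 3%", REAL),
    ("The peer-reviewed study found no effect", REAL),
]

kept = prune_lfs(lfs, dev_set)

unlabeled = [
    "Doctors hate this miracle cure",
    "Figures are peer-reviewed, according to the journal",
    "Weather update for tomorrow",
]
weak_labels = [weak_label(text, kept) for text in unlabeled]
print(len(kept), weak_labels)
```

In this toy run the third labeling function is discarded because it is right on only one of the three dev examples where it fires, and the last unlabeled text receives no label because no remaining function fires on it. In a full pipeline, the texts that do receive weak labels would serve as training data for the supervised classifier.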