Topic Detection based on Deep Learning Language Model in Turkish Microblogs

Sahinuc F., Toraman Ç., Koc A.

29th IEEE Conference on Signal Processing and Communications Applications (SIU), ELECTR NETWORK, 9 - 11 June 2021, (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI Number: 10.1109/siu53274.2021.9477781
Country of Publication: ELECTR NETWORK
Affiliated with Middle East Technical University: No

Abstract

Microblogs are short, irregular texts in which people express their opinions on social media. Classifying microblog texts by topic provides a semantic substructure that supports the implementation of various applications. This study compares a conventional bag-of-words model with deep learning based models for topic detection in microblogs. To build the dataset, Turkish tweets related to current events in Turkey are collected and labeled according to the hashtags they contain. One conventional bag-of-words model (TF-IDF based SVM) and two deep learning based models (BERT and BERTurk) are trained on the dataset. Model performance is measured with the weighted F1 score. The TF-IDF based SVM, BERT, and BERTurk models achieve F1 scores of 0.807, 0.831, and 0.854, respectively.
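
As an illustration of two steps the abstract mentions, hashtag-based labeling and the weighted F1 score, the following is a minimal sketch in plain Python. It is not the authors' code: the hashtag-to-topic map, the function names, and the whitespace tokenization are assumptions made for this example, and the actual hashtags and preprocessing used in the study are not stated in this record.

```python
from collections import Counter

# Hypothetical hashtag-to-topic map (an assumption for illustration only;
# the hashtags used in the study are not listed in this record).
HASHTAG_TOPICS = {"#deprem": "earthquake", "#secim": "election"}


def label_by_hashtag(tweet, hashtag_topics=HASHTAG_TOPICS):
    """Return the topic of the first known hashtag in the tweet, or None.

    Simplification: tokens are split on whitespace, so a hashtag followed
    directly by punctuation will not match.
    """
    for token in tweet.lower().split():
        if token in hashtag_topics:
            return hashtag_topics[token]
    return None


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its count in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


if __name__ == "__main__":
    print(label_by_hashtag("sabah sarsıntı oldu #deprem"))
    print(weighted_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
```

Weighting each class by its support means that frequent topics contribute more to the final score than rare ones, which matters when the hashtag-derived classes are imbalanced.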