Topic Detection based on Deep Learning Language Model in Turkish Microblogs


Sahinuc F., Toraman Ç., Koc A.

29th IEEE Conference on Signal Processing and Communications Applications (SIU), ELECTR NETWORK, 9-11 June 2021

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu53274.2021.9477781
  • Country of Publication: ELECTR NETWORK
  • Middle East Technical University Affiliation: No

Abstract

Microblogs are short, irregular texts in which people express their opinions on social media. Classifying microblog texts by topic provides a semantic substructure that supports the implementation of various applications. In this study, an analysis comparing a conventional bag-of-words model with deep learning based models for the problem of topic detection in microblogs is presented. To prepare the dataset, Turkish tweets related to current events in Turkey are collected, and each tweet is labeled according to the hashtags it contains. One conventional bag-of-words model (TF-IDF based SVM) and two deep learning based models (BERT and BERTurk) are trained on the dataset. Model performance is measured with the weighted F1 score. The TF-IDF based SVM, BERT, and BERTurk models achieve F1 scores of 0.807, 0.831, and 0.854, respectively.
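The abstract describes two mechanical steps: labeling tweets by the hashtags they contain, and scoring models with the weighted F1 score. The following is a minimal sketch of both in standard-library Python.

It rests on these assumptions, which are not taken from the paper:
- The hashtag-to-topic mapping (`HASHTAG_TOPICS`) is hypothetical.
- The "first mapped hashtag wins" rule and the helper `label_by_hashtag` are illustrative.

The `weighted_f1` helper follows the usual definition: per-class F1 averaged with weights equal to each class's support in the true labels. This is what scikit-learn's `f1_score(..., average="weighted")` computes.

```python
import re
from collections import Counter
from typing import Optional, Sequence

# Hypothetical mapping from hashtags to topic labels (NOT the paper's actual scheme).
HASHTAG_TOPICS = {
    "#deprem": "earthquake",
    "#ekonomi": "economy",
    "#koronavirüs": "health",
}

# \w matches Unicode word characters, so Turkish letters such as ü and ş are kept.
HASHTAG_RE = re.compile(r"#\w+")


def label_by_hashtag(tweet: str) -> Optional[str]:
    """Return the topic of the first mapped hashtag in the tweet, or None if none match.

    Assumption: tweets are labeled by the first hashtag found in HASHTAG_TOPICS.
    """
    for tag in HASHTAG_RE.findall(tweet.lower()):
        if tag in HASHTAG_TOPICS:
            return HASHTAG_TOPICS[tag]
    return None


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight by the class's share of the true labels
    return score


if __name__ == "__main__":
    print(label_by_hashtag("Bugün #Deprem oldu"))  # earthquake
    print(round(weighted_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 3))  # 0.733
```

In the usage example, class `a` has F1 = 2/3 and class `b` has F1 = 4/5. Each class holds half of the true labels, so the weighted average is (2/3 + 4/5) / 2 = 11/15 ≈ 0.733.

Weighting by support makes frequent topics count more than rare ones. This matters when hashtag-derived topic classes are imbalanced.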