Tweet Length Matters: A Comparative Analysis on Topic Detection in Microblogs


Şahinuç F., Toraman Ç.

43rd European Conference on Information Retrieval, ECIR 2021, Virtual, Online, 28 March - 01 April 2021, vol. 12657 LNCS, pp. 471-478

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number: 12657 LNCS
  • DOI: 10.1007/978-3-030-72240-1_50
  • City of Publication: Virtual, Online
  • Pages: pp. 471-478
  • Keywords: Microblog, Short text, Topic detection, Tweet
  • Affiliated with Middle East Technical University: No

Abstract

Microblogs are characterized by short and informal text, which makes them sparse and noisy. To understand the topic semantics of short text, both supervised and unsupervised methods have been investigated, including traditional bag-of-words and deep learning-based models. However, the effectiveness of these methods has not been compared side by side in short-text topic detection. In this study, we provide a comparative analysis of topic detection in microblogs. We construct a tweet dataset based on recent and important worldwide events, including the COVID-19 pandemic and the BlackLivesMatter movement. We also analyze the effect of varying tweet length in both evaluation and training. Our results show that tweet length matters for the effectiveness of a topic-detection method.
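
As an illustration of what analyzing the effect of tweet length in evaluation could look like, the following minimal sketch groups tweets into length buckets and reports accuracy per bucket. It is not the authors' procedure: the function names, the whitespace tokenization, and the bucket thresholds (10 and 20 tokens) are all assumptions made for this example.

```python
from collections import defaultdict


def length_bucket(text, short_max=10, medium_max=20):
    """Assign a tweet to a length bucket by whitespace token count.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    n_tokens = len(text.split())
    if n_tokens <= short_max:
        return "short"
    if n_tokens <= medium_max:
        return "medium"
    return "long"


def accuracy_by_length(tweets, gold_topics, predicted_topics):
    """Return {bucket: accuracy} for three parallel lists.

    tweets: raw tweet texts
    gold_topics: reference topic label for each tweet
    predicted_topics: label produced by any topic-detection method
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, gold, predicted in zip(tweets, gold_topics, predicted_topics):
        bucket = length_bucket(text)
        total[bucket] += 1
        correct[bucket] += int(gold == predicted)
    # Only buckets that actually contain tweets appear in the result.
    return {bucket: correct[bucket] / total[bucket] for bucket in total}
```

Comparing the per-bucket scores of the same method is one simple way to see whether its effectiveness changes with tweet length; the same bucketing could be applied to the training set to vary length in training as well.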