Streaming Event Detection in Microblogs: Balancing Accuracy and Performance


Sahın O. C., Karagöz P., Tatbul N.

International Conference on Web Engineering (ICWE), Seoul, South Korea, 11-14 June 2019

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1007/978-3-030-19274-7_10
  • City of Publication: Seoul
  • Country of Publication: South Korea
  • Keywords: Online event detection, Burst detection, Stream processing, Data stream management, Microblogging
  • Affiliated with Middle East Technical University: Yes

Abstract

In this work, we model the problem of online event detection in microblogs as a stateful stream processing problem and offer a novel solution that balances result accuracy and performance. Our new approach builds on two state-of-the-art algorithms. The first algorithm is based on identifying bursty keywords inside blocks of blog messages. The second one involves clustering blog messages based on the similarity of their contents. To combine the computational simplicity of the keyword-based algorithm with the semantic accuracy of the clustering-based algorithm, we propose a new hybrid algorithm. We then implement these algorithms in a streaming manner, on top of Apache Storm augmented with Apache Cassandra for state management. Experiments with a dataset of 12M tweets from Twitter show that our hybrid approach provides a better accuracy-performance compromise than the previous approaches.
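
To make the idea of bursty keywords inside blocks of messages more concrete, the following is a minimal sketch in Python. It is not the authors' algorithm: the function name `bursty_keywords`, the `min_count` and `ratio` thresholds, the add-one smoothing, and the whitespace tokenization are all assumptions made for illustration. It compares keyword frequencies in the current block against the previous block and reports keywords whose frequency jumps sharply.

```python
from collections import Counter


def bursty_keywords(prev_block, curr_block, min_count=3, ratio=2.0):
    """Return keywords whose frequency jumps in curr_block relative to prev_block.

    Hypothetical illustration only: thresholds and tokenization are assumptions,
    not taken from the paper.
    """
    # Count each keyword at most once per message (document frequency).
    prev = Counter(w for msg in prev_block for w in set(msg.lower().split()))
    curr = Counter(w for msg in curr_block for w in set(msg.lower().split()))

    # A keyword is "bursty" if it is frequent enough in the current block and
    # at least `ratio` times its (add-one smoothed) frequency in the previous block.
    return sorted(
        w for w, c in curr.items()
        if c >= min_count and c >= ratio * (prev[w] + 1)
    )


previous = ["good morning all", "coffee time", "morning run"]
current = [
    "earthquake now",
    "big earthquake felt",
    "earthquake in town",
    "morning earthquake",
]
print(bursty_keywords(previous, current))
```

In a streaming setting, the per-block counts would be part of the operator state that has to persist between blocks, which is the role the abstract assigns to state management.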