Streaming Event Detection in Microblogs: Balancing Accuracy and Performance


Sahın O. C., Karagöz P., Tatbul N.

International Conference on Web Engineering (ICWE), Seoul, South Korea, 11 - 14 June 2019

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1007/978-3-030-19274-7_10
  • City: Seoul
  • Country: South Korea
  • Keywords: Online event detection, Burst detection, Stream processing, Data stream management, Microblogging

Abstract

In this work, we model the problem of online event detection in microblogs as a stateful stream processing problem and offer a novel solution that balances result accuracy and performance. Our new approach builds on two state-of-the-art algorithms. The first algorithm is based on identifying bursty keywords inside blocks of blog messages. The second one involves clustering blog messages based on the similarity of their contents. To combine the computational simplicity of the keyword-based algorithm with the semantic accuracy of the clustering-based algorithm, we propose a new hybrid algorithm. We then implement these algorithms in a streaming manner, on top of Apache Storm augmented with Apache Cassandra for state management. Experiments with a 12M-tweet dataset from Twitter show that our hybrid approach provides a better accuracy-performance compromise than the previous approaches.
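
To make the idea of bursty keywords inside blocks of messages more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a keyword counts as bursty when its frequency in the current block is at least a fixed multiple of its smoothed frequency in the previous block. The function name `bursty_keywords` and the parameters `min_count` and `ratio` are hypothetical and chosen only for illustration; the abstract does not specify the actual burst criterion.

```python
from collections import Counter


def bursty_keywords(previous_block, current_block, min_count=3, ratio=2.0):
    """Return keywords whose frequency jumps between two consecutive blocks.

    Illustrative assumption only: the thresholds and the ratio test are
    not taken from the paper.
    """
    # Count whitespace-separated, lowercased tokens in each block of messages.
    prev = Counter(w for msg in previous_block for w in msg.lower().split())
    curr = Counter(w for msg in current_block for w in msg.lower().split())

    bursty = []
    for word, count in curr.items():
        # Add-one smoothing keeps the ratio defined for previously unseen words.
        if count >= min_count and count / (prev[word] + 1) >= ratio:
            bursty.append(word)
    return sorted(bursty)


previous_block = ["good morning all", "coffee time"]
current_block = [
    "earthquake in the city",
    "earthquake felt here",
    "big earthquake now",
    "coffee time",
]
result = bursty_keywords(previous_block, current_block)
```

In a streaming setting, the counts for the previous block would be kept as operator state rather than recomputed, which is the role the abstract assigns to state management.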