International Conference on Web Engineering (ICWE), Seoul, South Korea, 11-14 June 2019
In this work, we model online event detection in microblogs as a stateful stream processing problem and offer a novel solution that balances result accuracy and performance. Our approach builds on two state-of-the-art algorithms. The first identifies bursty keywords inside blocks of microblog messages. The second clusters microblog messages by the similarity of their contents. To combine the computational simplicity of the keyword-based algorithm with the semantic accuracy of the clustering-based algorithm, we propose a new hybrid algorithm. We implement all three algorithms in a streaming manner on top of Apache Storm, augmented with Apache Cassandra for state management. Experiments with a Twitter dataset of 12 million tweets show that our hybrid approach provides a better trade-off between accuracy and performance than the previous approaches.
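To illustrate the idea of bursty keywords in blocks of messages, here is a minimal Python sketch. It is not the implementation evaluated in this work: the whitespace tokenization, the function names `keyword_counts` and `bursty_keywords`, and the `min_count` and `ratio` thresholds are all assumptions made for illustration. A keyword is treated as bursty when the number of messages containing it in the current block is well above its count in the previous block.

```python
from collections import Counter


def keyword_counts(block):
    """Count how many messages in a block contain each lowercase token."""
    counts = Counter()
    for message in block:
        # A set ensures a word repeated within one message is counted once.
        counts.update(set(message.lower().split()))
    return counts


def bursty_keywords(previous_block, current_block, min_count=3, ratio=2.0):
    """Return keywords whose message count grew sharply between two blocks.

    min_count and ratio are hypothetical thresholds, not values from the paper.
    """
    prev = keyword_counts(previous_block)
    curr = keyword_counts(current_block)
    bursty = []
    for word, count in curr.items():
        # Add-one smoothing so previously unseen words do not divide by zero.
        if count >= min_count and count / (prev[word] + 1) >= ratio:
            bursty.append(word)
    return sorted(bursty)
```

In a streaming setting, the counts of the previous block are the state that must be carried from one block to the next, which is the kind of state a stateful stream processor would need to manage.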