Online embedding and clustering of data streams

Zubaroğlu A., Atalay V.

3rd International Conference on Big Data Research, ICBDR 2019, Cergy-Pontoise, France, 20-21 November 2019, pp. 142-146

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1145/3372454.3372481
  • City: Cergy-Pontoise
  • Country: France
  • Page Numbers: pp.142-146
  • Middle East Technical University Affiliated: Yes


© 2019 Association for Computing Machinery. The number of connected devices is steadily increasing, and these devices continuously generate data streams. These streams are often high dimensional and exhibit concept drift. Real-time processing of data streams is attracting growing interest despite its many challenges. Clustering is unsupervised: it needs no labeled instances and can be applied with little prior information about the data. These properties make it one of the most suitable methods for real-time data stream processing. Moreover, data embedding may simplify clustering and makes visualization of high-dimensional data possible. Several data stream clustering algorithms exist in the literature; however, no data stream embedding method exists. UMAP is a data embedding algorithm that is suitable for application to data streams, but it cannot adapt to concept drift. In this study, we develop a new method that applies UMAP to data streams, adapts to concept drift, and clusters the embedded data instances using any distance-based clustering algorithm.
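
The abstract does not describe how the proposed method handles concept drift, so the following is only a minimal sketch of one generic pattern for the pipeline it names (embed, adapt, cluster), not the authors' algorithm. It assumes drift is handled by refitting the embedding on a sliding window of recent instances, and it uses a simple distance-based "leader" clustering in the embedded space. The class name, its parameters, and the toy `fit_projection` embedder are all hypothetical; in practice `fit_embedder` could wrap a fitted `umap.UMAP` model's `transform` method.

```python
from collections import deque
from math import dist


class WindowedEmbedClusterer:
    """Hypothetical sketch: refit an embedder on a sliding window, then
    assign each embedded instance to the nearest centroid within `radius`."""

    def __init__(self, fit_embedder, window=200, refit_every=100, radius=1.0):
        # fit_embedder: callable taking a list of points and returning a
        # function that maps one point to its embedding (an assumption).
        self.fit_embedder = fit_embedder
        self.buffer = deque(maxlen=window)  # only recent instances are kept
        self.refit_every = refit_every
        self.radius = radius
        self.transform = None
        self.centroids = []  # each entry: [centroid_tuple, member_count]
        self.seen = 0

    def _assign(self, z):
        """Leader clustering: join the nearest centroid or open a new cluster."""
        best, best_d = None, None
        for i, (c, _) in enumerate(self.centroids):
            d = dist(c, z)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.radius:
            self.centroids.append([tuple(z), 1])
            return len(self.centroids) - 1
        c, n = self.centroids[best]
        # Update the running mean of the chosen cluster.
        updated = tuple((ci * n + zi) / (n + 1) for ci, zi in zip(c, z))
        self.centroids[best] = [updated, n + 1]
        return best

    def _refit(self):
        """Rebuild the embedding from the window and recluster it, because
        old centroids live in the previous embedding space."""
        points = list(self.buffer)
        self.transform = self.fit_embedder(points)
        self.centroids = []
        labels = [self._assign(self.transform(p)) for p in points]
        return labels[-1]  # label of the instance that triggered the refit

    def process(self, x):
        """Consume one instance; return its cluster label (None during warm-up)."""
        x = tuple(x)
        self.buffer.append(x)
        self.seen += 1
        if self.seen % self.refit_every == 0:
            return self._refit()
        if self.transform is None:
            return None
        return self._assign(self.transform(x))


def fit_projection(points):
    """Toy stand-in for an embedder: keep only the first two coordinates."""
    return lambda p: (p[0], p[1])


# Usage: a stream alternating between two well-separated 3-D points.
model = WindowedEmbedClusterer(fit_projection, window=20, refit_every=10, radius=2.0)
stream = [(0.0, 0.0, 5.0) if i % 2 == 0 else (10.0, 10.0, 5.0) for i in range(20)]
labels = [model.process(x) for x in stream]
```

Cluster labels are only meaningful between refits in this sketch: each refit discards the old centroids, so a production version would need some way to match clusters across embeddings.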