Data stream clustering: a review

Zubaroglu, Alaettin; Atalay, Mehmet

doi:10.1007/s10462-020-09874-x

Zubaroglu A., Atalay V.

ARTIFICIAL INTELLIGENCE REVIEW, vol.54, no.2, pp.1201-1236, 2021 (SCI-Expanded)

Publication Type: Article / Review
Volume: 54 Issue: 2
Publication Date: 2021
DOI: 10.1007/s10462-020-09874-x
Journal Name: ARTIFICIAL INTELLIGENCE REVIEW
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, Educational research abstracts (ERA), Index Islamicus, INSPEC, Library and Information Science Abstracts, Library, Information Science & Technology Abstracts (LISTA), Metadex, Psycinfo, zbMATH, Civil Engineering Abstracts
Pages: pp.1201-1236
Keywords: Data streams, Data stream clustering, Real-time clustering, Evolving data streams, Algorithms
Middle East Technical University Affiliated: Yes

Abstract

The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting growing interest despite many challenges. Clustering is one of the most suitable methods for real-time data stream processing because it can be applied with little prior knowledge of the data and does not need labeled instances. However, data stream clustering differs from traditional clustering in many respects and raises several challenging issues. Here, we provide information on the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of base clustering technique, computational complexity and clustering accuracy. We compare these algorithms, point to popular data stream repositories and datasets as well as stream processing tools and platforms, and discuss the problems in data stream clustering that are still open.
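To make two of the concepts named above concrete ("data structures for data streams" and "time window models"), here is a minimal sketch, assuming a micro-cluster summary in the style of the cluster-feature vector (N, LS, SS) combined with a damped window that fades old points by 2^(-λ·Δt), the fading function used in DenStream. It is not an algorithm taken from the review. The names `MicroCluster`, `process_stream`, `threshold` and `lam` are hypothetical, and the nearest-center threshold rule for absorbing points is a deliberately simplified assumption.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster: decayed weight, linear sum and squared sum.

    Damped window (assumed): every stored statistic is multiplied by
    2 ** (-lam * elapsed_time) before a new point is added.
    """
    dim: int
    lam: float = 0.01  # decay rate; an illustrative assumption
    weight: float = field(init=False, default=0.0)
    last_update: float = field(init=False, default=0.0)
    ls: List[float] = field(init=False, default_factory=list)  # linear sum
    ss: List[float] = field(init=False, default_factory=list)  # squared sum

    def __post_init__(self) -> None:
        self.ls = [0.0] * self.dim
        self.ss = [0.0] * self.dim

    def _decay(self, t: float) -> None:
        # Fade all statistics according to the time since the last update.
        factor = 2.0 ** (-self.lam * (t - self.last_update))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last_update = t

    def insert(self, x: Sequence[float], t: float) -> None:
        # Incremental update: O(dim) time, constant memory per cluster.
        self._decay(t)
        self.weight += 1.0
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss = [a + b * b for a, b in zip(self.ss, x)]

    def center(self) -> List[float]:
        return [v / self.weight for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square deviation of the summarized points from the center.
        var = sum(s / self.weight - (l / self.weight) ** 2
                  for s, l in zip(self.ss, self.ls))
        return math.sqrt(max(var, 0.0))


def process_stream(stream: Sequence[Sequence[float]], threshold: float,
                   lam: float = 0.01) -> List[MicroCluster]:
    """Single pass: absorb each point into the nearest micro-cluster if its
    center lies within `threshold`, otherwise open a new micro-cluster."""
    clusters: List[MicroCluster] = []
    for t, x in enumerate(stream):
        best, best_d = None, float("inf")
        for mc in clusters:
            d = math.dist(x, mc.center())
            if d < best_d:
                best, best_d = mc, d
        if best is not None and best_d <= threshold:
            best.insert(x, t)
        else:
            new_mc = MicroCluster(dim=len(x), lam=lam)
            new_mc.insert(x, t)
            clusters.append(new_mc)
    return clusters


if __name__ == "__main__":
    mc = MicroCluster(dim=2, lam=0.0)  # lam=0 disables decay
    mc.insert([1.0, 2.0], t=0)
    mc.insert([3.0, 4.0], t=1)
    print(mc.center())  # [2.0, 3.0]
```

Because only (weight, LS, SS) is stored, each point is seen once and discarded, which is what makes this kind of summary suitable for unbounded streams. With `lam > 0`, a point that arrived Δt steps ago contributes 2^(-lam·Δt) to the weight, so centers drift toward recent data; this is one simple way of reacting to concept drift.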