Improved Macro-clusters generation using Top-k shared Micro-clusters in Data Streams



Data streams today are massive and fast-changing. Their applications range from basic scientific uses to critical business and financial ones. In the online phase, useful information is abstracted from the stream and represented as micro-clusters; in the offline phase, the micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density information in this graph is then used during reclustering to improve cluster formation, but DBSTREAM takes more time when handling corrupted data points. In this paper, an early pruning algorithm is applied before pre-processing, and a Bloom filter is used to recognize corrupted data. Our experiments on real-time datasets show that this approach improves the efficiency of macro-cluster generation by 90% and generates more micro-clusters in a short time.
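
The abstract does not say how the Bloom filter is built or how corrupted points are identified, so the following is only a minimal sketch of the general idea: a Bloom filter holds the keys of records already known to be corrupted, and a pruning step drops matching points before they reach the clustering stage. The names BloomFilter and prune_stream, the filter size, and the use of salted SHA-256 hashes are all assumptions made for illustration, not the authors' implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are illustrative defaults, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        data = repr(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def prune_stream(points, corrupted_filter):
    """Yield only the points that the filter does not flag as corrupted."""
    for point in points:
        if point not in corrupted_filter:
            yield point
```

A stream consumer would add known-corrupted keys with add() and wrap the incoming points in prune_stream() before micro-cluster assignment. Because a Bloom filter can report false positives, a small fraction of valid points may also be dropped; the rate depends on the filter size and the number of stored keys.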








© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE)| All Rights Reserved | Powered by Advance Academic Publisher.