A Review on Big Data Cleaning and Analytical Tools

K. Dhinakaran, G. Geetharamani


Big data is a term used to describe a collection of data that is huge in size and still growing exponentially with time. Such data falls into three categories: structured, semi-structured and unstructured. Data analysis is a collective term for gathering, organizing and analyzing data to support present and future improvements. Applied to big data, it means manipulating and analyzing very large volumes of data, which is a complex process. Collecting, analyzing, searching, storing and sharing big data is a challenging task, even when modern big data analytic tools are used. In short, such data is so large and complex that no traditional data management tool can store or process it efficiently. This paper reviews some of the cleaning, storage and analytic tools available for handling big data.

DOI: https://doi.org/10.23956/ijarcsse.v9i4.994



© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE)| All Rights Reserved | Powered by Advance Academic Publisher.