A Survey on Source Code Retrieval for Bug Localization Using Latent Dirichlet Allocation, Semantic Similarity and Code Smell Detection

Kamaraj Natraj, A.V. Ramani


Bug localization is the task of determining which source code entities are relevant to a bug report. It is an important classification task over software data set resources. Manual bug localization is labour intensive, since developers must consider thousands of source code entities. Researchers therefore build bug localization classifiers, based on information retrieval models, that locate entities textually similar to the bug report. Current research, however, does not consider the effect of classifier configuration. Designers use such data to find the segment of the source code in which a bug is located, so that the bug can be corrected. Several categories of techniques are used, e.g., code smell detection and pattern clustering. To identify the numerous semantic relations that exist between two given words, a pattern clustering algorithm has been proposed. This survey presents an analysis of bug localization classifiers based on information retrieval models, which locate entities textually similar to the bug report.
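The information retrieval view of bug localization can be sketched as follows: each source file is treated as a document, the bug report as a query, and files are ranked by the cosine similarity of their TF-IDF vectors, as in the vector space model. This is a minimal illustration under stated assumptions, not the configuration used by any surveyed paper. The identifier-splitting rule, the tiny stop-word list, the IDF variant log(N/df) + 1, the file names and the file contents are all hypothetical, and each of them is a classifier configuration choice of the kind the abstract says current research overlooks.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list (itself a configuration choice).
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "on", "to", "when", "with"}


def tokenize(text):
    """Lower-case words; identifiers like getFileName or file_name are split."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        # camelCase / PascalCase split: "getFileName" -> get, File, Name
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            if part.lower() not in STOP_WORDS:
                tokens.append(part.lower())
    return tokens


def tfidf(counts, idf):
    """Weight raw term frequencies by IDF; terms unseen in the corpus are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_files(bug_report, files):
    """Return (file, score) pairs, most textually similar to the report first."""
    docs = {name: Counter(tokenize(src)) for name, src in files.items()}
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())  # each term counted once per document
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    query = tfidf(Counter(tokenize(bug_report)), idf)
    scores = {name: cosine(query, tfidf(counts, idf)) for name, counts in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical source files and bug report, for illustration only.
files = {
    "FileSaver.java": "class FileSaver { void saveFile(String fileName) { writeBuffer(fileName); } }",
    "PrintDialog.java": "class PrintDialog { void showPrinters() { listPrinters(); } }",
    "ExprParser.java": "class ExprParser { Node parseExpr(Token token) { return parseTerm(token); } }",
}
report = "Crash when saving a file with a long file name"

ranking = rank_files(report, files)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Here FileSaver.java ranks first because it is the only file sharing the terms "file" and "name" with the report. The report word "saving" does not match the identifier token "save", because this sketch does no stemming; turning stemming on or off is exactly the sort of configuration choice that can change a classifier's ranking.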
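The pattern clustering idea can be illustrated with a simplified greedy sequential clustering. Each lexical pattern (for example "X is a Y") is represented by how often it co-occurs with each word pair; patterns are visited from most to least frequent, and each one joins the most similar existing cluster if its cosine similarity to that cluster's centroid reaches a threshold theta, otherwise it starts a new cluster. This is a sketch under assumptions: the patterns, the counts and theta = 0.5 are invented, and a real system would extract patterns and counts from a text corpus or search engine snippets.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_patterns(patterns, theta):
    """Greedy sequential clustering of lexical patterns.

    patterns maps a pattern string to {word_pair: co-occurrence count}.
    """
    # Most frequent patterns first, so they seed the clusters.
    order = sorted(patterns, key=lambda p: sum(patterns[p].values()), reverse=True)
    clusters = []  # each entry: (member list, centroid dict)
    for pattern in order:
        vec = patterns[pattern]
        best, best_sim = None, 0.0
        for members, centroid in clusters:
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = (members, centroid), sim
        if best is not None and best_sim >= theta:
            members, centroid = best
            members.append(pattern)
            # Centroid kept as the sum of member vectors; cosine ignores scale.
            for pair, count in vec.items():
                centroid[pair] = centroid.get(pair, 0) + count
        else:
            clusters.append(([pattern], dict(vec)))
    return [members for members, _ in clusters]


# Invented co-occurrence counts: (X, Y) word pairs seen with each pattern.
patterns = {
    "X is a Y": {("ostrich", "bird"): 8, ("lion", "cat"): 5},
    "X is part of Y": {("wheel", "car"): 7, ("leaf", "tree"): 4},
    "X is a large Y": {("ostrich", "bird"): 3, ("lion", "cat"): 2},
    "Y has X": {("wheel", "car"): 3, ("leaf", "tree"): 2},
}
clusters = cluster_patterns(patterns, theta=0.5)
print(clusters)
```

With these counts the hypernym-like patterns ("X is a Y", "X is a large Y") form one cluster and the part-whole patterns ("X is part of Y", "Y has X") form another, i.e. patterns are grouped by the semantic relation they express between two words.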
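Code smell detection is commonly framed as rules over software metrics. The sketch below flags a class as a "Blob" (a large, poorly cohesive class) when both its weighted methods per class (WMC) and its lack of cohesion (LCOM, assumed here to be normalized so that values near 1 mean poor cohesion) exceed thresholds, and scores the rule with an F-measure against a set of classes known to be smelly. The rule, thresholds, class names, metric values and labels are all hypothetical. Search-based approaches generate or tune such rules automatically by optimizing a fitness measure of this kind; here the rule is simply written by hand.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical Blob rule: many weighted methods AND low cohesion.
BLOB_RULE = [("WMC", ">", 20), ("LCOM", ">", 0.8)]


def matches(metrics, rule):
    """True if the class satisfies every (metric, operator, threshold) clause."""
    return all(OPS[op](metrics[name], limit) for name, op, limit in rule)


def detect(classes, rule):
    """Names of all classes whose metrics satisfy the rule."""
    return {name for name, metrics in classes.items() if matches(metrics, rule)}


def f_measure(detected, expected):
    """Harmonic mean of precision and recall of detected against expected smells."""
    tp = len(detected & expected)
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(expected)
    return 2 * precision * recall / (precision + recall)


# Invented metric values and labels, for illustration only.
classes = {
    "OrderManager": {"WMC": 45, "LCOM": 0.92},
    "Invoice": {"WMC": 12, "LCOM": 0.30},
    "ReportBuilder": {"WMC": 25, "LCOM": 0.50},
    "Utils": {"WMC": 30, "LCOM": 0.85},
}
known_blobs = {"OrderManager", "ReportBuilder"}

detected = detect(classes, BLOB_RULE)
print(sorted(detected), f_measure(detected, known_blobs))
```

The rule catches OrderManager but also flags Utils and misses ReportBuilder, so precision and recall are both 0.5; a search-based detector would vary the metrics and thresholds in BLOB_RULE to raise this score on the labelled examples.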

DOI: https://doi.org/10.23956/ijarcsse.v8i8.832


© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE)| All Rights Reserved | Powered by Advance Academic Publisher.