Big Data and Cognitive Computing (Dec 2022)
Locating Source Code Bugs in Software Information Systems Using Information Retrieval Techniques
Abstract
Bug localization is the process of locating the buggy source code files that correspond to a given bug report. Done manually, it is overwhelming and time-consuming, so automating it is key to helping developers and increasing their productivity. Bug localization can be framed as a retrieval problem in which the bug report serves as the query and the source code as the search space; information retrieval and natural language processing techniques can then be used to expand bug reports with richer semantics and to improve software understanding. This research investigates the effect of segmenting source files into executable code and comments, which have conflicting natures; studies the effect of synonyms on the accuracy of bug localization; and examines the effect of “part-of-speech” techniques on reducing the manual inspection needed to select appropriate synonyms. The research aims to show that these methods improve the accuracy of bug localization. The approach was evaluated on three open-source Java projects, namely Eclipse 3.1, AspectJ 1.0, and SWT 3.1; we implemented a dedicated Java tool that realizes our methodology and conducted several experiments on each project. The experimental results reveal a considerable improvement in recall and precision, and the developed methods show an accuracy improvement of 4–10% compared with state-of-the-art approaches.
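To make the retrieval framing concrete, the following minimal sketch separates a Java source file into executable code and comments and then ranks files against a bug report. It is illustrative only and is not the authors' Java tool: the function names (`split_code_and_comments`, `tokenize`, `rank`), the regular-expression-based comment extraction, and the TF-IDF cosine scoring are all assumptions chosen for brevity, and the sketch is written in Python rather than Java. The regular expression does not handle Java character literals such as `'"'`.

```python
import math
import re
from collections import Counter

# String literals are matched first so that "//" inside a string
# is not mistaken for the start of a comment (simplifying assumption).
_TOKEN = re.compile(r'"(?:\\.|[^"\\\n])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def split_code_and_comments(source):
    """Return (code_text, comment_text) for a Java source string."""
    comments = []

    def keep_or_strip(match):
        text = match.group(0)
        if text.startswith('"'):
            return text          # string literal: stays in the code segment
        comments.append(text)    # comment: moved to the comment segment
        return " "

    code = _TOKEN.sub(keep_or_strip, source)
    return code, "\n".join(comments)


def tokenize(text):
    """Lowercased alphabetic tokens of length >= 2."""
    return [w.lower() for w in re.findall(r"[A-Za-z]{2,}", text)]


def rank(query, documents):
    """Rank document names by TF-IDF cosine similarity to the query.

    documents: dict mapping a file name to its text (code or comments).
    """
    bags = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n = len(bags)
    df = Counter(term for bag in bags.values() for term in bag)
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

    def vec(bag):
        # Terms unseen in the corpus get weight 0.
        return {t: tf * idf.get(t, 0.0) for t, tf in bag.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokenize(query)))
    scores = {name: cosine(q, vec(bag)) for name, bag in bags.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In such a setup, the code and comment segments of each file could be indexed and scored separately, with the bug report text (optionally expanded with synonyms) passed as `query`.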
Keywords