Applied Sciences (Sep 2019)

Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies

  • Shikai Guo,
  • Miaomiao Wei,
  • Siwen Wang,
  • Rong Chen,
  • Chen Guo,
  • Hui Li,
  • Tingting Li

DOI
https://doi.org/10.3390/app9183663
Journal volume & issue
Vol. 9, no. 18
p. 3663

Abstract

As software systems grow larger and their logic more complex, a large number of bug reports are submitted to bug repositories every day. Because of tight schedules and limited human resources, developers may not have time to inspect every report, so they often concentrate on the bugs that have a large impact. However, two main challenges limit the automated techniques that would help developers become aware of high-impact bug reports early: low-quality bug report data and an imbalanced class distribution. To address these two challenges, we propose an approach for identifying high-impact bug reports that combines data reduction with imbalanced learning strategies. In the data reduction phase, we combine feature selection with instance selection to build a small, high-quality set of bug reports by removing the bug reports and words that are redundant or noninformative. In the imbalanced learning phase, we handle the imbalanced distribution of bug reports through four imbalanced learning strategies. Our experiments show that combining data reduction with imbalanced learning strategies can effectively identify high-impact bug reports.
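
To make the two phases concrete, the following is a minimal Python sketch of such a pipeline on toy data. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the specific feature selection method, instance selection method, or the four imbalanced learning strategies, so the sketch substitutes simple stand-ins: a document-frequency-difference score for feature selection, removal of empty and duplicate reports for instance selection, and random oversampling as one possible imbalanced learning strategy. All function names and the toy reports are hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def select_features(reports, labels, k):
    """Feature selection stand-in: keep the k words whose document
    frequency differs most between high-impact (1) and other (0) reports."""
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for text, y in zip(reports, labels):
        for word in set(tokenize(text)):
            (pos if y else neg)[word] += 1

    def score(word):
        return abs(pos[word] / max(n_pos, 1) - neg[word] / max(n_neg, 1))

    vocab = set(pos) | set(neg)
    # Ties are broken alphabetically so the result is deterministic.
    return set(sorted(vocab, key=lambda w: (-score(w), w))[:k])


def select_instances(reports, labels, features):
    """Instance selection stand-in: drop reports that are empty or
    duplicated once only the selected words are kept."""
    seen, kept = set(), []
    for text, y in zip(reports, labels):
        key = frozenset(w for w in tokenize(text) if w in features)
        if key and (key, y) not in seen:
            seen.add((key, y))
            kept.append((key, y))
    return kept


def random_oversample(data, seed=0):
    """Assumed imbalanced learning strategy: duplicate randomly chosen
    minority-class reports until both classes have the same size."""
    rng = random.Random(seed)
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(data)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(data) + extra


# Hypothetical toy data: one high-impact report among five ordinary ones.
reports = [
    "crash on startup security overflow",
    "typo in label",
    "typo in label",
    "button color wrong",
    "menu label misaligned",
    "tooltip text wrong",
]
labels = [1, 0, 0, 0, 0, 0]

features = select_features(reports, labels, k=8)
kept = select_instances(reports, labels, features)
balanced = random_oversample(kept)
```

The reduced and rebalanced set `balanced` would then be passed to a classifier. Running data reduction before rebalancing, as here, means that oversampling does not copy reports that reduction would have discarded anyway.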

Keywords