Mathematics (Jul 2024)

Software Fault Localization Based on Weighted Association Rule Mining and Complex Networks

  • Wentao Wu,
  • Shihai Wang,
  • Bin Liu

DOI
https://doi.org/10.3390/math12132113
Journal volume & issue
Vol. 12, no. 13
p. 2113

Abstract

Read online

Software fault localization technology aims to identify suspicious statements that cause software failures, which is crucial for ensuring software quality. Spectrum-based software fault location (SBFL) technology calculates the suspiciousness of each statement by analyzing the correlation between statement coverage information and execution results in test cases. SBFL has attracted increasing attention from scholars due to its high efficiency and scalability. However, existing SBFL studies have shown that a large number of statements share the same suspiciousness, which hinders software debuggers from quickly identifying the location of faulty statements. To address this challenge, we propose an SBFL model based on weighted association rule mining and complex networks: FL-WARMCN. The algorithm first uses Jaccard to measure the distance between passing and failing test cases, and applies it as the weight of passing test cases. Next, FL-WARMCN calculates the initial suspiciousness of each statement based on the program spectrum data. Then, the FL-WARMCN model utilizes a weighted association rule mining algorithm to obtain the correlation relationships between statements and models the network based on this. In the network, the suspiciousness of statements is used as node weights, and the correlation between statements is used as edge weights. We chose the eigenvector centrality that takes into account the degree centrality of statements and the importance of neighboring statements to calculate the importance of each statement, and used it as a weight to incorporate into the weighted suspiciousness calculation of the statement. Finally, we applied the FL-WARMCN model for experimental validation on the Defects4J dataset. The results showed that the model was significantly superior to other baselines. In addition, we analyzed the impact of different node and edge weights on model performance.

Keywords