IEEE Access (Jan 2022)

Cognitive Complexity and Graph Convolutional Approach Over Control Flow Graph for Software Defect Prediction

  • Mansi Gupta,
  • Kumar Rajnish,
  • Vandana Bhattacharjee

DOI
https://doi.org/10.1109/ACCESS.2022.3213844
Journal volume & issue
Vol. 10
pp. 108870 – 108894

Abstract

Read online

The software engineering community is working to develop reliable metrics to improve software quality. It is estimated that understanding the source code accounts for 60% of the software maintenance effort. Cognitive informatics is important in quantifying the degree of difficulty or the efforts made by developers to understand the source code. Several empirical studies were conducted in 2003 to assign cognitive weights to each possible basic control structure of software, and these cognitive weights are used by several researchers to evaluate the cognitive complexity of software systems. In this paper, an effort has been made to categorize the Control Flow Graphs (CFGs) nodes according to their node features. In our case, we extracted seven unique features from the program, and each unique feature was assigned an integer value that we evaluated through Cognitive Complexity Measures (CCMs). We then incorporated CCMs’ results as a node feature value in CFGs and generated the same based on the node connectivity for a graph. In order to obtain the feature representation of the graph, a node vector matrix is then created for the graph and passed to the Graph Convolutional Network (GCN). We prepared our data sets using GCN output and then built Deep Neural Network Defect Prediction (DNN-DP) and Convolutional Neural Network Defect Prediction (CNN-DP) models to predict software defects. The Python programming language is used, along with Keras and TensorFlow. Three hundred twenty Python programs were written by our talented UG and PG students, and all experiments were carried out during laboratory classes. Together with three skilled lab programmers, they compiled and ran each individual program and detected defect/no-defect programs before categorizing them into three different classes, namely Simple, Medium, and Complex programs. Accuracy, Receiver Operating Characteristics (ROC), Area Under Curve (AUC), F-measure, Precision and hyper-parameter tuning procedures are used to evaluate the approaches. The experimental results show that the proposed models outperformed state-of-the-art methods such as Nave Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF) in all evaluation criteria.

Keywords