IEEE Access (Jan 2023)

DBSCAN-Based Automatic De-Duplication for Software Quality Inspection Data

  • Chun-Hua Cao,
  • Ya-Na Tang,
  • Hua Zhou,
  • Yu-Li Li,
  • Zbigniew Marszalek

DOI
https://doi.org/10.1109/ACCESS.2022.3164192
Journal volume & issue
Vol. 11
pp. 17882 – 17890

Abstract

Software quality inspection generates large volumes of data, and removing duplicate records can improve inspection efficiency. This paper studies an automatic de-duplication method for software quality inspection data based on density-based spatial clustering of applications with noise (DBSCAN). An intelligent optimization algorithm is used to generate software quality inspection data by initializing individuals, calculating fitness function values, improving individuals, and splitting individuals that meet the conditions. The locally linear embedding algorithm is selected to extract features from the software quality inspection data by searching for neighborhood points and calculating the reconstruction weights and projection vectors. The extracted features are passed to a region-division-based multi-density DBSCAN clustering algorithm, and automatic de-duplication of the software quality inspection data is realized through grid division, data binning, and grid merging. The experimental results show that the precision and recall of this method both exceed 99% and that its resource consumption rate is low, so it can effectively improve the efficiency of software quality inspection.
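The clustering-based de-duplication idea can be illustrated with a minimal sketch. It assumes the feature vectors have already been extracted, and it uses plain DBSCAN rather than the grid-based multi-density variant described in the abstract. The function names (`region_query`, `dbscan`, `deduplicate`), the sample records, and the `eps` and `min_pts` values are hypothetical choices for illustration and do not come from the paper.

```python
from math import dist


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def deduplicate(points, eps=0.1, min_pts=2):
    """Keep the first record of each cluster plus every noise record.

    Assumption for this sketch: records in the same dense cluster are
    treated as duplicates of one another, and noise records are unique.
    """
    labels = dbscan(points, eps, min_pts)
    kept, seen = [], set()
    for idx, label in enumerate(labels):
        if label == -1:
            kept.append(idx)
        elif label not in seen:
            seen.add(label)
            kept.append(idx)
    return kept


# Hypothetical 2-D feature vectors: two groups of near-duplicates and one unique record.
records = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.02), (5.0, 5.0), (5.01, 5.0), (9.0, 1.0)]
print(deduplicate(records))  # indices of the records that are kept
```

With these sample records, one representative per dense group is kept together with the isolated record. The brute-force neighborhood search is quadratic in the number of records; grid division of the kind the abstract mentions is one way to restrict that search to nearby cells.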

Keywords