Learning Stretch-Shrink Latent Representations With Autoencoder and K-Means for Software Defect Prediction

Viet Anh Phan

doi:10.1109/ACCESS.2022.3219589

IEEE Access (Jan 2022)

Affiliations

Viet Anh Phan: Department of Information Security, Faculty of Information Technology, Le Quy Don Technical University, Hanoi, Vietnam

DOI: https://doi.org/10.1109/ACCESS.2022.3219589
Journal volume & issue: Vol. 10
pp. 117827 – 117835

Abstract

Detecting defective source code to localize and fix bugs is important for reducing software development effort. Although deep learning models have made a breakthrough in this field, many issues remain unresolved, such as the shortage of labeled data and the small size of defective elements. For example, given two similar programs that differ by only one operator or statement, one may be clean while the other is defective. To address these issues, this study proposes a new deep learning model that facilitates the learning of distinguishing features. The model comprises three main components: 1) a convolutional neural network-based classifier, 2) an autoencoder, and 3) a k-means clustering component. In our model, the autoencoder assists the classifier in generating latent representations of programs, and the k-means component provides penalty functions that increase the distinguishability among these latent representations. We evaluated the model in terms of both performance metrics and latent representation quality. Experimental results on four defect prediction datasets show that the proposed model outperforms the baselines, owing to the more sophisticated features it generates.
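The abstract describes the three-component architecture only at a high level. As a minimal, pure-Python sketch of how such pieces could be combined into one training objective, the code assumes a hypothetical "stretch-shrink" penalty. It shrinks each latent vector toward its k-means centroid and stretches centroids apart with a hinge on a `margin`, and that penalty is added to a classification loss and an autoencoder reconstruction loss. The helper names (`kmeans`, `stretch_shrink_penalty`, `total_loss`), the weights `alpha` and `beta`, the `margin`, and the exact penalty form are all illustrative assumptions, not the paper's formulation. The CNN classifier and autoencoder themselves are not implemented; their outputs are passed in as plain lists.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def stretch_shrink_penalty(latents, centroids, labels, margin=1.0):
    """Hypothetical penalty: shrink clusters, stretch centroids apart."""
    # Shrink term: mean squared distance of each latent to its own centroid.
    shrink = sum(sq_dist(z, centroids[lab])
                 for z, lab in zip(latents, labels)) / len(latents)
    # Stretch term: hinge cost for centroid pairs closer than `margin`.
    stretch, pairs = 0.0, 0
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            d = math.sqrt(sq_dist(centroids[i], centroids[j]))
            stretch += max(0.0, margin - d)
            pairs += 1
    if pairs:
        stretch /= pairs
    return shrink + stretch


def bce(probs, targets, eps=1e-12):
    """Binary cross-entropy for defective (1) / clean (0) predictions."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(probs)


def mse(x, x_hat):
    """Per-sample squared reconstruction error, averaged over samples."""
    return sum(sq_dist(a, b) for a, b in zip(x, x_hat)) / len(x)


def total_loss(probs, targets, x, x_hat, latents, centroids, labels,
               alpha=1.0, beta=0.1):
    """Hypothetical joint objective: classifier + autoencoder + k-means penalty."""
    return (bce(probs, targets)
            + alpha * mse(x, x_hat)
            + beta * stretch_shrink_penalty(latents, centroids, labels))


if __name__ == "__main__":
    # Toy 2-D latent vectors forming two well-separated groups.
    latents = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
               [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    centroids, labels = kmeans(latents, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(stretch_shrink_penalty(latents, centroids, labels))
```

With well-separated clusters the stretch term is zero and only the small shrink term remains. If two centroids drift closer than `margin`, the hinge term grows, which is one plausible way a k-means-derived penalty could push latent representations of different groups apart.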

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639