Identification of Self-Admitted Technical Debt Using Enhanced Feature Selection Based on Word Embedding

Jernej Flisar; Vili Podgorelec

doi:10.1109/ACCESS.2019.2933318

IEEE Access (Jan 2019)

Affiliations

Jernej Flisar: Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
Vili Podgorelec: Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia

DOI: https://doi.org/10.1109/ACCESS.2019.2933318
Journal volume & issue: Vol. 7
pp. 106475 – 106494

Abstract

Self-admitted technical debt (SATD) is annotated by developers in source code comments and has been recognized as a valuable source for discovering flawed software. To reduce manual effort, some recent studies have focused on automated detection of SATD using text classification methods. To train their classifiers, these methods need labeled samples, which also require considerable effort to obtain. We developed a new SATD identification method that takes advantage of a large corpus of unlabeled code comments, extracted from open source projects, to train a word embedding model. After feature selection is applied, the pre-trained word embedding is used to discover semantically similar features in source code comments and thereby enhance the original feature set. By using such an enhanced feature set for classification, our goal was to improve the identification of SATD compared to existing methods. The proposed feature enhancement method was used with the three most common feature selection methods (chi-square (CHI), information gain (IG), and mutual information (MI)) and three well-known text classification algorithms (naive Bayes (NB), support vector machine (SVM), and maximum entropy (ME)), and was tested on ten open source projects. The experimental results show a significant improvement in SATD identification over the compared methods. Achieving 82% correct predictions of SATD, the proposed method appears to be a good candidate for adoption in practice.
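The feature enhancement idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy comments, the hand-made 2-d vectors standing in for a pre-trained word embedding, the cosine-similarity threshold, and all function names are assumptions made for illustration only. The sketch ranks terms by a chi-square score, keeps the top k, and then adds embedding neighbours of the selected terms to the feature set.

```python
import math

# Toy labeled comments (1 = SATD, 0 = not SATD); invented for illustration.
COMMENTS = [
    ("todo fix this hack later", 1),
    ("hack workaround for parser bug", 1),
    ("todo remove temporary workaround", 1),
    ("returns the user name", 0),
    ("creates a new parser instance", 0),
    ("sets the default user name", 0),
]

# Hypothetical hand-made vectors standing in for a word embedding
# pre-trained on a large corpus of unlabeled code comments.
EMBEDDING = {
    "todo": (0.9, 0.1),
    "fixme": (0.88, 0.15),
    "hack": (0.8, 0.3),
    "kludge": (0.78, 0.33),
    "workaround": (0.7, 0.4),
    "parser": (0.4, 0.8),
    "user": (-0.3, 0.9),
    "name": (0.0, 1.0),
}


def tokenize(text):
    return text.lower().split()


def chi_square(term, docs):
    """Chi-square statistic of a term against the binary SATD label."""
    n11 = n10 = n01 = n00 = 0
    for text, label in docs:
        present = term in tokenize(text)
        if present and label == 1:
            n11 += 1
        elif present:
            n10 += 1
        elif label == 1:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if den == 0 else n * (n11 * n00 - n10 * n01) ** 2 / den


def select_features(docs, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    vocab = {t for text, _ in docs for t in tokenize(text)}
    return sorted(vocab, key=lambda t: (-chi_square(t, docs), t))[:k]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def enhance(selected, embedding, threshold):
    """Return embedding words similar to any selected feature (assumed rule)."""
    added = set()
    for feat in selected:
        if feat not in embedding:
            continue  # selected term has no vector; nothing to expand
        for word, vec in embedding.items():
            if word not in selected and cosine(embedding[feat], vec) >= threshold:
                added.add(word)
    return sorted(added)


selected = select_features(COMMENTS, k=6)
added = enhance(selected, EMBEDDING, threshold=0.99)
print("selected:", selected)
print("added by embedding:", added)
```

In this toy setting the added terms never occur in the labeled comments, which is the point of drawing on an embedding trained on unlabeled data. A real pipeline would presumably use a trained embedding model and an established classifier; the threshold and the value of k here are arbitrary.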

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
