A Novel Four-Way Approach Designed With Ensemble Feature Selection for Code Smell Detection

Inderpreet Kaur; Arvinder Kaur

doi:10.1109/ACCESS.2021.3049823

IEEE Access (Jan 2021)


Affiliations

Inderpreet Kaur: University School of Information & Communication Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
Arvinder Kaur: University School of Information & Communication Technology, Guru Gobind Singh Indraprastha University, New Delhi, India

Journal volume & issue: Vol. 9
pp. 8695 – 8707

Abstract


Purpose: Code smells are residues of technical debt introduced by developers. They hinder the evolution, adaptability and maintenance of software. At the same time, they are useful for pointing to the sources of problems and bugs in the software. Machine learning has been used extensively in research to predict code smells. The current study aims to optimise this prediction using Ensemble Learning and Feature Selection techniques on three open-source Java data sets.

Design and Results: The work compares four different approaches to detecting code smells using four performance measures: Accuracy (P1), G-mean1 (P2), G-mean2 (P3), and F-measure (P4). The study found that the performance measures did not degrade with feature selection and Ensemble Learning. Instead, they either remained the same or increased. Random Forest turns out to be the best classifier, while Correlation-based Feature Selection (BFS) is the best of the Feature Selection techniques. The best results among all the aggregation combinations studied come from Majority Voting and three Ensemble Learning aggregators: ET5C2 (BFS intersection Relief with the Random Forest classifier), ET6C2 (BFS union Relief with the Random Forest classifier), and ET5C1 (BFS intersection Relief with Bagging).

Conclusion: Although the results are good, Ensemble Learning techniques need extensive validation on a variety of data sets before they can be standardised. Ensemble Learning techniques also pose challenges concerning diversity and reliability, and hence need exhaustive study.
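The abstract names its four performance measures, the intersection/union aggregation of two feature-selection outputs (BFS and Relief), and majority voting, but it does not define them. The Python sketch below shows one way these pieces could be computed. It is not the authors' code. The G-mean definitions follow common usage in the defect and code smell literature and are assumed here, not taken from the paper: G-mean1 = sqrt(precision × recall) and G-mean2 = sqrt(recall × specificity). The function names, feature names and tie-breaking rule are all hypothetical.

```python
"""Hypothetical sketch of the building blocks named in the abstract.

Assumptions (not taken from the paper): G-mean1 = sqrt(precision * recall),
G-mean2 = sqrt(recall * specificity); feature subsets are plain sets of
feature names; majority voting breaks ties in favour of the positive class.
"""
from __future__ import annotations

from collections import Counter
from math import sqrt


def performance_measures(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute P1..P4 from a binary confusion matrix (smelly = positive)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "P1_accuracy": (tp + tn) / total if total else 0.0,
        "P2_gmean1": sqrt(precision * recall),      # assumed definition
        "P3_gmean2": sqrt(recall * specificity),    # assumed definition
        "P4_f_measure": f_measure,
    }


def combine_subsets(a: set[str], b: set[str], mode: str) -> set[str]:
    """Aggregate two feature-selection outputs by intersection or union."""
    if mode == "intersection":
        return a & b
    if mode == "union":
        return a | b
    raise ValueError(f"unknown mode: {mode}")


def majority_vote(predictions: list[int]) -> int:
    """Return the 0/1 label predicted by most classifiers (ties -> 1, assumed)."""
    counts = Counter(predictions)
    return 1 if counts[1] >= counts[0] else 0


if __name__ == "__main__":
    # Hypothetical confusion matrix and feature names, for illustration only.
    print(performance_measures(tp=40, fp=10, tn=45, fn=5))
    bfs = {"LOC", "WMC", "CBO"}
    relief = {"WMC", "CBO", "NOM"}
    print(sorted(combine_subsets(bfs, relief, "intersection")))
    print(majority_vote([1, 0, 1]))
```

With an odd number of voting classifiers, binary majority voting can never tie, so the tie-breaking rule only matters for even-sized ensembles. Intersection produces a smaller, more conservative feature set than union, which is the trade-off the ET5 and ET6 aggregators appear to explore.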

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
