IEEE Access (Jan 2024)
An Evaluation of Multi-Label Classification Approaches for Method-Level Code Smells Detection
Abstract
(1) Background: Code smells are widely used indicators of potential errors in source code. In real-world code, a single code element may exhibit several code smells at once, which makes multi-label code smell detection an important research topic. However, few studies have addressed it, and a standardized classifier is needed to reliably identify the multiple code smells that belong to the method-level category. The primary goal of this study is to develop a rule-based method for detecting multi-label code smells. (2) Methods: The Binary Relevance, Label Powerset, and Classifier Chain methods are combined with tree-based single-label algorithms, including several ensemble algorithms. The chi-square feature selection technique is applied to select relevant features. The proposed model is trained using 10-fold cross-validation with randomized-search cross-validation for hyperparameter tuning, and several performance measures are used to evaluate it. (3) Results: The proposed model achieves its best Jaccard accuracy, 99.54%, for detecting method-level code smells using the Classifier Chain method with the Decision Tree. The Decision Tree combined with the multi-label methods outperforms the alternative multi-label classification approaches. Single-label classifiers produced better results when the correlation between labels was taken into account. (4) Conclusion: This study can help researchers and programmers by providing a systematic method for detecting multiple code smells in software projects, saving time and effort during code reviews by detecting several problems simultaneously. By detecting multi-label code smells, programmers can write better-organized, more understandable, and more reliable programs.
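Binary Relevance, Label Powerset, and Classifier Chain all transform a multi-label problem so that an ordinary single-label learner (for example, a Decision Tree) can be trained on it. They differ only in how they reshape the label matrix. The following plain-Python sketch shows those three reshapings and the example-based Jaccard accuracy on hypothetical toy data. The function names and data are illustrative and are not the authors' code. The sketch omits the classifier itself, chi-square feature selection, and the tuning loop. Scoring a sample with no true and no predicted labels as 1.0 is an assumption, because the abstract does not state which convention the paper used.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Features = List[List[float]]
Labels = Sequence[Sequence[int]]  # one 0/1 vector per sample (e.g. one entry per smell)


def binary_relevance(X: Features, Y: Labels) -> List[Tuple[Features, List[int]]]:
    """One independent binary problem per label; label correlations are ignored."""
    n_labels = len(Y[0])
    return [(X, [y[j] for y in Y]) for j in range(n_labels)]


def label_powerset(X: Features, Y: Labels) -> Tuple[Features, List[int], Dict[Tuple[int, ...], int]]:
    """One multi-class problem whose classes are the distinct label combinations."""
    classes: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for y in Y:
        key = tuple(y)
        if key not in classes:
            classes[key] = len(classes)  # class ids assigned in order of first appearance
        targets.append(classes[key])
    return X, targets, classes


def classifier_chain(X: Features, Y: Labels,
                     order: Optional[List[int]] = None) -> List[Tuple[Features, List[int]]]:
    """Training sets for each link of a chain: each link sees the original features
    plus the true values of the labels placed before it in `order`.
    At prediction time those extra columns would hold the earlier links' predictions."""
    order = list(range(len(Y[0]))) if order is None else order
    problems = []
    for pos, j in enumerate(order):
        earlier = order[:pos]
        X_aug = [list(x) + [y[k] for k in earlier] for x, y in zip(X, Y)]
        problems.append((X_aug, [y[j] for y in Y]))
    return problems


def jaccard_accuracy(Y_true: Labels, Y_pred: Labels) -> float:
    """Example-based Jaccard accuracy: mean over samples of
    (size of intersection) / (size of union) of true and predicted label sets.
    Assumption: a sample with no true and no predicted labels scores 1.0."""
    total = 0.0
    for t, p in zip(Y_true, Y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        total += 1.0 if union == 0 else inter / union
    return total / len(Y_true)


# Hypothetical toy data: two metric columns and two smell labels per method.
X = [[12.0, 3.0], [40.0, 9.0], [5.0, 1.0], [38.0, 2.0]]
Y = [(0, 0), (1, 1), (0, 0), (1, 0)]

br = binary_relevance(X, Y)                      # targets [0, 1, 0, 1] and [0, 1, 0, 0]
_, lp_targets, lp_classes = label_powerset(X, Y)  # three distinct label combinations
cc = classifier_chain(X, Y)                      # link 2 gains label 1 as an extra feature
score = jaccard_accuracy([(1, 1), (1, 0), (0, 0), (0, 1)],
                         [(1, 0), (1, 0), (0, 0), (1, 1)])
print(lp_targets, cc[1][0][1], score)
```

Binary Relevance discards label correlations. Label Powerset and Classifier Chain keep them, the first by treating each combination as one class and the second by feeding earlier labels forward. This difference bears on the abstract's observation about the correlation factor. Libraries cover the full pipeline: scikit-learn provides `ClassifierChain`, `jaccard_score(..., average="samples")` (whose `zero_division` parameter controls the empty-set convention), `SelectKBest` with `chi2`, and `RandomizedSearchCV`. The abstract does not say which tools the authors used.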
Keywords