Applied Sciences (Jul 2024)

Machine Learning-Based Methods for Code Smell Detection: A Survey

  • Pravin Singh Yadav,
  • Rajwant Singh Rao,
  • Alok Mishra,
  • Manjari Gupta

DOI
https://doi.org/10.3390/app14146149
Journal volume & issue
Vol. 14, no. 14
p. 6149

Abstract

Read online

Code smells are early warning signs of potential issues in software quality. Various techniques are used in code smell detection, including the Bayesian approach, rule-based automatic antipattern detection, antipattern identification utilizing B-splines, Support Vector Machine direct, SMURF (Support Vector Machines for design smell detection using relevant feedback), and immune-based detection strategy. Machine learning (ML) has taken a great stride in this area. This study includes relevant studies applying ML algorithms from 2005 to 2024 in a comprehensive manner for the survey to provide insight regarding code smell, ML algorithms frequently applied, and software metrics. Forty-two pertinent studies allow us to assess the efficacy of ML algorithms on selected datasets. After evaluating various studies based on open-source and project datasets, this study evaluated additional threats and obstacles to code smell detection, such as the lack of standardized code smell definitions, the difficulty of feature selection, and the challenges of handling large-scale datasets. The current studies only considered a few factors in identifying code smells, while in this study, several potential contributing factors to code smells are included. Several ML algorithms are examined, and various approaches, datasets, dataset languages, and software metrics are presented. This study provides the potential of ML algorithms to produce better results and fills a gap in the body of knowledge by providing class-wise distributions of the ML algorithms. Support Vector Machine, J48, Naive Bayes, and Random Forest models are the most common for detecting code smells. Researchers can find this study helpful in better anticipating and taking care of software development design and implementation issues. The findings from this study, which highlight the practical implications of ML algorithms in software quality improvement, will help software engineers fix problems during software design and development to ensure software quality.

Keywords