FedCSD: A Federated Learning Based Approach for Code-Smell Detection

Sadi Alawadi; Khalid Alkharabsheh; Fahed Alkhabbas; Victor R. Kebande; Feras M. Awaysheh; Fabio Palomba; Mohammed Awad

doi:10.1109/ACCESS.2024.3380167

IEEE Access (Jan 2024)

Affiliations

Sadi Alawadi: Department of Computer Science, Blekinge Institute of Technology, Karlskrona, Sweden
Khalid Alkharabsheh: Software Engineering Department, Prince Abdullah bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, As-Salt, Jordan
Fahed Alkhabbas: Internet of Things and People Research Center, Malmö University, Malmö, Sweden
Victor R. Kebande: Department of Computer Science, Blekinge Institute of Technology, Karlskrona, Sweden
Feras M. Awaysheh: Institute of Computer Science, Delta Research Centre, University of Tartu, Tartu, Estonia
Fabio Palomba: Department of Computer Science, University of Salerno, Fisciano, Italy
Mohammed Awad: Department of Computer Systems Engineering, Arab American University, Jenin, Palestine

DOI: https://doi.org/10.1109/ACCESS.2024.3380167
Journal volume & issue: Vol. 12, pp. 44888–44904

Abstract

Software quality is critical: low-quality code, often manifested as “code smells,” increases technical debt and maintenance costs. There is a timely need for a collaborative model that detects and manages code smells by learning from diverse, distributed data sources while respecting privacy and scaling to continuously integrate new patterns and practices in code quality management. The current literature, however, still lacks such capabilities. This paper addresses these challenges by proposing a Federated Learning Code Smell Detection (FedCSD) approach, specifically targeting the “God Class” smell, that enables organizations to collaboratively train distributed ML models while safeguarding data privacy. To validate the approach, we conduct experiments on manually validated datasets to detect and analyze code smell scenarios. Experiment 1, a centralized training experiment, revealed varying accuracies across datasets, with dataset two achieving the lowest accuracy (92.30%) and datasets one and three achieving the highest (98.90% and 99.5%, respectively). Experiment 2, focusing on cross-evaluation, showed a significant drop in accuracy (lowest: 63.80%) when fewer smells were present in the training dataset, reflecting technical debt. Experiment 3 involved splitting the dataset across 10 companies, resulting in a global model accuracy of 98.34%, comparable to the centralized model’s highest accuracy. The application of federated ML techniques demonstrates promising performance improvements in code-smell detection, benefiting both software developers and researchers.
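
To make the federated setup in Experiment 3 more concrete, the following is a minimal sketch of the server-side aggregation step, assuming a FedAvg-style weighted average of client parameters. The abstract does not state which aggregation rule FedCSD uses, so this is an illustrative assumption rather than the authors' implementation; the function name `federated_average` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Assumption: FedAvg-style aggregation, where each client's parameters
    are weighted by the number of local training samples it holds.
    Only parameters are shared; the raw code metrics stay with each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    # Sample-weighted mean, computed independently for each parameter.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical round: 10 "companies", each with a 2-parameter local model
# trained on 100 local samples (made-up values, not data from the paper).
local_models = [[0.1 * k, 1.0] for k in range(10)]
local_sizes = [100] * 10
global_model = federated_average(local_models, local_sizes)
```

In a full training loop, the server would send `global_model` back to the clients, each client would continue training on its private data, and the averaging would repeat for several rounds.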

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
