Scientific Reports (Mar 2025)

A privacy-preserving dependable deep federated learning model for identifying new infections from genome sequences

  • Sk. Tanzir Mehedi,
  • Lway Faisal Abdulrazak,
  • Kawsar Ahmed,
  • Muhammad Shahin Uddin,
  • Francis M. Bui,
  • Li Chen,
  • Mohammad Ali Moni,
  • Fahad Ahmed Al-Zahrani

DOI
https://doi.org/10.1038/s41598-025-89612-x
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 24

Abstract

The traditional molecular-based identification (TMID) technique for identifying new infections from genome sequences (GSs) has made significant contributions so far. However, because medical data are sensitive, the TMID practice of transferring patients’ data to a central machine or server may create severe privacy and security issues. In recent years, deep federated learning (DFL) has progressed rapidly and succeeded in many domains, and it has emerged as a potential solution in this field. We therefore propose a dependable and privacy-preserving DFL-based model for identifying new infections from GSs. The unique contributions include automatic, effective feature selection suited to identifying new infections, the design of a dependable and privacy-preserving DFL-based LeNet model, and evaluation on real-world data. To this end, a comprehensive experimental performance evaluation has been conducted. The proposed model achieves an overall accuracy of 99.12% when the dataset is independently and identically distributed among six clients. For the same configuration, it achieves a precision of 98.23%, recall of 98.04%, F1-score of 96.24%, Cohen’s kappa of 83.94%, and ROC AUC of 98.24%, a noticeable improvement over the other benchmark models. The proposed dependable model and its empirical results are encouraging enough for it to be considered as an alternative for identifying new infections from other virus strains while ensuring the privacy and security of patients’ data.
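The six-client, independently and identically distributed setup mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, the dataset size, or any training details, so everything here is an assumption for illustration: a FedAvg-style weighted average stands in for the aggregation step, the function names `iid_partition` and `federated_average` are hypothetical, and the toy two-parameter vectors replace the actual LeNet weights.

```python
import random


def iid_partition(num_samples, num_clients, seed=0):
    """Shuffle sample indices, then deal them out round-robin so that
    every client receives a near-equal, randomly drawn (IID) shard."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]


def federated_average(client_weights, client_sizes):
    """Element-wise mean of client parameter vectors, weighted by the
    number of local samples (a FedAvg-style rule, assumed here)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical dataset of 600 samples split among six clients.
shards = iid_partition(num_samples=600, num_clients=6)

# Toy stand-ins for locally trained models: one short vector per client.
local_weights = [[float(c), 1.0] for c in range(6)]

# Only the parameter vectors and shard sizes reach the server, not the data.
global_weights = federated_average(local_weights, [len(s) for s in shards])
```

In this sketch the raw samples never leave the client shards; only parameter vectors and shard sizes are shared with the aggregator, which is the property that motivates federated learning for sensitive medical data.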

Keywords