Diagnostics (Oct 2024)

Enhancing Influenza Detection through Integrative Machine Learning and Nasopharyngeal Metabolomic Profiling: A Comprehensive Study

  • Md. Shaheenur Islam Sumon,
  • Md Sakib Abrar Hossain,
  • Haya Al-Sulaiti,
  • Hadi M. Yassine,
  • Muhammad E. H. Chowdhury

DOI
https://doi.org/10.3390/diagnostics14192214
Journal volume & issue
Vol. 14, no. 19
p. 2214

Abstract

Background/Objectives: Nasal and nasopharyngeal swabs are commonly used for detecting respiratory viruses, including influenza, which significantly alters host cell metabolites. This study aimed to develop a machine learning model to identify biomarkers that differentiate between influenza-positive and influenza-negative cases using clinical metabolomics data. Method: A publicly available dataset of 236 nasopharyngeal samples screened via liquid chromatography–quadrupole time-of-flight (LC/Q-TOF) mass spectrometry was used. Of these, 118 samples tested positive for influenza (40 A H1N1, 39 A H3N2, 39 influenza B), while 118 were negative controls. A stacking-based model was proposed using the top 20 selected features. Thirteen machine learning models were initially trained, and the top three were combined using their predicted probabilities to form a stacking classifier. Results: The ExtraTrees stacking model outperformed the other models, achieving 97.08% accuracy. External validation on a prospective cohort of 96 symptomatic individuals (48 influenza-positive and 48 influenza-negative) showed 100% accuracy. SHAP values were used to enhance model explainability. Metabolites such as pyroglutamic acid (retention time: 0.81 min, m/z: 84.0447) and its in-source fragment ion (retention time: 0.81 min, m/z: 130.0507) showed minimal impact on influenza-positive cases. In contrast, the metabolite with a retention time of 10.34 min and m/z 106.0865 and the metabolite with a retention time of 8.65 min and m/z 211.1376 made significant positive contributions. Conclusions: This study highlights the effectiveness of integrating metabolomics data with machine learning for accurate influenza diagnosis. The stacking-based model, combined with SHAP analysis, provided robust performance and insights into the key metabolites influencing predictions.
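The stacking approach described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' pipeline: the data are synthetic (generated with `make_classification` to mimic 236 balanced samples), the feature selector (`SelectKBest` with an ANOVA F-test) and the three base learners are assumptions chosen for illustration, and only the overall shape follows the text: a top-20 feature selection step, three base models combined through their predicted probabilities, and an ExtraTrees meta-learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the metabolomics matrix (assumption: 200 features,
# 236 balanced samples). Real LC/Q-TOF feature tables would be loaded here.
X, y = make_classification(
    n_samples=236, n_features=200, n_informative=20,
    weights=[0.5, 0.5], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical "top three" base learners; the abstract does not name them.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# The meta-learner is trained on out-of-fold predicted probabilities
# produced by the base learners (stack_method="predict_proba").
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=ExtraTreesClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",
    cv=5,
)

# Scaling and top-20 feature selection sit inside the pipeline so they are
# fitted on the training split only.
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), stack)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Keeping feature selection inside the pipeline avoids leaking information from the held-out split into the selection step. The accuracy printed here reflects the synthetic data only and says nothing about the figures reported in the study.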

Keywords