Diagnostics (Jun 2024)

Application of a Machine Learning-Based Classification Approach for Developing Host Protein Diagnostic Models for Infectious Disease

  • Thomas F. Scherr,
  • Christina E. Douglas,
  • Kurt E. Schaecher,
  • Randal J. Schoepp,
  • Keersten M. Ricks,
  • Charles J. Shoemaker

DOI
https://doi.org/10.3390/diagnostics14121290
Journal volume & issue
Vol. 14, no. 12
p. 1290

Abstract

In recent years, infectious disease diagnosis has increasingly turned to host-centered approaches as a complement to pathogen-directed ones. Host-centered approaches, however, typically require the interpretation of complex multi-biomarker datasets to arrive at an informative diagnostic outcome. This report describes a machine learning (ML)-based classification workflow intended as a template for researchers seeking to apply ML approaches to the development of host-based infectious disease biomarker classifiers. As an example, we built a classification model that could accurately distinguish between three disease etiology classes (bacterial, viral, and normal) in human sera using host protein biomarkers of known diagnostic utility. After collecting protein data from known disease samples, we trained a series of increasingly complex Auto-ML models until arriving at an optimized classifier that could differentiate viral, bacterial, and non-disease samples. Even when limited to a relatively small training set, the model had robust diagnostic characteristics and performed well when faced with a blinded sample set. We present here a flexible approach for applying an Auto-ML-based workflow to the identification of host biomarker classifiers with diagnostic utility for infectious disease, one that can readily be adapted to multiple biomarker classes and disease states.
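
The train-then-test-blinded pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a simple nearest-centroid classifier stands in for the Auto-ML models, the data are synthetic, and the three unnamed markers, their mean levels, and all function names are hypothetical choices made only for illustration.

```python
import math
import random
from collections import defaultdict

CLASSES = ("bacterial", "viral", "normal")

# Hypothetical mean levels of three unnamed host markers per class.
# These are illustrative values, not measurements from the study.
CLASS_MEANS = {
    "bacterial": (8.0, 2.0, 1.0),
    "viral": (2.0, 8.0, 1.0),
    "normal": (1.0, 1.0, 1.0),
}


def make_samples(n_per_class, rng, sd=0.5):
    """Draw synthetic (features, label) pairs around each class mean."""
    samples = []
    for label in CLASSES:
        for _ in range(n_per_class):
            features = tuple(rng.gauss(mu, sd) for mu in CLASS_MEANS[label])
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def fit_centroids(train):
    """Compute the per-class mean feature vector from labeled samples."""
    grouped = defaultdict(list)
    for features, label in train:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def confusion_matrix(centroids, samples):
    """Count predictions as matrix[true_label][predicted_label]."""
    matrix = {true: {pred: 0 for pred in CLASSES} for true in CLASSES}
    for features, label in samples:
        matrix[label][predict(centroids, features)] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in CLASSES)
    return correct / total


rng = random.Random(0)
train = make_samples(20, rng)    # small labeled training set
blinded = make_samples(10, rng)  # held out; labels used only for scoring

centroids = fit_centroids(train)
matrix = confusion_matrix(centroids, blinded)
acc = accuracy(matrix)

for true_label in CLASSES:
    print(true_label, matrix[true_label])
print("blinded-set accuracy:", acc)
```

The held-out set is scored only after the centroids are fixed, which mirrors the role of the blinded sample set: its labels never influence training. A per-class confusion matrix is reported alongside accuracy because, in a three-class diagnostic setting, the kind of error (for example, bacterial called viral) matters as much as the overall rate.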

Keywords