Nature Communications (Jun 2024)

Restricting datasets to classifiable samples augments discovery of immune disease biomarkers

  • Gunther Glehr,
  • Paloma Riquelme,
  • Katharina Kronenberg,
  • Robert Lohmayer,
  • Víctor J. López-Madrona,
  • Michael Kapinsky,
  • Hans J. Schlitt,
  • Edward K. Geissler,
  • Rainer Spang,
  • Sebastian Haferkamp,
  • James A. Hutchinson

DOI: https://doi.org/10.1038/s41467-024-49094-3
Journal volume & issue: Vol. 15, no. 1, pp. 1–21

Abstract

Immunological diseases are typically heterogeneous in clinical presentation, severity and response to therapy. Biomarkers of immune diseases often reflect this variability, especially compared with their regulated behaviour in health. This leads to a common difficulty that frustrates biomarker discovery and interpretation – namely, unequal dispersion of immune disease biomarker expression between patient classes necessarily limits a biomarker’s informative range. To solve this problem, we introduce dataset restriction, a procedure that splits datasets into classifiable and unclassifiable samples. Applied to synthetic flow cytometry data, restriction identifies biomarkers that are otherwise disregarded. In advanced melanoma, restriction finds biomarkers of immune-related adverse event risk after immunotherapy and enables us to build multivariate models that accurately predict immunotherapy-related hepatitis. Hence, dataset restriction augments discovery of immune disease biomarkers, increases predictive certainty for classifiable samples and improves multivariate models incorporating biomarkers with a limited informative range. This principle can be directly extended to any classification task.
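The abstract describes restriction only at a high level. As a hypothetical illustration of the underlying idea, and not the authors' published procedure, the Python sketch below assumes a single biomarker where one class (for example, healthy controls) is tightly distributed and the other is widely dispersed. It treats samples whose value falls inside the reference class's observed min/max range as unclassifiable and all other samples as classifiable. The function name `restrict_by_reference_range`, the min/max boundary rule and the toy values are all assumptions made for illustration.

```python
"""Hypothetical sketch of univariate 'dataset restriction' (illustration only,
not the method published in the paper)."""
from typing import List, Sequence, Tuple


def restrict_by_reference_range(
    values: Sequence[float],
    labels: Sequence[int],
    reference_label: int,
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (classifiable, unclassifiable).

    Assumed rule: a sample is unclassifiable if its biomarker value lies inside
    the closed range [min, max] observed for the tightly distributed reference
    class; samples outside that range are classifiable.
    """
    ref = [v for v, y in zip(values, labels) if y == reference_label]
    if not ref:
        raise ValueError("reference class has no samples")
    lo, hi = min(ref), max(ref)
    classifiable: List[int] = []
    unclassifiable: List[int] = []
    for i, v in enumerate(values):
        if lo <= v <= hi:
            unclassifiable.append(i)  # overlaps the reference range
        else:
            classifiable.append(i)  # outside the reference range
    return classifiable, unclassifiable


# Invented toy data: healthy controls (label 0) cluster near 10,
# patients (label 1) are widely dispersed and partly overlap the controls.
values = [9.5, 10.0, 10.5, 10.2, 3.0, 10.1, 18.0, 9.8, 25.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

classifiable, unclassifiable = restrict_by_reference_range(
    values, labels, reference_label=0
)

# On the restricted (classifiable) subset, predict the non-reference class
# (label 1 here), since those values fall outside the reference range.
predictions = {i: 1 for i in classifiable}
correct = sum(predictions[i] == labels[i] for i in classifiable)
accuracy = correct / len(classifiable)

print("classifiable:", classifiable)      # indices 4, 6, 8
print("unclassifiable:", unclassifiable)  # indices 0, 1, 2, 3, 5, 7
print("accuracy on classifiable subset:", accuracy)
```

On this toy data, the three patient samples outside the control range are classifiable and all are predicted correctly, while the six samples inside it are set aside rather than forced into a low-confidence call. A practical version would probably need a more robust boundary than the sample extremes, estimated on data separate from the samples being classified.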