PLoS Computational Biology (Nov 2024)

GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies.

  • Lin Lin,
  • Rachel L Spreng,
  • Kelly E Seaton,
  • S Moses Dennison,
  • Lindsay C Dahora,
  • Daniel J Schuster,
  • Sheetal Sawant,
  • Peter B Gilbert,
  • Youyi Fong,
  • Neville Kisalu,
  • Andrew J Pollard,
  • Georgia D Tomaras,
  • Jia Li

DOI
https://doi.org/10.1371/journal.pcbi.1012581
Journal volume & issue
Vol. 20, no. 11
p. e1012581

Abstract

Despite substantial progress in vaccine research, the level of protection provided by vaccination can vary considerably across individuals. Understanding immunologic variation across individuals in response to vaccination is therefore important for developing next-generation efficacious vaccines. Accurate outcome prediction and identification of predictive biomarkers would represent a significant step towards this goal. Moreover, small datasets are prevalent in early-phase vaccine clinical trials, which creates both the need for, and the challenge of, building a robust and explainable prediction model that can reveal heterogeneity in such data. We propose a new model named Generative Mixture of Logistic Regression (GeM-LR), which combines characteristics of both a generative and a discriminative model. In addition, we propose a set of model selection strategies to enhance the robustness and interpretability of the model. GeM-LR extends a linear classifier to a non-linear classifier without losing interpretability and supports predictive clustering, which characterizes data heterogeneity in connection with the outcome variable. We demonstrate the strengths and utility of GeM-LR by applying it to data from several studies. GeM-LR achieves better prediction results than other popular methods while providing interpretations at different levels.
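
As a rough illustration of how one model can carry both a generative and a discriminative part, a mixture of logistic regressions is often written as sketched below. The notation (K clusters, weights \pi_k, Gaussian parameters \mu_k and \Sigma_k, coefficients \beta_{k0} and \beta_k, latent cluster label z) is assumed here for illustration and does not come from the abstract; the formulation in the paper itself may differ.

```latex
% Generative part (assumed): covariates x follow a K-component Gaussian mixture
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

% Soft cluster membership of x, obtained by Bayes' rule
P(z = k \mid x) =
  \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
       {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}

% Discriminative part (assumed): one logistic regression per cluster
P(y = 1 \mid x, z = k) = \sigma\!\left(\beta_{k0} + \beta_k^{\top} x\right),
\qquad \sigma(t) = \frac{1}{1 + e^{-t}}

% Overall prediction: membership-weighted average of the cluster-level models
P(y = 1 \mid x) = \sum_{k=1}^{K} P(z = k \mid x) \,
  \sigma\!\left(\beta_{k0} + \beta_k^{\top} x\right)
```

Under these assumptions, K = 1 recovers ordinary logistic regression. For K > 1 the membership weights depend on x, so the overall decision boundary is non-linear, while each cluster still has its own linear coefficients that can be read as in a standard logistic model.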