Scientific Reports (Aug 2021)
Early prognosis of respiratory virus shedding in humans
Abstract
This paper addresses the development of predictive models that distinguish pre-symptomatic infected individuals from uninfected ones. Our machine learning experiments use publicly available challenge-study data in which whole-blood transcriptomics were collected from individuals infected with HRV, RSV, H1N1, and H3N2. We address the problem of identifying biomarkers that discriminate between controls and eventual shedders in the first 32 h post-infection. Our exploratory analysis shows that the most discriminatory biomarkers depend strongly on time over the course of the human response to infection. We visualize the feature sets to provide evidence of the rapid evolution of the gene expression profiles. To quantify this observation, we partition the data from the first 32 h into four equal time windows of 8 h each and identify all discriminatory biomarkers using sparsity-promoting classifiers and Iterated Feature Removal. We then perform a comparative classification analysis using linear support vector machines, artificial neural networks, and Centroid-Encoder. We present a range of experiments on different groupings of the diseases to demonstrate the robustness of the resulting models.
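The windowing and feature-selection steps can be illustrated with a minimal Python sketch. It assumes each sample carries an hours-post-infection timestamp, and it substitutes L1-regularized logistic regression (fit by proximal gradient descent) for the sparsity-promoting classifiers, which the abstract does not specify. The function names, the regularization strength `lam`, and the accuracy cutoff `min_acc` are hypothetical choices for illustration. The Iterated Feature Removal loop follows the general idea of repeatedly fitting a sparse model, recording the features it uses, removing them from the pool, and stopping once held-out accuracy falls too low. It is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np

WINDOW_HOURS = 8   # four equal windows covering the first 32 h
N_WINDOWS = 4


def assign_window(hours_post_infection):
    """Return the 0-based 8 h window index, or None if outside [0, 32)."""
    if 0 <= hours_post_infection < WINDOW_HOURS * N_WINDOWS:
        return int(hours_post_infection // WINDOW_HOURS)
    return None


def fit_l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=500):
    """Hypothetical stand-in sparse classifier: L1-regularized logistic
    regression fit by proximal gradient descent (ISTA). y must be in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w = w - lr * grad_w
        # Soft-thresholding (the L1 proximal step) sets small weights to exactly 0.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b -= lr * grad_b                        # intercept is not penalized
    return w, b


def accuracy(w, b, X, y):
    return float(np.mean(((X @ w + b) > 0).astype(int) == y))


def iterated_feature_removal(X_tr, y_tr, X_te, y_te, min_acc=0.75, max_iter=10):
    """Fit a sparse model, record its nonzero features and held-out accuracy,
    remove those features, and repeat until accuracy drops below min_acc
    (an assumed threshold) or no features remain."""
    remaining = np.arange(X_tr.shape[1])
    selected = []
    for _ in range(max_iter):
        if remaining.size == 0:
            break
        w, b = fit_l1_logistic(X_tr[:, remaining], y_tr)
        acc = accuracy(w, b, X_te[:, remaining], y_te)
        nonzero = remaining[w != 0]             # map back to original indices
        if acc < min_acc or nonzero.size == 0:
            break
        selected.append((nonzero.tolist(), acc))
        remaining = np.setdiff1d(remaining, nonzero)
    return selected


if __name__ == "__main__":
    # Synthetic data: only features 0 and 1 determine the label.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    for features, acc in iterated_feature_removal(X[:150], y[:150], X[150:], y[150:]):
        print(features, round(acc, 3))
```

On this synthetic data the first iteration should recover features 0 and 1; later iterations see only noise features and typically stop at the accuracy threshold. In the setting the abstract describes, the selection would be run separately on the samples falling in each 8 h window.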