BMC Genomics (Aug 2017)
Fungal biomarker discovery by integration of classifiers
Abstract
Background The human immune system is responsible for protecting the host from infection. However, in immunocompromised individuals the risk of infection increases substantially, with possibly drastic consequences. In extreme cases, systemic infection can lead to sepsis, which is responsible for innumerable deaths worldwide. Amongst its causes are infections by bacteria and fungi. To increase survival, it is essential to identify the type of infection rapidly. Discriminating between fungal and bacterial pathogens is key to determining whether antifungals or antibiotics, respectively, should be administered. For this, in situ experiments have been performed to determine regulatory mechanisms of the human immune system and to identify biomarkers. However, these studies led to heterogeneous results owing to differences in laboratory settings, pathogen strains, cell types and tissues, and the time of sample extraction, to name a few.
Methods To generate a gene signature capable of discriminating between fungal- and bacterial-infected samples, we employed Mixed Integer Linear Programming (MILP) based classifiers on several datasets of samples infected with the above-mentioned pathogens.
Results When combining the classifiers by a joint optimization, we could increase the consistency of the biomarker gene list independently of the experimental setup. This approach increased the pairwise overlap (the number of genes that overlap in each cross-validation) by 43% compared with that of single classifiers. The refined gene list was composed of 19 genes, ranked according to consistency in expression (up- or down-regulated). Most of them were linked either directly or indirectly to the ERK-MAPK signalling pathway, which has been shown to play a key role in the immune response to infection. Testing of the identified 12 genes on an unseen dataset yielded an average accuracy of 83%.
Conclusions Our method allowed the combination of independent classifiers and increased the consistency and reliability of the generated gene signatures.
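The pairwise overlap used above as a consistency measure can be illustrated with a minimal sketch. It assumes that each cross-validation run yields one list of selected genes and that overlap is counted as the number of genes shared by a pair of lists, averaged over all pairs; the exact definition and any normalization used in the study may differ. The function name and the gene identifiers are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def mean_pairwise_overlap(gene_lists):
    """Average number of shared genes over all pairs of gene lists.

    Assumption: one gene list per cross-validation run; overlap is the
    raw count of shared genes (no normalization by list size).
    """
    sets = [set(genes) for genes in gene_lists]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two gene lists")
    return sum(len(a & b) for a, b in pairs) / len(pairs)


# Hypothetical gene lists from three cross-validation runs.
folds = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_D"],
    ["GENE_A", "GENE_C", "GENE_E"],
]
overlap = mean_pairwise_overlap(folds)
```

A higher value indicates that the classifier selects more of the same genes regardless of which samples are held out, which is the sense in which a gene list is called consistent here.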
Keywords