BMC Bioinformatics (Mar 2020)

Robust pathway sampling in phenotype prediction. Application to triple negative breast cancer

  • Ana Cernea,
  • Juan Luis Fernández-Martínez,
  • Enrique J. deAndrés-Galiana,
  • Francisco Javier Fernández-Ovies,
  • Oscar Alvarez-Machancoses,
  • Zulima Fernández-Muñiz,
  • Leorey N. Saligan,
  • Stephen T. Sonis

DOI
https://doi.org/10.1186/s12859-020-3356-6
Journal volume & issue
Vol. 21, no. S2
pp. 1 – 13

Abstract


Background
Phenotype prediction problems are usually considered ill-posed, because the number of samples is very small relative to the number of genetic probes examined. This complicates the sampling of defective genetic pathways, since a large number of discriminatory genetic networks may be involved. In this research, we outline three novel sampling algorithms used to identify, classify and characterize the defective pathways in phenotype prediction problems, namely the Fisher’s ratio sampler, the Holdout sampler and the Random sampler, and apply each one to the analysis of genetic pathways involved in tumor behavior and outcomes of triple negative breast cancers (TNBC). Altered biological pathways are identified using the most frequently sampled genes and are compared to those obtained via Bayesian Networks (BNs).

Results
Random, Fisher’s ratio and Holdout samplers were more accurate and robust than BNs, while providing comparable insights about disease genomics.

Conclusions
The three samplers tested are good alternatives to Bayesian Networks, since they are less computationally demanding. Importantly, this analysis confirms the concept of “biological invariance”: the altered pathways should be independent of the sampling methodology and of the classifier used for their inference. However, Bayesian networks still need some modifications to sample the uncertainty space correctly in phenotype prediction problems, since the probabilistic parameterization of the uncertainty space is not unique and relying on the optimum network alone might yield a misleading pathway analysis.
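The abstract does not give the details of the samplers, so the following is only a rough sketch of the general idea behind ranking genes by how often they are sampled. It assumes one common two-class definition of Fisher's ratio (squared difference of class means divided by the sum of class variances) and draws small gene subsets with probability proportional to that ratio. The function names, the dictionary-based data layout, and the parameters `n_draws` and `subset_size` are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Assumed definition: (mean_a - mean_b)^2 / (var_a + var_b)."""
    num = (mean(values_a) - mean(values_b)) ** 2
    # Small constant avoids division by zero for constant genes.
    den = pvariance(values_a) + pvariance(values_b) + 1e-12
    return num / den


def sample_gene_frequencies(expr, labels, n_draws=500, subset_size=3, seed=0):
    """Return, for each gene, the fraction of draws in which it was sampled.

    expr:   dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels, aligned with the expression values
    """
    rng = random.Random(seed)
    genes = list(expr)

    # Weight each gene by its Fisher's ratio between the two classes.
    weights = []
    for g in genes:
        a = [v for v, lab in zip(expr[g], labels) if lab == 0]
        b = [v for v, lab in zip(expr[g], labels) if lab == 1]
        weights.append(fisher_ratio(a, b))

    counts = dict.fromkeys(genes, 0)
    for _ in range(n_draws):
        # Weighted draw with replacement; the set collapses duplicates,
        # so a gene is counted at most once per draw.
        chosen = set(rng.choices(genes, weights=weights, k=subset_size))
        for g in chosen:
            counts[g] += 1

    return {g: c / n_draws for g, c in counts.items()}
```

Calling `sample_gene_frequencies(expr, labels)` and sorting the result by value gives a frequency ranking of genes. In a fuller pipeline, each sampled subset would presumably also be scored with a classifier before being counted; that step is omitted here.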