BMC Bioinformatics (Nov 2021)

Bayesian variable selection for high-dimensional data with an ordinal response: identifying genes associated with prognostic risk group in acute myeloid leukemia

  • Yiran Zhang,
  • Kellie J. Archer

DOI
https://doi.org/10.1186/s12859-021-04432-w
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 17

Abstract

Read online

Abstract Background Acute myeloid leukemia (AML) is a heterogeneous cancer of the blood, though specific recurring cytogenetic abnormalities in AML are strongly associated with attaining complete response after induction chemotherapy, remission duration, and survival. Therefore recurring cytogenetic abnormalities have been used to segregate patients into favorable, intermediate, and adverse prognostic risk groups. However, it is unclear how expression of genes is associated with these prognostic risk groups. We postulate that expression of genes monotonically associated with these prognostic risk groups may yield important insights into leukemogenesis. Therefore, in this paper we propose penalized Bayesian ordinal response models to predict prognostic risk group using gene expression data. We consider a double exponential prior, a spike-and-slab normal prior, a spike-and-slab double exponential prior, and a regression-based approach with variable inclusion indicators for modeling our high-dimensional ordinal response, prognostic risk group, and identify genes through hypothesis tests using Bayes factor. Results Gene expression was ascertained using Affymetrix HG-U133Plus2.0 GeneChips for 97 favorable, 259 intermediate, and 97 adverse risk AML patients. When applying our penalized Bayesian ordinal response models, genes identified for model inclusion were consistent among the four different models. Additionally, the genes included in the models were biologically plausible, as most have been previously associated with either AML or other types of cancer. Conclusion These findings demonstrate that our proposed penalized Bayesian ordinal response models are useful for performing variable selection for high-dimensional genomic data and have the potential to identify genes relevantly associated with an ordinal phenotype.

Keywords