BMC Bioinformatics (Oct 2008)

Gene and pathway identification with Lp penalized Bayesian logistic regression

  • Jiang Feng,
  • Tan Ming,
  • Gartenhaus Ronald B,
  • Liu Zhenqiu,
  • Jiao Xiaoli

DOI
https://doi.org/10.1186/1471-2105-9-412
Journal volume & issue
Vol. 9, no. 1
p. 412

Abstract

Background
Identifying genes and pathways associated with diseases such as cancer has been a subject of considerable research in bioinformatics and computational biology in recent years. It has been demonstrated that the magnitude of differential expression does not necessarily indicate biological significance. Even a very small change in the expression of a particular gene may have dramatic physiological consequences if the protein encoded by this gene plays a catalytic role in a specific cell function. Moreover, highly correlated genes may function together in the same biological pathway. Finally, in sparse logistic regression with Lp (p

Results
In this paper, we propose a simple Bayesian approach that integrates the regularization parameter out analytically using a new prior. There is therefore no longer a need for parameter selection, as the parameter is eliminated entirely from the model. The proposed algorithm (BLpLog) is typically two or three orders of magnitude faster than the original algorithm and is free from bias in performance estimation. We also define a novel similarity measure and develop an integrated algorithm to find regulatory genes that show low expression changes but are highly correlated with the selected genes. Pathways of those correlated genes were identified with DAVID (http://david.abcc.ncifcrf.gov/).

Conclusion
Experimental results with gene expression data demonstrate that the proposed methods can be used to identify important genes and pathways related to cancer and to build a parsimonious model for future patient predictions.
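For orientation, the following is a minimal sketch of a generic Lp-penalized logistic regression objective with an explicit regularization parameter, that is, the kind of parameter the abstract says the Bayesian approach integrates out. It is not the authors' BLpLog algorithm: the function names, the 0/1 label coding, and the default exponent p = 0.5 are assumptions made for illustration only.

```python
import numpy as np

def lp_penalty(w, lam, p=0.5):
    """Generic Lp penalty lam * sum_j |w_j|^p (p = 0.5 is an illustrative assumption)."""
    return lam * np.sum(np.abs(w) ** p)

def lp_penalized_nll(w, X, y, lam, p=0.5):
    """Logistic negative log-likelihood plus the Lp penalty.

    X: (n_samples, n_genes) expression matrix; y: labels coded 0/1 (assumed coding).
    lam is the regularization parameter that would normally be tuned,
    for example by cross-validation.
    """
    z = X @ w
    # sum_i [log(1 + exp(z_i)) - y_i * z_i], with log(1 + exp(z)) computed stably
    nll = np.sum(np.logaddexp(0.0, z) - y * z)
    return nll + lp_penalty(w, lam, p)

# With p < 1, a sparse weight vector is penalized less than a dense one
# that has the same L1 norm, which is why such penalties favor few selected genes.
sparse_w = np.array([1.0, 0.0])
dense_w = np.array([0.5, 0.5])
print(lp_penalty(sparse_w, lam=1.0) < lp_penalty(dense_w, lam=1.0))
```

In this sketch the degree of sparsity depends on `lam`, so any fitting procedure built on it would have to choose that value; removing this choice is the motivation stated in the Results paragraph.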