BMC Bioinformatics (Feb 2020)

GenEpi: gene-based epistasis discovery using machine learning

  • Yu-Chuan Chang,
  • June-Tai Wu,
  • Ming-Yi Hong,
  • Yi-An Tung,
  • Ping-Han Hsieh,
  • Sook Wah Yee,
  • Kathleen M. Giacomini,
  • Yen-Jen Oyang,
  • Chien-Yu Chen,
  • for the Alzheimer’s Disease Neuroimaging Initiative

DOI
https://doi.org/10.1186/s12859-020-3368-2
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 13

Abstract

Background
Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that an efficient and effective GWAS method for detecting epistasis will be key to uncovering complex pathogenesis, which is especially important for complex diseases such as Alzheimer’s disease (AD).

Results
This study presents GenEpi, a computational package that uses a machine learning approach to uncover epistasis associated with phenotypes. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperformed other widely used methods at detecting the ground-truth epistasis. On real data, this study uses AD as an example to demonstrate GenEpi's capability to find disease-related variants and variant interactions that are both biologically meaningful and predictive.

Conclusions
The results on simulated data and AD data demonstrate that GenEpi can detect epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to other phenotypes and should facilitate studies of many complex diseases.
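As a rough illustration of what a two-element combinatorial encoding could look like, the sketch below one-hot encodes each variant's genotype and then combines indicators from two different variants with a logical AND. This is one plausible reading of the term and is not confirmed by the abstract: the 0/1/2 genotype coding, the function names `one_hot` and `pairwise_features`, and the toy data are all assumptions made for illustration, not GenEpi's actual API. In the workflow described above, features of this kind would then be passed to L1-regularized regression with stability selection, which is not shown here.

```python
from itertools import combinations

# Assumed coding: each genotype is the count of minor alleles (0, 1 or 2).
GENOTYPES = (0, 1, 2)


def one_hot(samples, snp_names):
    """Expand each SNP into one binary indicator column per genotype value."""
    features = {}
    for j, snp in enumerate(snp_names):
        for g in GENOTYPES:
            features[f"{snp}={g}"] = [int(row[j] == g) for row in samples]
    return features


def pairwise_features(single):
    """Combine indicators from two different SNPs with a logical AND."""
    pairs = {}
    for (name_a, col_a), (name_b, col_b) in combinations(single.items(), 2):
        # Skip pairs from the same SNP: two genotypes of one SNP never co-occur.
        if name_a.rsplit("=", 1)[0] == name_b.rsplit("=", 1)[0]:
            continue
        pairs[f"{name_a}&{name_b}"] = [a & b for a, b in zip(col_a, col_b)]
    return pairs


# Hypothetical toy data: four samples (rows) genotyped at two SNPs (columns).
samples = [
    [0, 2],
    [1, 2],
    [0, 2],
    [2, 0],
]
single = one_hot(samples, ["snp1", "snp2"])
pairs = pairwise_features(single)
```

With two SNPs and three genotypes each, this yields six single-variant indicators and nine pairwise indicators, and every sample activates exactly one pairwise indicator. The number of pairwise features grows quadratically with the number of variants, which is one reason a sparsity-inducing model such as L1-regularized regression is a natural fit for the selection step.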

Keywords