Machine learning approaches to genome-wide association studies

David O. Enoma; Janet Bishung; Theresa Abiodun; Olubanke Ogunlana; Victor Chukwudi Osamor

Journal of King Saud University: Science (Jun 2022)

Affiliations

David O. Enoma: Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant Applied Informatics and Communication African Centre of Excellence, Covenant University, Nigeria
Janet Bishung: Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
Theresa Abiodun: Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
Olubanke Ogunlana: Covenant Applied Informatics and Communication African Centre of Excellence, Covenant University, Nigeria; Department of Biochemistry, Covenant University, Ota, Nigeria
Victor Chukwudi Osamor: Department of Computer and Information Sciences, Covenant University, Ota, Nigeria; Covenant Applied Informatics and Communication African Centre of Excellence, Covenant University, Nigeria; Corresponding author at: Department of Computer and Information Sciences, Covenant University, Ota, Nigeria.

Journal volume & issue: Vol. 34, no. 4, p. 101847

Abstract

Genome-wide Association Studies (GWAS) are conducted to identify single nucleotide polymorphisms (SNPs, or variants) associated with a phenotype within a specific population. Disease-associated variants give rise to the disease phenotype through a complex molecular aetiology. The genotyping data generated from study subjects are high-dimensional: the dataset has a large number of features and a relatively small sample size, which poses a challenge. Statistical testing remains the standard approach for identifying the variants that influence the phenotype of interest. The wide applicability and capabilities of Machine Learning (ML) algorithms promise a better understanding of the effects of these variants. The aim of this work is to discuss the applications and future trends of ML algorithms in GWAS towards understanding the effects of population genetic variants. The review found that classification, regression, ensemble, and neural network algorithms have been applied to GWAS, and this work discusses them comprehensively, including their application areas. ML algorithms have been applied to the identification of significant SNPs, disease risk assessment and prediction, detection of epistatic non-linear interactions, and integration with other omics datasets. This comprehensive review highlights these areas of application and sheds light on the promise of incorporating machine learning algorithms into the computational and statistical pipeline of genome-wide association studies. This will be beneficial for a better understanding of how variants are affected by disease biology and how the same variants can influence risk by developing a particular phenotype for favourable natural selection.

Published in Journal of King Saud University: Science

ISSN: 1018-3647 (Print)
Publisher: Elsevier
Country of publisher: Saudi Arabia
LCC subjects: Science: Science (General)
Website: http://www.journals.elsevier.com/journal-of-king-saud-university-science/
