IEEE Access (Jan 2020)

A Novel Deep Neural Networks Model Based on Prime Numbers for Y DNA Haplogroup Prediction

  • Jasbir Dhaliwal,
  • Keong Jin,
  • Zhe Jin

DOI: https://doi.org/10.1109/ACCESS.2020.3022274
Journal volume & issue: Vol. 8, pp. 169096–169105

Abstract

Most of the Y chromosome (Ychr) region (approximately 95%) passes unchanged from father to son, except for the gradual accumulation of single-nucleotide polymorphism (SNP) mutations. As a result, mutations are inherited together, and all males in the direct paternal line share an identical pattern of variants. These mutation patterns serve as markers and can be mapped into clusters known as Y DNA haplogroups. Besides lineage tracing, haplogroups have been associated with male infertility, semen parameters, and, more recently, disease progression in several populations. Haplogroup prediction research is therefore gaining importance with the growing interest in personalized medicine. Notably, there are two approaches to predicting haplogroups, differing in the genetic markers used as input: short tandem repeats (STRs) or SNPs. STRs have limitations, as similar STR haplotypes occur across haplogroups, which reduces the effectiveness of STR-based haplogroup prediction tools. By contrast, current SNP-based haplogroup prediction tools are computationally expensive. To date, no studies have leveraged traditional machine learning or deep learning algorithms to identify mutation patterns using SNPs alone, and this paper proposes a novel SNP-based deep neural network (DNN) model. However, DNNs suffer from the curse of dimensionality and become computationally expensive on large datasets. This paper overcomes that limitation by proposing a novel feature extraction method based on prime numbers that computes features in either the forward or reverse direction of the SNP data. Our experimental results show that the model achieves a categorical cross-entropy loss as low as 0.001 on the training dataset and 0.039 on the test dataset.
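The abstract does not spell out the prime-number feature extraction, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm: assign the i-th prime to the i-th SNP position and emit the running product over positions carrying the derived allele, scanned in either the forward or the reverse direction. By the unique-factorization property of integers, each running product encodes exactly which mutated positions have been seen so far. The function names (`primes`, `prime_features`) and the 0/1 input encoding are assumptions for illustration.

```python
def primes(n):
    """Return the first n prime numbers by simple trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def prime_features(snps, reverse=False):
    """Map a 0/1 SNP vector to cumulative prime-product features.

    Each position i is paired with the i-th prime; the feature at each
    step is the product of the primes for all mutated (value 1)
    positions visited so far, scanning forward or in reverse.
    """
    ps = primes(len(snps))
    order = range(len(snps) - 1, -1, -1) if reverse else range(len(snps))
    features, acc = [], 1
    for i in order:
        if snps[i]:
            acc *= ps[i]
        features.append(acc)
    return features


# Example: SNP vector [1, 0, 1] pairs with primes [2, 3, 5].
# Forward scan yields [2, 2, 10]; reverse scan yields [5, 5, 10].
print(prime_features([1, 0, 1]))
print(prime_features([1, 0, 1], reverse=True))
```

Because prime factorizations are unique, two different mutation patterns can never collapse to the same final product, which is one plausible reason primes (rather than arbitrary weights) would make the encoding injective.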

Keywords