A deep learning model for DNA enhancer prediction based on nucleotide position aware feature encoding

Wenxing Hu; Yelin Li; Yan Wu; Lixin Guan; Mengshan Li

iScience (Jun 2024)

Affiliations

Wenxing Hu: College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
Yelin Li: College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
Yan Wu: College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
Lixin Guan: College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
Mengshan Li: College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China; Corresponding author

Journal volume & issue: Vol. 27, no. 6, p. 110030

Abstract

Summary: Enhancers are genomic DNA elements that regulate the expression of neighboring genes and are crucial for biological processes such as cell differentiation and stress response. However, current machine learning methods for predicting DNA enhancers often underutilize hidden features in gene sequences, which limits model accuracy. This article therefore proposes PDCNN, a deep learning-based enhancer prediction method. PDCNN extracts statistical nucleotide representations from gene sequences, capturing the positional distribution of nucleotides in modifier-like DNA sequences. Built on a convolutional neural network structure, PDCNN employs dual convolutional and fully connected layers. The cross-entropy loss is minimized iteratively with a gradient descent algorithm to improve prediction accuracy. Model parameters are tuned to select the best combination for training, yielding over 95% accuracy. Comparative analysis with traditional methods and existing models demonstrates PDCNN’s robust feature extraction capability. It outperforms advanced machine learning methods in identifying DNA enhancers, offering an effective method with broad implications for genomics, biology, and medical research.
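
The summary does not spell out how the position-aware statistical representation is computed. As a rough illustration only, the sketch below assumes one simple scheme: each nucleotide is replaced by its relative frequency at that position across a set of equal-length training sequences. The function names `position_frequencies` and `encode` and the toy sequences are hypothetical and are not taken from the paper; the actual PDCNN encoding may differ.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def position_frequencies(sequences):
    """Return, for each position, the relative frequency of A, C, G and T.

    Assumption: all sequences have the same length (fixed-length windows).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        table.append({n: counts.get(n, 0) / len(sequences) for n in NUCLEOTIDES})
    return table


def encode(sequence, table):
    """Map each nucleotide to its frequency at that position.

    Symbols outside A/C/G/T (for example N) are encoded as 0.0.
    """
    return [table[i].get(base, 0.0) for i, base in enumerate(sequence)]


# Toy data, purely illustrative.
train = ["ACGT", "ACGA", "TCGT", "ACCT"]
table = position_frequencies(train)
features = encode("ACGT", table)
print(features)
```

A numeric vector of this kind, one value per position, is the sort of input that could then be passed to a convolutional network trained with a cross-entropy loss, as the summary describes.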

Published in iScience

ISSN: 2589-0042 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science
Website: http://www.cell.com/iscience/home
