Advanced Science (Sep 2024)
Cross‐Species Prediction of Transcription Factor Binding by Adversarial Training of a Novel Nucleotide‐Level Deep Neural Network
Abstract
Cross‐species prediction of transcription factor (TF) binding remains a major challenge because individual TF binding sites turn over rapidly during evolution, so cross‐species predictive performance is consistently worse than within‐species performance. In this study, a novel Nucleotide‐Level Deep Neural Network (NLDNN) is first proposed to predict TF binding within or across species. NLDNN treats TF binding prediction as a nucleotide‐level regression task: it takes DNA sequences as input and directly predicts experimental coverage values. Beyond predictive performance, the study also assesses the model by locating potential TF binding regions, discriminating TF‐specific single‐nucleotide polymorphisms (SNPs), and identifying causal disease‐associated SNPs. The experimental results show that NLDNN outperforms the competing methods in these tasks. Then, a dual‐path framework is designed for adversarial training of NLDNN to further improve cross‐species prediction performance by pulling the human and mouse domain spaces closer together. Comparison and analysis show that adversarial training not only improves cross‐species prediction performance between human and mouse but also enhances the ability to locate TF binding regions and discriminate TF‐specific SNPs. Visualization of the predictions shows that the framework corrects some mispredictions by amplifying the coverage values of incorrectly predicted peaks.
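To make the nucleotide‐level regression framing concrete, the following minimal sketch shows what the input and training target of such a task could look like: a DNA sequence is one‐hot encoded, and a loss compares one predicted coverage value per nucleotide against the experimental coverage. The channel order `ACGT`, the function names `one_hot` and `per_nucleotide_mse`, and the choice of mean squared error are all assumptions made for illustration; the abstract does not specify NLDNN's encoding, architecture, or loss.

```python
from typing import List

# Assumed channel order; the abstract does not state the encoding NLDNN uses.
ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA sequence as an L x 4 matrix (hypothetical helper).

    Bases outside ACGT (for example N) are encoded as all zeros.
    """
    rows = []
    for base in seq.upper():
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def per_nucleotide_mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error over positions (assumed loss, for illustration).

    `pred` and `target` each hold one coverage value per nucleotide, which
    is what distinguishes this setup from predicting a single label for
    the whole sequence.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    x = one_hot("ACGN")
    print(len(x), len(x[0]))  # sequence length, number of channels
    print(per_nucleotide_mse([1.0, 2.0], [1.0, 4.0]))
```

In this framing, a model maps the L x 4 input to L coverage values, so peaks in the predicted track can be read off directly as candidate binding regions.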
Keywords