BP-GAN: Interpretable Human Branchpoint Prediction Using Attentive Generative Adversarial Networks

Hyeonseok Lee; Sangwoo Yeom; Sungchan Kim

doi:10.1109/ACCESS.2020.2995762

IEEE Access (Jan 2020)

Affiliations

Hyeonseok Lee: Division of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
Sangwoo Yeom: Division of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
Sungchan Kim: Division of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea

Journal volume & issue: Vol. 8, pp. 97851 – 97862

Abstract

Branchpoints (BPs) are essential sequence elements of ribonucleic acids (RNAs) in splicing, the process that creates a messenger RNA (mRNA) to be translated into proteins. This study develops deep neural networks for BP prediction. Extensive previous studies have shown that the existence of BP sites depends on sequence patterns called motifs; hence, a prediction model must explain its decisions accurately in terms of motifs. Existing approaches used either handcrafted features, which gave interpretable but less accurate predictions, or deep neural networks, which were accurate but difficult to explain. To address these difficulties, the proposed method incorporates 1) generative adversarial networks (GANs) to learn the latent structure of RNA sequences, and 2) an attention mechanism to learn long-term dependencies between sequence positions for accurate prediction and interpretation. Our method achieves highly satisfying results on various performance metrics with adequate interpretability. We demonstrate that, without any prior biological knowledge, BP prediction by the proposed method is closely related to three motifs that are well established in molecular biology: the consensus sequence surrounding BPs, the polypyrimidine tract, and the 3' splice site.
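
To make the interpretability idea in the abstract concrete, the following is a minimal sketch of how attention can assign a weight to every position of an RNA sequence, so that high-weight positions can be inspected afterwards. It is not the BP-GAN architecture: the function names, the one-hot encoding, the toy sequence, and the hand-picked `query` vector are all assumptions made for illustration, and in a trained model the query would be learned rather than fixed.

```python
import math

ALPHABET = "ACGU"  # assumed nucleotide order for the one-hot encoding


def one_hot(seq):
    """Encode an RNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(seq, query):
    """Dot-product attention: score each position against a query vector,
    then normalize the scores so they sum to 1 across the sequence."""
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in one_hot(seq)]
    return softmax(scores)


# Hypothetical query that scores A highest; a real model would learn this.
query = [2.0, 0.0, 0.0, 0.5]
seq = "CUGACUU"  # toy sequence, not taken from the paper
weights = attention_weights(seq, query)

# Report the weight attached to each position for inspection.
for position, (base, weight) in enumerate(zip(seq, weights)):
    print(position, base, round(weight, 3))
```

Because the weights sum to 1 over the sequence, they can be read as a distribution over positions, which is what makes this kind of mechanism convenient for relating a prediction back to sequence motifs.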

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
