Communications Biology (Mar 2024)

Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials

  • Tiansu Gong,
  • Fusong Ju,
  • Dongbo Bu

DOI
https://doi.org/10.1038/s42003-024-05952-w
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Pseudoknots are key structure motifs of RNA and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to the prediction of RNA secondary structure including pseudoknots. The key elements of KnotFold include a learned potential function and a minimum-cost flow algorithm to find the secondary structure with the lowest potential. KnotFold learns the potential from the RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects from them the optimal combination. The algorithm breaks the restriction of nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate the successful application of KnotFold in predicting RNA secondary structures including pseudoknots with accuracy higher than the state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functionalities.