Biology (Apr 2024)
PseUpred-ELPSO Is an Ensemble Learning Predictor with Particle Swarm Optimizer for Improving the Prediction of RNA Pseudouridine Sites
Abstract
RNA pseudouridine modification occurs in many RNA types across many species and plays a significant role in regulating biological processes. To understand the functional mechanisms of RNA pseudouridine sites, accurate identification of these sites in RNA sequences is essential. Although several fast and inexpensive computational methods have been proposed, improving recognition accuracy and generalization remains a challenge. This study proposes a novel ensemble predictor, PseUpred-ELPSO, for improved RNA pseudouridine site prediction. After analyzing nucleotide composition preferences in RNA pseudouridine site sequences, two feature representations were selected and fed into a stacking ensemble framework. Using five tree-based machine learning classifiers as base classifiers, 30-dimensional RNA profiles were constructed to represent the RNA sequences, and the particle swarm optimization (PSO) algorithm was used to search for weights of the RNA profiles that further enhance the representation. A logistic regression classifier served as the meta-classifier to produce the final predictions. Compared with state-of-the-art predictors, PseUpred-ELPSO performs better in both cross-validation and independent testing. Based on the PseUpred-ELPSO predictor, a free and easy-to-use web server has been established as a practical tool for pseudouridine site identification.
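As a rough illustration of the weight-search step described in the abstract, the following Python sketch runs a minimal particle swarm optimizer over a weight vector applied to toy base-classifier outputs. Everything in it is an assumption made for illustration: the function names `pso_minimize` and `weighted_score`, the hyperparameters, the 4-column toy profiles, and the squared-error fitness. In particular, a weighted average stands in for the logistic regression meta-classifier, so this is not the PseUpred-ELPSO implementation.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over the box [0, 1]^dim (illustrative)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best fitness values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move, then clamp each weight to [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def weighted_score(profile, weights):
    """Weighted average of base-classifier probabilities (stand-in meta step)."""
    total = sum(weights)
    if total == 0:
        return 0.5  # no information: fall back to a neutral score
    return sum(w * p for w, p in zip(weights, profile)) / total


# Hypothetical toy data: each row holds four base-classifier probabilities
# for one sequence; labels mark pseudouridine (1) vs. non-pseudouridine (0).
profiles = [
    [0.9, 0.8, 0.6, 0.4],
    [0.8, 0.7, 0.3, 0.6],
    [0.2, 0.3, 0.5, 0.7],
    [0.1, 0.2, 0.6, 0.5],
    [0.7, 0.9, 0.4, 0.3],
    [0.3, 0.1, 0.5, 0.6],
]
labels = [1, 1, 0, 0, 1, 0]


def objective(weights):
    """Mean squared error of the weighted score against the labels."""
    errors = [(weighted_score(p, weights) - y) ** 2
              for p, y in zip(profiles, labels)]
    return sum(errors) / len(errors)


weights, loss = pso_minimize(objective, dim=4)
print("weights:", [round(w, 3) for w in weights], "loss:", round(loss, 4))
```

In a fuller pipeline, the fitness would more plausibly be a cross-validated score of the meta-classifier trained on the weighted profiles; the squared-error objective above only keeps the sketch dependency-free.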
Keywords