Jisuanji kexue (Feb 2022)
Ensemble Regression Decision Trees-based lncRNA-disease Association Prediction
Abstract
Long non-coding RNAs (lncRNAs) play important roles in various complex human diseases. Developing effective methods to predict potential lncRNA-disease associations will not only help biologists understand disease pathogenesis but also contribute to the diagnosis, prevention, and treatment of human diseases. In this paper, an ensemble regression decision tree-based lncRNA-disease association prediction method (ERDTLDA) is proposed. First, ERDTLDA uses open-source lncRNA data to construct an lncRNA similarity matrix, a disease similarity matrix, and an lncRNA-disease association matrix. Feature representations of lncRNAs and diseases are then obtained from these matrices, and principal component analysis (PCA) is applied for feature extraction. Finally, a CART regression decision tree is used to produce association scores, and an ensemble strategy over multiple decision trees is proposed to further improve the accuracy of the model. Leave-one-out cross-validation (LOOCV) experiments show that ERDTLDA achieves AUCs of 0.9055, 0.8969, and 0.9129 on three real lncRNA-disease datasets, which are 6.46%, 5.4%, and 6.02% higher, respectively, than those of existing methods. In addition, case studies on breast cancer, lung cancer, and gastric cancer further verify the accuracy and effectiveness of ERDTLDA.
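The pipeline summarized above (similarity and association matrices, pair features, PCA, CART regression trees, an ensemble, AUC) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how pair features are assembled, how many principal components are kept, which tree hyperparameters are used, or what the ensemble strategy is. The helpers `pair_features`, `pca`, `build_tree`, `fit_ensemble`, `predict_ensemble`, and `auc` are therefore hypothetical. In this sketch, pair features concatenate similarity and association profiles, with the target entry masked to avoid label leakage. PCA is computed via SVD, and each tree greedily minimizes the children's squared error as in CART regression. Bagging with score averaging stands in for the unspecified ensemble strategy.

```python
# Hypothetical sketch: feature layout, PCA size, tree depth and bagging are assumptions, not from the paper.
import numpy as np


def pair_features(lnc_sim, dis_sim, assoc, i, j):
    """lncRNA i's similarity/association rows + disease j's similarity row and
    association column; entry (i, j) is zeroed so the label does not leak in."""
    a_row = assoc[i].copy()
    a_row[j] = 0.0
    a_col = assoc[:, j].copy()
    a_col[i] = 0.0
    return np.concatenate([lnc_sim[i], a_row, dis_sim[j], a_col])


def pca(x, k):
    """Project mean-centered rows onto the top-k principal axes (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T


def build_tree(x, y, depth=0, max_depth=4, min_leaf=1):
    """CART-style regression tree: choose the split that minimizes the summed
    squared error of the two children; leaves store the mean target value."""
    if depth >= max_depth or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    n = len(y)
    best_sse, best = float(((y - y.mean()) ** 2).sum()), None
    for f in range(x.shape[1]):
        order = np.argsort(x[:, f], kind="stable")
        xs, ys = x[order, f], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        for k in range(min_leaf, n - min_leaf + 1):
            if xs[k - 1] == xs[k]:
                continue  # equal feature values cannot be separated
            left_sse = csq[k - 1] - csum[k - 1] ** 2 / k
            r_sum, r_sq = csum[-1] - csum[k - 1], csq[-1] - csq[k - 1]
            sse = left_sse + r_sq - r_sum ** 2 / (n - k)
            if sse < best_sse - 1e-12:
                best_sse, best = sse, (f, (xs[k - 1] + xs[k]) / 2)
    if best is None:
        return float(y.mean())
    f, thr = best
    mask = x[:, f] <= thr
    if mask.all() or not mask.any():
        return float(y.mean())  # guard against a degenerate midpoint split
    return (f, thr,
            build_tree(x[mask], y[mask], depth + 1, max_depth, min_leaf),
            build_tree(x[~mask], y[~mask], depth + 1, max_depth, min_leaf))


def predict_tree(node, row):
    """Internal nodes are (feature, threshold, left, right); leaves are floats."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_ensemble(x, y, n_trees=10, seed=0, **tree_kw):
    """Assumed ensemble strategy: bagging, one tree per bootstrap resample."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))
        trees.append(build_tree(x[idx], y[idx], **tree_kw))
    return trees


def predict_ensemble(trees, x):
    """Association score = mean of the individual tree predictions."""
    return np.array([np.mean([predict_tree(t, row) for t in trees]) for row in x])


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    assoc = (rng.random((8, 6)) < 0.3).astype(float)  # toy 8 lncRNAs x 6 diseases
    lnc_sim, dis_sim = rng.random((8, 8)), rng.random((6, 6))
    pairs = [(i, j) for i in range(8) for j in range(6)]
    feats = np.array([pair_features(lnc_sim, dis_sim, assoc, i, j) for i, j in pairs])
    x, y = pca(feats, k=5), np.array([assoc[i, j] for i, j in pairs])
    scores = predict_ensemble(fit_ensemble(x, y, n_trees=5, max_depth=3), x)
    print("training-set AUC (not LOOCV):", auc(y, scores))
```

The toy demo reports AUC on the same pairs used for training, so it only checks that the pieces fit together. Reproducing an LOOCV protocol would typically require hiding each known association in turn, rebuilding the features, retraining, and ranking the held-out pair against the unobserved candidate pairs.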
Keywords