BMC Bioinformatics (Jul 2021)

ANMDA: anti-noise based computational model for predicting potential miRNA-disease associations

  • Xue-Jun Chen,
  • Xin-Yun Hua,
  • Zhen-Ran Jiang

DOI
https://doi.org/10.1186/s12859-021-04266-6
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 15

Abstract

Read online

Abstract Background A growing proportion of research has proved that microRNAs (miRNAs) can regulate the function of target genes and have close relations with various diseases. Developing computational methods to exploit more potential miRNA-disease associations can provide clues for further functional research. Results Inspired by the work of predecessors, we discover that the noise hiding in the data can affect the prediction performance and then propose an anti-noise algorithm (ANMDA) to predict potential miRNA-disease associations. Firstly, we calculate the similarity in miRNAs and diseases to construct features and obtain positive samples according to the Human MicroRNA Disease Database version 2.0 (HMDD v2.0). Then, we apply k-means on the undetected miRNA-disease associations and sample the negative examples equally from the k-cluster. Further, we construct several data subsets through sampling with replacement to feed on the light gradient boosting machine (LightGBM) method. Finally, the voting method is applied to predict potential miRNA-disease relationships. As a result, ANMDA can achieve an area under the receiver operating characteristic curve (AUROC) of 0.9373 ± 0.0005 in five-fold cross-validation, which is superior to several published methods. In addition, we analyze the predicted miRNA-disease associations with high probability and compare them with the data in HMDD v3.0 in the case study. The results show ANMDA is a novel and practical algorithm that can be used to infer potential miRNA-disease associations. Conclusion The results indicate the noise hiding in the data has an obvious impact on predicting potential miRNA-disease associations. We believe ANMDA can achieve better results from this task with more methods used in dealing with the data noise.

Keywords