BMC Bioinformatics (Feb 2022)

MCRWR: a new method to measure the similarity of documents based on semantic network

  • Xianwei Pan,
  • Peng Huang,
  • Shan Li,
  • Lei Cui

DOI
https://doi.org/10.1186/s12859-022-04578-1
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 17

Abstract

Read online

Abstract Background Besides Boolean retrieval with medical subject headings (MeSH), PubMed provides users with an alternative way called “Related Articles” to access and collect relevant documents based on semantic similarity. To explore the functionality more efficiently and more accurately, we proposed an improved algorithm by measuring the semantic similarity of PubMed citations based on the MeSH-concept network model. Results Three article similarity networks are obtained using MeSH-concept random walk with restart (MCRWR), MeSH random walk with restart (MRWR) and PubMed related article (PMRA) respectively. The area under receiver operating characteristic (ROC) curve of MCRWR, MRWR and PMRA is 0.93, 0.90, and 0.67 respectively. Precisions of MCRWR and MRWR under various similarity thresholds are higher than that of PMRA. Mean value of P5 of MCRWR is 0.742, which is much higher than those of MRWR (0.692) and PMRA (0.223). In the article semantic similarity network of “Genes & Function of organ & Disease” based on MCRWR algorithm, four topics are identified according to golden standards. Conclusion MeSH-concept random walk with restart algorithm has better performance in constructing article semantic similarity network, which can reveal the implicitly semantic association between documents. The efficiency and accuracy of retrieving semantic-related documents have been improved a lot.

Keywords