PLoS ONE (Jan 2011)

Finding complex biological relationships in recent PubMed articles using Bio-LDA.

  • Huijun Wang,
  • Ying Ding,
  • Jie Tang,
  • Xiao Dong,
  • Bing He,
  • Judy Qiu,
  • David J Wild

DOI
https://doi.org/10.1371/journal.pone.0017243
Journal volume & issue
Vol. 6, no. 3
p. e17243

Abstract

The overwhelming amount of available scholarly literature in the life sciences poses significant challenges to scientists wishing to keep up with important developments related to their research, but also provides a useful resource for the discovery of recent information concerning genes, diseases, compounds and the interactions between them. In this paper, we describe an algorithm called Bio-LDA that uses extracted biological terminology to automatically identify latent topics, and provides a variety of measures to uncover putative relations among topics and bio-terms. Relationships identified using those approaches are combined with existing data in life science datasets to provide additional insight. Three case studies demonstrate the utility of the Bio-LDA model, including association prediction, association search and connectivity map generation. This combined approach offers new opportunities for knowledge discovery in many areas of biology including target identification, lead hopping and drug repurposing.