Computer Methods and Programs in Biomedicine Update (Jan 2021)

Use and validation of text mining and cluster algorithms to derive insights from Corona Virus Disease-2019 (COVID-19) medical literature

  • Sandeep Reddy,
  • Ravi Bhaskar,
  • Sandosh Padmanabhan,
  • Karin Verspoor,
  • Chaitanya Mamillapalli,
  • Rani Lahoti,
  • Ville-Petteri Makinen,
  • Smitan Pradhan,
  • Puru Kushwah,
  • Saumya Sinha

Journal volume & issue
Vol. 1
p. 100010

Abstract

The emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in late 2019 has led not only to the world-wide coronavirus disease 2019 (COVID-19) pandemic but also to a deluge of biomedical literature. Following the release of the COVID-19 Open Research Dataset (CORD-19), comprising over 200,000 scholarly articles, we, a multi-disciplinary team of data scientists, clinicians, medical researchers and software engineers, developed an innovative natural language processing (NLP) platform that combines an advanced search engine with a biomedical named entity recognition and extraction package. In particular, the platform was developed to extract information relating to clinical risk factors for COVID-19 and to present the results in a cluster format to support knowledge discovery. Here we describe the principles behind the development, the model and the results we obtained.

Keywords