Information Retrieval in an Infodemic: The Case of COVID-19 Publications

Douglas Teodoro; Sohrab Ferdowsi; Nikolay Borissov; Elham Kashani; David Vicente Alvarez; Jenny Copara; Racha Gouareb; Nona Naderi; Poorya Amini

doi:10.2196/30161

Journal of Medical Internet Research (Sep 2021)

DOI: https://doi.org/10.2196/30161
Journal volume & issue: Vol. 23, no. 9
p. e30161

Abstract

Background: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19–related corpora are being created, sometimes with inaccurate information, at a scale that is beyond human analysis.

Objective: In the context of searching for scientific evidence in the deluge of COVID-19–related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language.

Methods: Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. The similarity between COVID-19 queries and documents is computed, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents.

Results: The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with those of the top-ranking teams participating in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25–based baseline, retrieving, on average, 83% of relevant documents in the top 20.

Conclusions: These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19–related questions posed using natural language.
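The multistage idea summarized in the abstract, a bag-of-words first stage followed by reranking of the top candidates, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy documents, the parameter values (k1 = 1.2, b = 0.75), and the Lucene-style IDF variant are assumptions made for illustration, and `overlap_reranker` is only a placeholder for a deep neural language model, which the abstract does not specify in detail.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for the sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """First stage: score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_reranker(query, doc):
    """Placeholder second-stage scorer: Jaccard overlap of token sets.

    A real system would call a neural relevance model here instead.
    """
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def multistage_search(query, docs, rerank_fn=overlap_reranker, first_stage_k=3):
    """Retrieve the top-k documents by BM25, then reorder them with rerank_fn."""
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)
    candidates = candidates[:first_stage_k]
    return sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)


if __name__ == "__main__":
    toy_docs = [
        "coronavirus transmission in hospitals",
        "weather forecast for tomorrow",
        "coronavirus vaccine trial results",
        "stock market news",
    ]
    # Prints document indices ordered from most to least relevant.
    print(multistage_search("coronavirus vaccine", toy_docs))
```

Because the second-stage scorer is passed in as `rerank_fn`, the placeholder can be swapped for any function that maps a query and a document to a relevance score without changing the first stage.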

Published in Journal of Medical Internet Research

ISSN: 1438-8871 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Public aspects of medicine
Website: https://www.jmir.org
