Patterns (May 2022)

Deep forecasting of translational impact in medical research

  • Amy P.K. Nelson,
  • Robert J. Gray,
  • James K. Ruffle,
  • Henry C. Watkins,
  • Daniel Herron,
  • Nick Sorros,
  • Danil Mikhailov,
  • M. Jorge Cardoso,
  • Sebastien Ourselin,
  • Nick McNally,
  • Bryan Williams,
  • Geraint E. Rees,
  • Parashkev Nachev

Journal volume & issue
Vol. 3, no. 5
p. 100483

Abstract


Summary: The value of biomedical research—a $1.7 trillion annual investment—is ultimately determined by its downstream, real-world impact, whose predictability from simple citation metrics remains unquantified. Here we sought to determine the comparative predictability of future real-world translation—as indexed by inclusion in patents, guidelines, or policy documents—from complex models of title/abstract-level content versus citations and metadata alone. We quantify predictive performance out of sample, ahead of time, across major domains, using the entire corpus of biomedical research captured by Microsoft Academic Graph from 1990 to 2019, encompassing 43.3 million papers. We show that citations are only moderately predictive of translational impact. In contrast, high-dimensional models of titles, abstracts, and metadata exhibit high fidelity (area under the receiver operating characteristic curve [AUROC] > 0.9), generalize across time and domain, and transfer to recognizing the papers of Nobel laureates. We argue that content-based impact models are superior to conventional, citation-based measures and sustain a stronger evidence-based claim to the objective measurement of translational potential.

The bigger picture: The relationship of scientific activity to real-world impact is hard to describe and even harder to quantify. Analyzing 43.3 million biomedical papers from 1990 to 2019, we show that deep learning models of publication metadata, title, and abstract content can predict the inclusion of a scientific paper in a patent, guideline, or policy document. We show that the best of these models, incorporating the richest information, substantially outperforms traditional metrics of paper success—citations per year—and transfers to the task of predicting Nobel Prize-preceding papers. If judgments of the translational potential of science are to be based on objective metrics, then complex models of paper content should be preferred over citations. Our approach is naturally extensible to richer scientific content and diverse measures of impact. Its wider application could maximize the real-world benefits of scientific activity in the biomedical realm and beyond.
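
As a concrete illustration of the evaluation framing described above (a minimal sketch, not the authors' pipeline), the code below trains a simple text classifier to predict a binary translational-uptake label, i.e., whether a paper is later cited in a patent, guideline, or policy document, from title/abstract text, and evaluates it out of sample, ahead of time, with AUROC. The paper's models are high-dimensional deep networks fitted to 43.3 million Microsoft Academic Graph records; here a TF-IDF plus logistic-regression baseline over a handful of invented toy records stands in purely to make the setup runnable.

# Minimal, hypothetical sketch of the abstract's evaluation setup: predict a
# binary "translational uptake" label from title/abstract text, fit on earlier
# years, and score later years with AUROC. Not the authors' model or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy records: (publication year, title + abstract text, uptake label).
# These examples are invented solely to make the sketch self-contained.
records = [
    (1995, "randomised trial of statin therapy in coronary disease", 1),
    (1996, "case report of a rare dermatological presentation", 0),
    (2001, "monoclonal antibody targeting tnf alpha in rheumatoid arthritis", 1),
    (2003, "historical survey of nineteenth century anatomy teaching", 0),
    (2008, "genome wide association study of type 2 diabetes risk loci", 1),
    (2010, "editorial commentary on conference attendance trends", 0),
    (2014, "deep learning segmentation of brain mri for stroke triage", 1),
    (2016, "bibliometric note on journal self citation practices", 0),
]

# Out-of-time split: train on papers published before the cutoff year,
# evaluate on papers published at or after it (ahead-of-time prediction).
cutoff = 2008
train = [r for r in records if r[0] < cutoff]
test = [r for r in records if r[0] >= cutoff]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform([text for _, text, _ in train])
X_test = vectorizer.transform([text for _, text, _ in test])
y_train = [label for _, _, label in train]
y_test = [label for _, _, label in test]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# AUROC on the held-out, later-in-time papers, mirroring the abstract's
# "out of sample, ahead of time" framing. The paper reports AUROC > 0.9 for
# its content-based models; this toy score is not comparable to that result.
scores = model.predict_proba(X_test)[:, 1]
print("out-of-time AUROC:", roc_auc_score(y_test, scores))

In the full-scale setting one would replace the toy records with per-paper titles, abstracts, and metadata, replace the linear baseline with the kind of high-dimensional content model the abstract describes, and keep the temporal split and AUROC evaluation unchanged.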

Keywords