A survey on clinical natural language processing in the United Kingdom from 2007 to 2022

Honghan Wu; Minhong Wang; Jinge Wu; Farah Francis; Yun-Hsuan Chang; Alex Shavick; Hang Dong; Michael T. C. Poon; Natalie Fitzpatrick; Adam P. Levine; Luke T. Slater; Alex Handy; Andreas Karwath; Georgios V. Gkoutos; Claude Chelala; Anoop Dinesh Shah; Robert Stewart; Nigel Collier; Beatrice Alex; William Whiteley; Cathie Sudlow; Angus Roberts; Richard J. B. Dobson

doi:10.1038/s41746-022-00730-6

npj Digital Medicine (Dec 2022)

Affiliations

Honghan Wu: Institute of Health Informatics, University College London
Minhong Wang: Institute of Health Informatics, University College London
Jinge Wu: Institute of Health Informatics, University College London
Farah Francis: Usher Institute, University of Edinburgh
Yun-Hsuan Chang: Institute of Health Informatics, University College London
Alex Shavick: Research Department of Pathology, UCL Cancer Institute, University College London
Hang Dong: Usher Institute, University of Edinburgh
Michael T. C. Poon: Usher Institute, University of Edinburgh
Natalie Fitzpatrick: Institute of Health Informatics, University College London
Adam P. Levine: Research Department of Pathology, UCL Cancer Institute, University College London
Luke T. Slater: Institute of Cancer and Genomics, University of Birmingham
Alex Handy: Institute of Health Informatics, University College London
Andreas Karwath: Institute of Cancer and Genomics, University of Birmingham
Georgios V. Gkoutos: Institute of Cancer and Genomics, University of Birmingham
Claude Chelala: Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London
Anoop Dinesh Shah: Institute of Health Informatics, University College London
Robert Stewart: Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King’s College London
Nigel Collier: Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge
Beatrice Alex: Edinburgh Futures Institute, University of Edinburgh
William Whiteley: Usher Institute, University of Edinburgh
Cathie Sudlow: Usher Institute, University of Edinburgh
Angus Roberts: Department of Biostatistics & Health Informatics, King’s College London
Richard J. B. Dobson: Institute of Health Informatics, University College London

Journal volume & issue: Vol. 5, no. 1
pp. 1–15

Abstract

Much of the knowledge and information needed to enable high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper presents a comprehensive review of clinical NLP in the UK over the past 15 years to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; total funding £41.97 m) funded by UK funders or the European Union’s funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. The 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show that, not surprisingly, clinical NLP in the UK has increased substantially in the last 15 years: the total budget in 2019–2022 was 80 times that of 2007–2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and to broaden application domains. There is also a need to improve links between academia and industry and to enable deployments in real-world settings so that clinical NLP’s great potential in care delivery can be realised. The major barriers include research and development access to hospital data, a lack of capable computational resources in the right places, the scarcity of labelled data and barriers to sharing pretrained models.

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/
