A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis

Xiaoxuan Liu, MBChB; Livia Faes, MD; Aditya U Kale, MBChB; Siegfried K Wagner, BMBCh; Dun Jack Fu, PhD; Alice Bruynseels, MBChB; Thushika Mahendiran, MBChB; Gabriella Moraes, MD; Mohith Shamdas, MBBS; Christoph Kern, MD; Joseph R Ledsam, MBChB; Martin K Schmid, MD; Konstantinos Balaskas, MD; Eric J Topol, MD; Lucas M Bachmann, ProfPhD; Pearse A Keane, MD; Alastair K Denniston, ProfPhD

The Lancet: Digital Health (Oct 2019)

Affiliations

Xiaoxuan Liu, MBChB: Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Academic Unit of Ophthalmology, Institute of Inflammation & Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK; Health Data Research UK, London, UK
Livia Faes, MD: Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK; Eye Clinic, Cantonal Hospital of Lucerne, Lucerne, Switzerland
Aditya U Kale, MBChB: Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Siegfried K Wagner, BMBCh: NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
Dun Jack Fu, PhD: Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK
Alice Bruynseels, MBChB: Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Thushika Mahendiran, MBChB: Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Gabriella Moraes, MD: Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK
Mohith Shamdas, MBBS: Academic Unit of Ophthalmology, Institute of Inflammation & Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
Christoph Kern, MD: Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK; University Eye Hospital, Ludwig Maximilian University of Munich, Munich, Germany
Joseph R Ledsam, MBChB: DeepMind, London, UK
Martin K Schmid, MD: Eye Clinic, Cantonal Hospital of Lucerne, Lucerne, Switzerland
Konstantinos Balaskas, MD: Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK; NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
Eric J Topol, MD: Scripps Research Translational Institute, La Jolla, California, USA
Lucas M Bachmann, ProfPhD: Medignition, Research Consultants, Zurich, Switzerland
Pearse A Keane, MD: NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK; Health Data Research UK, London, UK
Alastair K Denniston, ProfPhD: Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Academic Unit of Ophthalmology, Institute of Inflammation & Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; Centre for Patient Reported Outcome Research, Institute of Applied Health Research, University of Birmingham, Birmingham, UK; NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK; Health Data Research UK, London, UK

Correspondence to: Prof Alastair Denniston, University Hospitals Birmingham NHS Foundation Trust, University of Birmingham, Birmingham B15 2TH, UK

Journal volume & issue: Vol. 1, no. 6
pp. e271 – e297

Abstract

Summary

Background: Deep learning offers considerable promise for medical diagnostics. We aimed to evaluate the diagnostic accuracy of deep learning algorithms versus health-care professionals in classifying diseases using medical imaging.

Methods: In this systematic review and meta-analysis, we searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019. Studies comparing the diagnostic performance of deep learning models and health-care professionals based on medical imaging, for any disease, were included. We excluded studies that used medical waveform data graphics material or investigated the accuracy of image segmentation rather than disease classification. We extracted binary diagnostic accuracy data and constructed contingency tables to derive the outcomes of interest: sensitivity and specificity. Studies undertaking an out-of-sample external validation were included in a meta-analysis, using a unified hierarchical model. This study is registered with PROSPERO, CRD42018091176.

Findings: Our search identified 31 587 studies, of which 82 (describing 147 patient cohorts) were included. 69 studies provided enough data to construct contingency tables, enabling calculation of test accuracy, with sensitivity ranging from 9·7% to 100·0% (mean 79·1%, SD 0·2) and specificity ranging from 38·9% to 100·0% (mean 88·3%, SD 0·1). An out-of-sample external validation was done in 25 studies, of which 14 made the comparison between deep learning models and health-care professionals in the same sample. Comparison of the performance of deep learning models and health-care professionals in these 14 studies, when restricting the analysis to the contingency table for each study reporting the highest accuracy, found a pooled sensitivity of 87·0% (95% CI 83·0–90·2) for deep learning models and 86·4% (79·9–91·0) for health-care professionals, and a pooled specificity of 92·5% (95% CI 85·1–96·4) for deep learning models and 90·5% (80·6–95·7) for health-care professionals.

Interpretation: Our review found the diagnostic performance of deep learning models to be equivalent to that of health-care professionals. However, a major finding of the review is that few studies presented externally validated results or compared the performance of deep learning models and health-care professionals using the same sample. Additionally, poor reporting is prevalent in deep learning studies, which limits reliable interpretation of the reported diagnostic accuracy. New reporting standards that address specific challenges of deep learning could improve future studies, enabling greater confidence in the results of future evaluations of this promising technology.

Funding: None.
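The Methods describe constructing contingency tables from binary diagnostic accuracy data to derive sensitivity and specificity. The following is a minimal sketch of that per-table calculation only, not the authors' analysis code and not the hierarchical meta-analysis model. The class name `ContingencyTable` and the counts are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table of a binary test against a reference standard (hypothetical helper)."""
    tp: int  # disease present, test positive
    fp: int  # disease absent, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative

    def sensitivity(self) -> float:
        """Proportion of diseased cases the test correctly flags: TP / (TP + FN)."""
        diseased = self.tp + self.fn
        if diseased == 0:
            raise ValueError("sensitivity is undefined with no diseased cases")
        return self.tp / diseased

    def specificity(self) -> float:
        """Proportion of non-diseased cases the test correctly clears: TN / (TN + FP)."""
        healthy = self.tn + self.fp
        if healthy == 0:
            raise ValueError("specificity is undefined with no non-diseased cases")
        return self.tn / healthy


# Illustrative counts only; these are NOT taken from any study in the review.
example = ContingencyTable(tp=90, fp=40, fn=10, tn=160)
print(f"sensitivity = {example.sensitivity():.3f}")
print(f"specificity = {example.specificity():.3f}")
```

Pooling such per-study estimates, as the review does, requires a model that accounts for between-study variation and the correlation between sensitivity and specificity; simply averaging the two proportions across studies would not reproduce the pooled figures reported in the Findings.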

Published in The Lancet: Digital Health

ISSN: 2589-7500 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.thelancet.com/journals/landig/home
