The Lancet: Digital Health (Oct 2023)

Comparison of humans versus mobile phone-powered artificial intelligence for the diagnosis and management of pigmented skin cancer in secondary care: a multicentre, prospective, diagnostic, clinical trial

  • Prof Scott W Menzies, PhD,
  • Christoph Sinz, MD,
  • Michelle Menzies, BSc,
  • Serigne N Lo, PhD,
  • William Yolland, BSc,
  • Johann Lingohr, BSc,
  • Majid Razmara, PhD,
  • Philipp Tschandl, PhD,
  • Prof Pascale Guitera, PhD,
  • Prof Richard A Scolyer, MD,
  • Florentina Boltz, MD,
  • Liliane Borik-Heil, MD,
  • Hsien Herbert Chan, MD,
  • David Chromy, MD,
  • David J Coker, MD,
  • Helena Collgros, MD,
  • Maryam Eghtedari, MD,
  • Marina Corral Forteza, MD,
  • Emily Forward, MD,
  • Bruna Gallo, MD,
  • Stephanie Geisler, MD,
  • Matthew Gibson, MMed,
  • Amelie Hampel, MD,
  • Genevieve Ho, MD,
  • Laura Junez, MD,
  • Philipp Kienzl, PhD,
  • Arthur Martin, MD,
  • Fergal J Moloney, MD,
  • Amanda Regio Pereira, MD,
  • Julia Maria Ressler, MD,
  • Susanne Richter, MD,
  • Katharina Silic, MD,
  • Thomas Silly, MD,
  • Michael Skoll, MD,
  • Julia Tittes, MD,
  • Philipp Weber, MD,
  • Prof Wolfgang Weninger, PhD,
  • Doris Weiss, MD,
  • Ping Woo-Sampson, MD,
  • Catherine Zilberg, MD,
  • Harald Kittler, MD

Journal volume & issue
Vol. 5, no. 10
pp. e679 – e691

Abstract

Summary

Background: Diagnosis of skin cancer requires medical expertise, which is scarce. Mobile phone-powered artificial intelligence (AI) could aid diagnosis, but it is unclear how this technology performs in a clinical scenario. Our primary aim was to test in the clinic whether there was equivalence between AI algorithms and clinicians for the diagnosis and management of pigmented skin lesions.

Methods: In this multicentre, prospective, diagnostic, clinical trial, we included specialist and novice clinicians and patients from two tertiary referral centres in Australia and Austria. Specialists had a specialist medical qualification related to diagnosing and managing pigmented skin lesions, whereas novices were dermatology junior doctors or registrars in trainee positions who had experience in examining and managing these lesions. Eligible patients were aged 18–99 years and had a modified Fitzpatrick I–III skin type; those in the diagnostic trial were undergoing routine excision or biopsy of one or more suspicious pigmented skin lesions bigger than 3 mm in the longest diameter, and those in the management trial had baseline total-body photographs taken within 1–4 years. We used two mobile phone-powered AI instruments incorporating a simple optical attachment: a new 7-class AI algorithm and the International Skin Imaging Collaboration (ISIC) AI algorithm, which was previously tested in a large online reader study. The reference standard for excised lesions in the diagnostic trial was histopathological examination; in the management trial, the reference standard was a descending hierarchy based on histopathological examination, comparison of baseline total-body photographs, digital monitoring, and telediagnosis. The main outcome of this study was the accuracy of expert and novice diagnostic and management decisions compared with that of the two AI instruments. Possible decisions in the management trial were dismissal, biopsy, or 3-month monitoring. Decisions to monitor were considered equivalent to dismissal (scenario A) or biopsy of malignant lesions (scenario B). The trial was registered at the Australian New Zealand Clinical Trials Registry ACTRN12620000695909 (Universal trial number U1111–1251–8995).

Findings: The diagnostic study included 172 suspicious pigmented lesions (84 malignant) from 124 patients and the management study included 5696 pigmented lesions (18 malignant) from the whole body of 66 high-risk patients. The diagnoses of the 7-class AI algorithm were equivalent to the specialists’ diagnoses (absolute accuracy difference 1·2% [95% CI –6·9 to 9·2]) and significantly superior to the novices’ diagnoses (21·5% [13·1 to 30·0]). The diagnoses of the ISIC AI algorithm were significantly inferior to the specialists’ diagnoses (–11·6% [–20·3 to –3·0]) but significantly superior to the novices’ diagnoses (8·7% [–0·5 to 18·0]). The best 7-class management AI was significantly inferior to specialists’ management (absolute accuracy difference in correct management decision –0·5% [95% CI –0·7 to –0·2] in scenario A and –0·4% [–0·8 to –0·05] in scenario B). Compared with the novices’ management, the 7-class management AI was significantly inferior (–0·4% [–0·6 to –0·2]) in scenario A but significantly superior (0·4% [0·0 to 0·9]) in scenario B.

Interpretation: The mobile phone-powered AI technology is simple, practical, and accurate for the diagnosis of suspicious pigmented skin cancer in patients presenting to a specialist setting, although its usage for management decisions requires more careful execution. An AI algorithm that was superior in experimental studies was significantly inferior to specialists in a real-world scenario, suggesting that caution is needed when extrapolating results of experimental studies to clinical practice.

Funding: MetaOptima Technology.
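
The two management scoring scenarios described in the Methods can be illustrated with a short sketch. This is not the trial's analysis code. It assumes that a correct decision means biopsy for a malignant lesion and dismissal for a benign one, and that in scenario B a monitored benign lesion still counts as dismissed; the abstract does not spell out either point. The function names and the toy data are hypothetical.

```python
from typing import Iterable, Tuple

# A case is (decision, malignant), where decision is one of
# "dismiss", "monitor", or "biopsy" (the three options named in the abstract).
Case = Tuple[str, bool]


def effective_decision(decision: str, malignant: bool, scenario: str) -> str:
    """Collapse a three-way decision into dismiss/biopsy for scoring."""
    if decision not in ("dismiss", "monitor", "biopsy"):
        raise ValueError(f"unknown decision: {decision!r}")
    if scenario not in ("A", "B"):
        raise ValueError(f"unknown scenario: {scenario!r}")
    if decision != "monitor":
        return decision
    if scenario == "A":
        # Scenario A: monitoring is treated as dismissal.
        return "dismiss"
    # Scenario B: monitoring is treated as biopsy of malignant lesions.
    # Assumption: monitored benign lesions are still counted as dismissed.
    return "biopsy" if malignant else "dismiss"


def is_correct(decision: str, malignant: bool, scenario: str) -> bool:
    """Assumed rule: biopsy malignant lesions, dismiss benign ones."""
    target = "biopsy" if malignant else "dismiss"
    return effective_decision(decision, malignant, scenario) == target


def management_accuracy(cases: Iterable[Case], scenario: str) -> float:
    """Fraction of lesions with a correct management decision."""
    cases = list(cases)
    if not cases:
        raise ValueError("no cases supplied")
    return sum(is_correct(d, m, scenario) for d, m in cases) / len(cases)


# Toy data for illustration only (not from the trial).
toy_cases = [
    ("monitor", True),   # malignant lesion sent to 3-month monitoring
    ("dismiss", False),  # benign lesion dismissed
    ("biopsy", False),   # benign lesion biopsied
    ("biopsy", True),    # malignant lesion biopsied
]
accuracy_a = management_accuracy(toy_cases, "A")
accuracy_b = management_accuracy(toy_cases, "B")
```

Under these assumptions, the only lesions whose score changes between scenarios are malignant lesions sent to monitoring. With 18 malignant lesions among 5696 in the management study, this helps explain why the reported accuracy differences between scenarios A and B are fractions of a percentage point.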