The Lancet: Digital Health (Jan 2020)

Development and validation of a risk prediction model to diagnose Barrett's oesophagus (MARK-BE): a case-control machine learning approach

  • Avi Rosenfeld, PhD,
  • David G Graham, MBBS,
  • Sarah Jevons, PhD,
  • Jose Ariza, RGN,
  • Daryl Hagan, MSc,
  • Ash Wilson, BSc,
  • Samuel J Lovat,
  • Sarmed S Sami, MBBS,
  • Omer F Ahmad, MBBS,
  • Prof Marco Novelli, MBChB,
  • Manuel Rodriguez Justo, MBBS,
  • Alison Winstanley, MBBS,
  • Eliyahu M Heifetz, PhD,
  • Mordehy Ben-Zecharia, PhD,
  • Uria Noiman, PhD,
  • Prof Rebecca C Fitzgerald, MBChB,
  • Prof Peter Sasieni, PhD,
  • Prof Laurence B Lovat, MBBS,
  • Karen Coker,
  • Wanfeng Zhao,
  • Kathryn Brown,
  • Beverley Haynes,
  • Tara Nuckcheddy Grant,
  • Massimiliano di Pietro,
  • Eleanor Dewhurst,
  • Bincy Alias,
  • Leanne Mills,
  • Caroline Wilson,
  • Elizabeth Bird-Lieberman,
  • Jan Bornschein,
  • Yean Lim,
  • Kareem Shariff,
  • Roberto Cayado Lopez,
  • Myrna Udarbe,
  • Claire Shaw,
  • Glynis Rose,
  • Ian Sargeant,
  • M Al-Izzi,
  • Roisin Schimmel,
  • Elizabeth Green,
  • Morgan Moorghen,
  • Reshma Kanani,
  • Mariann Baulf,
  • Jayne Butcher,
  • Adil Butt,
  • Steve Bown,
  • Gideon Lipman,
  • Rami Sweis,
  • Vinay Sehgal,
  • Matthew Banks,
  • Rehan Haidry,
  • John Louis-Auguste,
  • Darina Kohoutova,
  • Sarah Kerr,
  • Victor Eneh,
  • Nigel Butter,
  • Haroon Miah,
  • Rommel Butawan,
  • Grace Adesina,
  • Sabrina Holohan,
  • Joan Idris,
  • Nick Hayes,
  • Shajahan Wahed,
  • Nelson Kath Houghton,
  • Marc Hopton,
  • Anne Eastick,
  • Debasis Majumdar,
  • Kassem Manuf,
  • Lyndsey Fieldson,
  • Helen Bailey,
  • Jacobo Fernandez-Sordo Ortiz,
  • Mina Patel,
  • Suzanne Henry,
  • Samantha Warburton,
  • Jonathan White,
  • Lisa Gadeke,
  • Beverley Longhurst,
  • Richmond Abeseabe,
  • Peter Basford,
  • Rupam Bhattacharyya,
  • Scott Elliot,
  • Roisin Bevan,
  • Carly Brown,
  • Philippa Laverick,
  • Gayle Clifford,
  • Anita Gibbons,
  • Julie Ingmire,
  • Abdullah Mawas,
  • Jacquelyn Harvey,
  • Sharon Cave

Journal volume & issue
Vol. 2, no. 1
pp. e37 – e48

Abstract

Summary

Background: Screening for Barrett's oesophagus relies on endoscopy, which is invasive, and few people who undergo the procedure are found to have the condition. We aimed to use machine learning techniques to develop and externally validate a simple risk prediction panel to screen individuals for Barrett's oesophagus.

Methods: In this prospective study, machine learning risk prediction in Barrett's oesophagus (MARK-BE), we used data from two case-control studies, BEST2 and BOOST, to compile training and validation datasets. From the BEST2 study, we analysed questionnaires from 1299 patients, of whom 880 (67·7%) had Barrett's oesophagus, including 40 with invasive oesophageal adenocarcinoma, and 419 (32·3%) were controls. We used a computer algorithm to randomly split the cohort (6:4) into a training dataset of 776 patients and a testing dataset of 523 patients. We compiled an external validation cohort from the BOOST study, which included 398 patients, comprising 198 patients with Barrett's oesophagus (23 with oesophageal adenocarcinoma) and 200 controls. We identified independently important diagnostic features of Barrett's oesophagus using two machine learning techniques, information gain and correlation-based feature selection. We assessed multiple classification tools to create a multivariable risk prediction model. Internal validation of the model using the BEST2 testing dataset was followed by external validation using the BOOST external validation dataset. From these data we created a prediction panel to identify at-risk individuals.

Findings: The BEST2 study included 40 diagnostic features. Of these, 19 added information gain, but after correlation-based feature selection only eight showed independent diagnostic value: age, sex, cigarette smoking, waist circumference, frequency of stomach pain, duration of heartburn and acidic taste, and taking antireflux medication. All of these were associated with increased risk of Barrett's oesophagus except frequency of stomach pain, which was inversely associated in a case-control population. Logistic regression offered the highest prediction quality, with an area under the receiver-operator curve (AUC) of 0·87 (95% CI 0·84–0·90; sensitivity set at 90%; specificity of 68%). In the testing dataset, the AUC was 0·86 (0·83–0·89; sensitivity set at 90%; specificity of 65%). In the external validation dataset, the AUC was 0·81 (0·74–0·84; sensitivity set at 90%; specificity of 58%).

Interpretation: Our diagnostic model offers valid predictions of diagnosis of Barrett's oesophagus in patients with symptomatic gastro-oesophageal reflux disease, helping to identify who should go forward to invasive confirmatory testing. Our predictive panel suggests that overweight men who have been taking antireflux medication for a long time might merit particular consideration for further testing. Our risk prediction panel is quick and simple to administer but will need further calibration and validation in a prospective study in primary care.

Funding: Charles Wolfson Charitable Trust and Guts UK.
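
The Findings report specificity at an operating point where sensitivity is set at 90%. The abstract does not say how that threshold was chosen, so the following is only a minimal sketch of one common way to do it: take the highest score threshold that still flags at least 90% of cases, then read off specificity among controls. The function name `specificity_at_sensitivity` and the example scores are hypothetical and are not taken from the BEST2 or BOOST data.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity).

    A patient is called positive when score >= threshold. The threshold
    chosen is the highest one whose sensitivity is at least the target.
    labels: 1 = case (Barrett's oesophagus), 0 = control.
    """
    # Case scores sorted from highest to lowest predicted risk.
    cases = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # Number of cases that must be flagged to reach the target sensitivity.
    # The small epsilon guards against floating-point overshoot in the product.
    needed = max(1, math.ceil(target_sensitivity * len(cases) - 1e-9))
    threshold = cases[needed - 1]

    sensitivity = sum(s >= threshold for s in cases) / len(cases)
    specificity = sum(s < threshold for s in controls) / len(controls)
    return threshold, sensitivity, specificity


# Hypothetical predicted risks (illustration only, not study data).
case_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20]
control_scores = [0.10, 0.15, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.58, 0.62]

scores = case_scores + control_scores
labels = [1] * len(case_scores) + [0] * len(control_scores)

threshold, sensitivity, specificity = specificity_at_sensitivity(scores, labels)
print(threshold, sensitivity, specificity)  # 0.55 0.9 0.8
```

Fixing sensitivity first reflects the screening use case described in the Interpretation: the panel is meant to miss few cases, and specificity then indicates how many unnecessary endoscopies remain at that threshold.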