Gastroenterology Research and Practice (Jan 2018)
Machine Learning Creates a Simple Endoscopic Classification System that Improves Dysplasia Detection in Barrett’s Oesophagus amongst Non-expert Endoscopists
Abstract
Introduction. Barrett’s oesophagus (BE) is a precursor to oesophageal adenocarcinoma (OAC). Endoscopic surveillance is performed to detect dysplasia arising in BE because dysplasia is likely to be amenable to curative treatment. At present, there are no guidelines on who should perform surveillance endoscopy in BE. Machine learning (ML) is a branch of artificial intelligence (AI) that can generate simple rules in the form of decision trees (DTs). We hypothesised that a DT generated from the opinions of recognised expert endoscopists could be used to improve dysplasia detection by non-expert endoscopists. To our knowledge, ML has never been applied in this manner.
Methods. Video recordings were collected from patients with non-dysplastic (ND-BE) and dysplastic Barrett’s oesophagus (D-BE) undergoing high-definition endoscopy with i-Scan enhancement (PENTAX®). A strict protocol was used to record areas of interest, after which a corresponding biopsy was taken to confirm the histological diagnosis. In a blinded manner, videos were shown to 3 experts, who were asked to interpret them based on mucosal and microvasculature patterns and the presence of nodularity and ulceration, and to give an overall suspected diagnosis. The data generated were entered into the WEKA package to construct a DT for dysplasia prediction. Non-expert endoscopists (gastroenterology specialist registrars in training with variable experience and undergraduate medical students with no experience) were asked to score the same videos both before and after web-based training using the DT constructed from the expert opinion. Accuracy, sensitivity, and specificity were calculated before and after training, with p<0.05 considered statistically significant.
Results. Videos from 40 patients were collected, including 12 recorded both before and after acetic acid (ACA) application. Experts’ average accuracy for dysplasia prediction was 88%. When the experts’ answers were entered into a DT, the resultant decision model had an accuracy of 92%, with a mean sensitivity and specificity of 97% and 88%, respectively. Addition of ACA did not improve dysplasia detection. Untrained medical students tended to have high sensitivity but poor specificity, as they “overcalled” normal areas. Gastroenterology trainees did the opposite, with overall low sensitivity but high specificity. Detection improved significantly and accuracy rose in both groups after formal web-based training, although it did not reach the accuracy achieved by the experts. For trainees, sensitivity rose significantly from 71% to 83% with minimal loss of specificity. For students, specificity rose sharply from 31% to 49% with no loss of sensitivity.
Conclusion. ML is able to define rules learnt from expert opinion. These rules generate a simple algorithm that accurately predicts dysplasia. Once taught to non-experts, the algorithm significantly improves their rate of dysplasia detection. This opens the door to standardised training and assessment of competence for those who perform endoscopy in BE. It may shorten the learning curve and might also be used to compare the competence of trainees with that of recognised experts as part of their accreditation process.
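The accuracy, sensitivity, and specificity reported above all follow from comparing each video's predicted label with its biopsy result. The Python sketch below is an illustration only, not the study's analysis code: the function name `diagnostic_metrics` is hypothetical, and the toy labels are invented for demonstration and are not study data.

```python
from typing import Dict, Sequence


def diagnostic_metrics(predicted: Sequence[bool], histology: Sequence[bool]) -> Dict[str, float]:
    """Compare predictions with histology, where True means dysplasia.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(predicted) != len(histology):
        raise ValueError("predicted and histology must have the same length")
    if not histology:
        raise ValueError("at least one case is required")

    pairs = list(zip(predicted, histology))
    tp = sum(1 for p, h in pairs if p and h)          # dysplasia correctly called
    tn = sum(1 for p, h in pairs if not p and not h)  # non-dysplastic correctly called
    fp = sum(1 for p, h in pairs if p and not h)      # "overcalled" normal areas
    fn = sum(1 for p, h in pairs if not p and h)      # missed dysplasia

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: proportion of dysplastic areas that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: proportion of non-dysplastic areas correctly identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels, invented for illustration only (not study data).
histology = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]

metrics = diagnostic_metrics(predicted, histology)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In these terms, a rater who overcalls normal areas accumulates false positives, which lowers specificity while leaving sensitivity high; a rater who undercalls accumulates false negatives, which lowers sensitivity.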