Machine learning can aid in prediction of IDH mutation from H&E-stained histology slides in infiltrating gliomas

Benjamin Liechty; Zhuoran Xu; Zhilu Zhang; Cheyanne Slocum; Cagla D. Bahadir; Mert R. Sabuncu; David J. Pisapia

doi:10.1038/s41598-022-26170-6

Scientific Reports (Dec 2022)

Affiliations

Benjamin Liechty: Department of Pathology and Laboratory Medicine, Weill Cornell Medicine
Zhuoran Xu: Department of Pathology and Laboratory Medicine, Weill Cornell Medicine
Zhilu Zhang: School of Electrical and Computer Engineering, Cornell University and Cornell Tech
Cheyanne Slocum: School of Medicine, Weill Cornell Medicine
Cagla D. Bahadir: Meinig School of Biomedical Engineering, Cornell University
Mert R. Sabuncu: School of Electrical and Computer Engineering, Cornell University and Cornell Tech
David J. Pisapia: Department of Pathology and Laboratory Medicine, Weill Cornell Medicine

DOI: https://doi.org/10.1038/s41598-022-26170-6
Journal volume & issue: Vol. 12, no. 1
pp. 1 – 12

Abstract

While machine learning (ML) models have been increasingly applied to a range of histopathology tasks, there has been little emphasis on characterizing these models and contrasting them with human experts. We present a detailed empirical analysis comparing expert neuropathologists and ML models at predicting IDH mutation status in H&E-stained histology slides of infiltrating gliomas, both independently and synergistically. We find that errors made by neuropathologists and by ML models trained on the TCGA dataset are distinct, as reflected in modest agreement between predictions (human-vs.-human κ = 0.656; human-vs.-ML model κ = 0.598). While no ML model surpassed human performance on an independent institutional test dataset (human AUC = 0.901, max ML AUC = 0.881), a hybrid model aggregating human and ML predictions demonstrates predictive performance comparable to the consensus of two expert neuropathologists (hybrid classifier AUC = 0.921 vs. two-neuropathologist consensus AUC = 0.920). We also show that models trained at different levels of magnification exhibit different types of errors, supporting the value of aggregation across spatial scales in the ML approach. Finally, we present a detailed interpretation of our multi-scale ML ensemble model that reveals that predictions are driven by human-identifiable features at the patch level.
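The abstract reports inter-rater agreement as Cohen's κ, discrimination as AUC, and a hybrid classifier that aggregates human and ML predictions. The following is a minimal sketch of how such quantities can be computed, assuming binary IDH labels (1 = mutant, 0 = wildtype) and per-slide probability scores. The helper names (`cohen_kappa`, `roc_auc`, `hybrid_scores`) are hypothetical, and the equal-weight average in `hybrid_scores` is an illustrative stand-in, not necessarily the aggregation the authors used.

```python
from typing import List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa between two raters' labels on the same cases."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(a)
    # Observed agreement: fraction of cases on which both raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    categories = set(a) | set(b)
    p_e = sum((list(a).count(c) / n) * (list(b).count(c) / n) for c in categories)
    if p_e == 1.0:
        # Kappa is undefined here; by convention we return 1.0 because both
        # raters used the same single label for every case.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def hybrid_scores(human: Sequence[float], model: Sequence[float],
                  weight_human: float = 0.5) -> List[float]:
    """Hypothetical hybrid: weighted average of human and ML probabilities."""
    if len(human) != len(model):
        raise ValueError("score lists must be the same length")
    return [weight_human * h + (1.0 - weight_human) * m for h, m in zip(human, model)]


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    labels = [0, 0, 1, 1]
    human_prob = [0.2, 0.6, 0.7, 0.9]
    ml_prob = [0.1, 0.3, 0.4, 0.8]
    print("kappa:", cohen_kappa([int(p >= 0.5) for p in human_prob],
                                [int(p >= 0.5) for p in ml_prob]))
    print("hybrid AUC:", roc_auc(labels, hybrid_scores(human_prob, ml_prob)))
```

Averaging probabilities is the simplest way to let human and model errors partially cancel when they are distinct, which is the intuition the abstract's κ values suggest; a learned weighting (for example, logistic regression on both scores) is a common alternative.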

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
