Leveraging immuno-fluorescence data to reduce pathologist annotation requirements in lung tumor segmentation using deep learning

Hatef Mehrabian; Jens Brodbeck; Peipei Lyu; Edith Vaquero; Abhishek Aggarwal; Lauri Diehl

doi:10.1038/s41598-024-69244-3

Scientific Reports (Sep 2024)

Affiliations

Hatef Mehrabian: Non-Clinical Safety and Pathobiology, Gilead Sciences
Jens Brodbeck: Non-Clinical Safety and Pathobiology, Gilead Sciences
Peipei Lyu: Non-Clinical Safety and Pathobiology, Gilead Sciences
Edith Vaquero: Non-Clinical Safety and Pathobiology, Gilead Sciences
Abhishek Aggarwal: Non-Clinical Safety and Pathobiology, Gilead Sciences
Lauri Diehl: Non-Clinical Safety and Pathobiology, Gilead Sciences

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 16

Abstract

The main bottleneck in training a robust tumor segmentation algorithm for non-small cell lung cancer (NSCLC) on H&E is generating sufficient ground truth annotations. Various approaches for generating tumor labels to train a tumor segmentation model were explored. A large dataset of low-cost, low-accuracy panCK-based annotations was used to pre-train the model and to determine the minimum required size of the expensive but highly accurate pathologist annotation dataset. PanCK pre-training was compared to foundation models, and various architectures were explored for the model backbone. Proper study design and sample procurement for training a generalizable model that captures variations in NSCLC H&E were studied. H&E imaging was performed on 112 samples (three centers, two scanner types, different staining and imaging protocols). An Attention U-Net architecture was trained using the large panCK-based annotation dataset (68 samples, total area 10,326 mm²), followed by fine-tuning using a small pathologist annotation dataset (80 samples, total area 246 mm²). This approach resulted in a mean intersection over union (mIoU) of 82% [77, 87]. PanCK pre-training provided better performance than foundation models and allowed a 70% reduction in pathologist annotations with no drop in performance. The study design ensured model generalizability over variations in H&E: performance was consistent across centers, scanners, and subtypes.
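
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection over union can be computed for a binary tumor / non-tumor mask. It is illustrative only: the abstract does not state how the authors averaged IoU (per class, per sample, or per slide), so the per-class averaging, the function names `iou` and `mean_iou`, and the toy masks are all assumptions rather than the paper's method.

```python
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Intersection over union for one class label over flattened masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels where both masks carry the label.
    intersection = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    # Pixels where at least one mask carries the label.
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    # Assumption: a class absent from both masks counts as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pred: Sequence[int], truth: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Average the per-class IoU over the given labels (assumed averaging scheme)."""
    return sum(iou(pred, truth, c) for c in labels) / len(labels)


# Toy flattened masks (hypothetical data): 1 = tumor, 0 = non-tumor.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
score = mean_iou(pred, truth)
```

In this toy case each class has an intersection of 2 pixels and a union of 4 pixels, so both per-class IoU values and their mean are 0.5.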

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
