PLoS ONE (Jan 2024)
An interpretable framework for inter-observer agreement measurements in TILs scoring on histopathological breast images: A proof-of-principle study.
Abstract
Breast cancer, a widespread and life-threatening disease, necessitates precise diagnostic tools for improved patient outcomes. Tumor-Infiltrating Lymphocytes (TILs), reflective of the immune response against cancer cells, are pivotal in understanding breast cancer behavior. However, inter-observer variability in TILs scoring methods poses challenges to reliable assessments. This study introduces a novel and interpretable proof-of-principle framework comprising two innovative inter-observer agreement measures. The first method, Boundary-Weighted Fleiss' Kappa (BWFK), addresses tissue segmentation predictions, focusing on mitigating disagreements along tissue boundaries. BWFK enhances the accuracy of stromal segmentation, providing a nuanced assessment of inter-observer agreement. The second proposed method, the Distance Based Cell Agreement Algorithm (DBCAA), eliminates the need for ground truth annotations in cell detection predictions. This innovative approach offers versatility across histopathological analyses, overcoming data availability challenges. Both methods were applied to assess inter-observer agreement using a clinical image dataset consisting of 25 images of invasive ductal breast carcinoma tissue, each annotated by four pathologists, serving as a proof-of-principle. Experimental investigations demonstrated that the BWFK method yielded gains of up to 32% compared to the standard Fleiss' Kappa model. Furthermore, a procedure for conducting clinical validations of artificial intelligence (AI) based cell detection methods was elucidated. Thoroughly validated on a clinical dataset, the framework contributes to standardized, reliable, and interpretable inter-observer agreement assessments. This study is the first examination of inter-observer agreements in stromal segmentation and lymphocyte detection for the TILs scoring problem. 
The study emphasizes the potential impact of these measures in advancing histopathological image analysis, fostering consensus in TILs scoring, and ultimately improving breast cancer diagnostics and treatment planning. The source code and implementation guide for this study are accessible on our GitHub page, and the full clinical dataset is available for academic and research purposes on Kaggle.
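For orientation, the standard Fleiss' Kappa that serves as the baseline above can be sketched as follows. This is a minimal illustration of the textbook statistic only, not the authors' BWFK, whose boundary weighting is not specified in this abstract. The function names `labels_to_counts` and `fleiss_kappa`, and the toy label array, are assumptions introduced here for illustration; in the setting described above, each "item" would plausibly be a pixel and each rater one of the four pathologists.

```python
import numpy as np


def labels_to_counts(labels, n_categories):
    """Convert a (n_raters, n_items) label array into a (n_items, n_categories)
    table counting how many raters assigned each category to each item."""
    labels = np.asarray(labels)
    return np.stack(
        [(labels == c).sum(axis=0) for c in range(n_categories)], axis=1
    )


def fleiss_kappa(counts):
    """Standard (unweighted) Fleiss' kappa for a (n_items, n_categories) count table.

    Every item must be rated by the same number of raters (at least two).
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    if n_raters < 2 or not np.all(counts.sum(axis=1) == n_raters):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Overall proportion of ratings falling into each category.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # Per-item agreement: fraction of rater pairs that agree on that item.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))

    p_bar = p_i.mean()       # mean observed agreement
    p_e = np.sum(p_j ** 2)   # agreement expected by chance
    if np.isclose(p_e, 1.0):
        # Only one category was ever used; kappa is undefined in this case.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy example (hypothetical data): 4 raters labelling 4 pixels
# as 0 = non-stroma or 1 = stroma.
toy_labels = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]
toy_kappa = fleiss_kappa(labels_to_counts(toy_labels, n_categories=2))
print(toy_kappa)
```

In this unweighted form every item contributes equally to the mean observed agreement, so pixels along tissue boundaries count as much as interior pixels. That is the behaviour the abstract says BWFK is designed to mitigate.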