Algorithms (Jan 2022)

Using Explainable Machine Learning to Explore the Impact of Synoptic Reporting on Prostate Cancer

  • Femke M. Janssen,
  • Katja K. H. Aben,
  • Berdine L. Heesterman,
  • Quirinus J. M. Voorham,
  • Paul A. Seegers,
  • Arturo Moncada-Torres

DOI: https://doi.org/10.3390/a15020049
Journal volume & issue: Vol. 15, no. 2, p. 49

Abstract

Machine learning (ML) models have proven to be an attractive alternative to traditional statistical methods in oncology. However, they are often regarded as black boxes, hindering their adoption for answering real-life clinical questions. In this paper, we show a practical application of explainable machine learning (XML). Specifically, we explored the effect that synoptic reporting (SR; i.e., reports where data elements are presented as discrete data items) in pathology has on the survival of a population of 14,878 Dutch prostate cancer patients. We compared the performance of a Cox Proportional Hazards (CPH) model against that of an eXtreme Gradient Boosting (XGB) model in predicting ranked patient survival. We found that the XGB model (c-index = 0.67) performed significantly better than the CPH model (c-index = 0.58). Moreover, we used Shapley Additive Explanations (SHAP) values to generate a quantitative mathematical representation of how features—including usage of SR—contributed to the models’ output. The XGB model in combination with SHAP visualizations revealed interesting interaction effects between SR and the rest of the most important features. These results hint that SR has a moderate positive impact on predicted patient survival. Moreover, adding an explainability layer to predictive ML models can open their black box, making them more accessible and easier to understand by the user. This can make XML-based techniques appealing alternatives to the classical methods used in oncological research and in health care in general.
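
The following is a minimal, illustrative sketch (not the authors' code) of the kind of pipeline the abstract describes: fitting a Cox Proportional Hazards baseline, training an XGBoost model with a Cox survival objective, comparing both via the concordance index, and inspecting feature contributions with SHAP. The DataFrame `df` and the column names `time`, `event`, and `synoptic_report` are hypothetical placeholders for the study's actual data.

```python
import numpy as np
import xgboost as xgb
import shap
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

# --- Cox Proportional Hazards baseline ---
# `df` is assumed to hold feature columns plus follow-up time and event indicator.
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph_cindex = cph.concordance_index_

# --- XGBoost with a Cox objective ---
X = df.drop(columns=["time", "event"])
# For the survival:cox objective, censored observations are encoded as negative times.
y = np.where(df["event"] == 1, df["time"], -df["time"])
model = xgb.XGBRegressor(objective="survival:cox", n_estimators=200, max_depth=3)
model.fit(X, y)

# Higher predicted risk implies shorter expected survival,
# so risk scores are negated for the concordance index.
risk = model.predict(X)
xgb_cindex = concordance_index(df["time"], -risk, df["event"])

# --- SHAP values quantify each feature's contribution per patient ---
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
# Dependence plots can surface interaction effects with the SR feature.
shap.dependence_plot("synoptic_report", shap_values, X)
```

In this sketch the c-index is computed on the training data for brevity; in practice it would be estimated on held-out data (e.g., via cross-validation) to compare the two models fairly.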

Keywords