Evaluation of race/ethnicity-specific survival machine learning models for Hispanic and Black patients with breast cancer

Selen Bozkurt; Sunmin Lee; Jung In Park; Jong Won Park

doi:10.1136/bmjhci-2022-100666

BMJ Health & Care Informatics (Jun 2023)

Affiliations

Selen Bozkurt: Stanford University, Stanford, California, USA
Sunmin Lee: School of Medicine, University of California Irvine, Irvine, California, USA
Jung In Park: University of California Irvine, Irvine, California, USA
Jong Won Park: Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, South Korea

Journal volume & issue: Vol. 30, no. 1

Abstract

Objectives Survival machine learning (ML) has been suggested as a useful approach for forecasting future events, but there is growing concern that ML models can cause racial disparities through the data used to train them. This study aims to develop race/ethnicity-specific survival ML models for Hispanic and black women diagnosed with breast cancer and to examine whether these race/ethnicity-specific models outperform general models trained on data from all races/ethnicities.

Methods We used data from the US National Cancer Institute's Surveillance, Epidemiology and End Results programme registries. We developed Hispanic-specific and black-specific models and compared them with the general model using the Cox proportional-hazards model, Gradient Boost Tree, survival tree and survival support vector machine.

Results A total of 322 348 female patients diagnosed with breast cancer between 1 January 2000 and 31 December 2017 were identified. The race/ethnicity-specific models for Hispanic and black women consistently outperformed the general model when predicting outcomes for the corresponding race/ethnicity.

Discussion Accurately predicting a patient's survival outcome is critical for determining treatment options and providing appropriate cancer care. The high-performing models developed in this study can contribute to individualised oncology care and to improving the survival outcomes of black and Hispanic women.

Conclusion Predicting individualised breast cancer survival outcomes can provide the evidence needed to determine treatment options and to deliver high-quality, patient-centred cancer care for under-represented populations. In addition, race/ethnicity-specific ML models can mitigate representation bias and help address health disparities.
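The Methods compare race/ethnicity-specific models with a general model on the same subgroup. The abstract does not name the evaluation metric, so the sketch below assumes Harrell's concordance index (C-index), a common discrimination measure for survival models, and illustrates only the comparison protocol in plain Python. The field names `time` and `event` and the helpers `harrell_c_index` and `compare_on_subgroup` are hypothetical. Fitted models are represented only as risk-scoring functions, and nothing here reproduces the authors' code or results.

```python
from typing import Callable, Dict, List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) and a strictly shorter time than patient j. The pair is
    concordant when the model assigns i the higher risk; tied risks count 0.5.
    Pairs with tied times are skipped in this simplified version.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the evaluation set")
    return concordant / comparable


# Hypothetical record layout: {"time": follow-up time, "event": 1 if death observed, ...}
Record = Dict[str, float]


def compare_on_subgroup(test_rows: List[Record],
                        general_risk: Callable[[Record], float],
                        specific_risk: Callable[[Record], float]) -> Dict[str, float]:
    """Score a general model and a subgroup-specific model on the same
    held-out subgroup (e.g. Hispanic-only or black-only test patients).

    general_risk and specific_risk stand in for already-fitted models
    (Cox PH, gradient boosting, survival tree or survival SVM). Each must
    return a score where higher means higher risk of the event.
    """
    times = [r["time"] for r in test_rows]
    events = [int(r["event"]) for r in test_rows]
    return {
        "general": harrell_c_index(times, events, [general_risk(r) for r in test_rows]),
        "specific": harrell_c_index(times, events, [specific_risk(r) for r in test_rows]),
    }
```

In this protocol the general model is fitted on all races/ethnicities and the specific model only on the subgroup's training split, but both are scored on the identical subgroup test set, so a difference in C-index reflects the training data rather than the evaluation sample. Models that output a predicted survival time instead of a risk (for example, some survival SVM formulations) would need their outputs negated before being passed in.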

Published in BMJ Health & Care Informatics

ISSN: 2632-1009 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://informatics.bmj.com/