Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia

Liyan Pan; Guangjian Liu; Fangqin Lin; Shuling Zhong; Huimin Xia; Xin Sun; Huiying Liang

doi:10.1038/s41598-017-07408-0

Scientific Reports (Aug 2017)

Affiliations

Liyan Pan: Institute of Pediatrics, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University
Guangjian Liu: Institute of Pediatrics, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University
Fangqin Lin: Institute of Pediatrics, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University
Shuling Zhong: Department of Hematology and Oncology, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University
Huimin Xia: Department of Pediatric Surgery, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University
Xin Sun: Department of Hematology and Oncology, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University
Huiying Liang: Institute of Pediatrics, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University

DOI: https://doi.org/10.1038/s41598-017-07408-0
Journal volume & issue: Vol. 7, no. 1, pp. 1–9

Abstract

The prediction of relapse in childhood acute lymphoblastic leukemia (ALL) is critical for successful treatment and follow-up planning. Our goal was to construct an ALL relapse prediction model based on machine learning algorithms. Monte Carlo cross-validation with nested 10-fold cross-validation was used to rank clinical variables on randomly split training sets of 336 children with newly diagnosed ALL, and a forward feature selection algorithm was used to find the shortest list of the most discriminatory variables. To obtain an unbiased estimate of the model's performance on new patients, we evaluated it not only on the split test sets of 150 patients but also on an independent data set of 84 patients. The Random Forest model with 14 features achieved a cross-validation accuracy of 0.827 ± 0.031 on the split test sets and an accuracy of 0.798 on the independent data set, with areas under the curve of 0.902 ± 0.027 and 0.904, respectively. The model performed well across risk-level groups, with the best accuracy of 0.829 in the standard-risk group. To our knowledge, this is the first study to use machine learning models to predict childhood ALL relapse from electronic medical record data, and it may further facilitate risk-stratified treatment.
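The validation scheme described in the abstract (repeated random train/test splits, 10-fold cross-validation inside each training set, then forward selection over ranked variables) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, a nearest-centroid classifier stands in for the Random Forest, ranking by single-feature cross-validation accuracy is an assumed criterion, and all function names are hypothetical.

```python
import random
from statistics import mean


def monte_carlo_splits(n, n_test, n_repeats, seed=0):
    """Yield (train, test) index lists from repeated random splits."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        yield order[n_test:], order[:n_test]


def k_folds(indices, k, seed=0):
    """Partition indices into k disjoint folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def centroid_accuracy(X, y, train, test, features):
    """Accuracy of a nearest-centroid classifier (stand-in for Random Forest)."""
    centroids = {}
    for label in set(y[i] for i in train):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    correct = 0
    for i in test:
        point = [X[i][f] for f in features]
        pred = min(centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / len(test)


def cv_score(X, y, indices, features, k=10):
    """Mean k-fold cross-validation accuracy within the given indices."""
    folds = k_folds(indices, k)
    scores = []
    for j, fold in enumerate(folds):
        train = [i for m, other in enumerate(folds) if m != j for i in other]
        scores.append(centroid_accuracy(X, y, train, fold, features))
    return mean(scores)


def forward_select(X, y, indices, ranked, k=10):
    """Grow the ranked feature list; keep the shortest prefix with the best score."""
    best_score, best_len = -1.0, 0
    for n in range(1, len(ranked) + 1):
        score = cv_score(X, y, indices, ranked[:n], k)
        if score > best_score:  # strict: ties keep the shorter list
            best_score, best_len = score, n
    return ranked[:best_len], best_score


def run(X, y, n_test, n_repeats=5, k=10):
    """For each Monte Carlo split: rank, select, then score on the held-out test set."""
    results = []
    n_features = len(X[0])
    for train, test in monte_carlo_splits(len(X), n_test, n_repeats):
        # Assumed ranking criterion: single-feature CV accuracy on the training set.
        single = [cv_score(X, y, train, [f], k) for f in range(n_features)]
        ranked = sorted(range(n_features), key=lambda f: -single[f])
        selected, _ = forward_select(X, y, train, ranked, k)
        results.append((selected, centroid_accuracy(X, y, train, test, selected)))
    return results


def make_toy_data(n=60, seed=1):
    """Synthetic data: feature 0 separates the classes, features 1-3 are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[10 * label + rng.random()] + [rng.random() for _ in range(3)]
         for label in y]
    return X, y


X, y = make_toy_data()
results = run(X, y, n_test=20)
for selected, test_accuracy in results:
    print(selected, test_accuracy)
```

Feature ranking and selection use only the training indices of each split, so the held-out test accuracy is not influenced by the selection step. An independent data set, like the 84-patient set in the abstract, would be scored once with the final feature list.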

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
