Informatics in Medicine Unlocked (Jan 2020)
Treatment outcome classification of pediatric Acute Lymphoblastic Leukemia patients with clinical and medical data using machine learning: A case study at MAHAK hospital
Abstract
Introduction: Acute Lymphoblastic Leukemia (ALL) is the most common cancer among children. Advances in science and technology have substantially reduced its mortality rate. The aim of this study is to classify the treatment outcome of pediatric ALL patients using clinical and medical data and machine learning. For this purpose, ALL patients younger than 18 years who were treated at MAHAK multi-super specialty hospital from 2012 to 2018 are analyzed. MAHAK hospital is a reference center for the treatment of childhood malignancies in Iran.
Data: The data were collected manually from the paper-based records of 241 patients. The features include patient demographic characteristics, medical information, and treatment-related complications.
Method: Two scenarios are designed for the analysis. The first considers all pediatric ALL patients, while the second excludes patients with an unknown cause of death. In both scenarios, common classification algorithms are tuned and compared to find the model with the best performance.
Results: In the first scenario, the XGBoost algorithm outperforms the compared classifiers with an accuracy of 88.5% (95% CI: 82.3–94.0). In the second scenario, the best model is SVM, with an accuracy of 94.90% (95% CI: 88.49–98.32).
Conclusion: Although several previous works have analyzed gene expression data for ALL patients, the experimental results of this study show that clinical and medical data are also of reasonable importance in this area of research. The results show a significant improvement in treatment outcome prediction when the SVM algorithm is used. Moreover, our findings indicate that the frequency of fever in a patient is the most predictive factor of the ALL treatment outcome.
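The accuracies above are reported with 95% confidence intervals, but the abstract does not say how those intervals were computed. As a minimal sketch of one common approach, the following computes a Wilson score interval for a classification accuracy. The function name `wilson_interval` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the study, and the Wilson method is an assumption rather than the authors' stated procedure.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion such as test-set accuracy.

    correct: number of correctly classified samples (hypothetical input)
    total:   number of evaluated samples (hypothetical input)
    z:       normal quantile; 1.96 corresponds to a 95% interval
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")

    p = correct / total                      # observed accuracy
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return center - half_width, center + half_width


# Illustrative counts only: 90 correct predictions out of 100 test samples.
low, high = wilson_interval(90, 100)
print(f"accuracy = {90 / 100:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With small test sets, as is typical for a cohort of a few hundred patients, the interval is wide, which is why the interval matters when comparing classifiers whose accuracies differ by only a few points.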