Leukemia Research Reports (Jan 2024)
MACHINE-LEARNING-BASED PREDICTIVE CLASSIFIER FOR BONE MARROW FAILURE SYNDROME USING COMPLETE BLOOD COUNT AND CELL POPULATION DATA
Abstract
Introduction: Accurate risk assessment of bone marrow failure syndrome (BMFS) is crucial for early diagnosis and intervention. Methods: We used complete blood count (CBC) data to develop a predictive model for BMFS. Retrospective CBC data were collected from Seoul National University Hospital and Seoul St. Mary's Hospital of the Catholic Medical Center in South Korea. We developed binary classifiers for aplastic anaemia (AA) and myelodysplastic syndrome (MDS) and combined them into a BMFS classifier that takes the maximum of the two predicted probabilities. Classifiers were developed using multiple feature sets consisting of 13, 17, 25, or 28 CBC features to ensure applicability to various CBC testing settings. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Results: XGBoost achieved the best AUROCs, 0·953–0·961 for the AA classifier and 0·910–0·935 for the MDS classifier, across multiple CBC feature sets. The BMFS classifier, combining the AA and MDS classifiers, demonstrated an AUROC of 0·915–0·936. When cut-off probabilities were set to achieve 95% sensitivity, the specificities ranged from 68% to 79%. External validation on an independent dataset yielded an AUROC of 0·932–0·942, a sensitivity of 93–96%, and a specificity of 65–82% at the same cut-offs. Conclusions: Our predictive model provides a practical guide for diagnosing BMFS based on basic demographics and CBC data available during the first clinical encounter. It provides a reliable risk assessment tool for primary care physicians, facilitating more effective triage, timely referrals, and improved patient care.
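The combination and cut-off steps described in the Methods can be illustrated with a minimal sketch. It assumes that the BMFS score is the larger of the AA and MDS classifier probabilities and that the cut-off is the highest threshold at which sensitivity still reaches the target. The function names and the toy probabilities are hypothetical and are not taken from the study data or code.

```python
import math

def combine_bmfs(p_aa, p_mds):
    """Assumed rule: BMFS score = max of the AA and MDS probabilities."""
    return [max(a, m) for a, m in zip(p_aa, p_mds)]

def cutoff_for_sensitivity(scores, labels, target=0.95):
    """Highest cut-off at which sensitivity (score >= cut-off) meets the target."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive case")
    needed = math.ceil(target * len(positives))  # positives that must be flagged
    return positives[needed - 1]

def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when cases with score >= cut-off are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)

# Toy, made-up probabilities from two hypothetical binary classifiers.
p_aa = [0.90, 0.10, 0.20, 0.05, 0.30, 0.60]
p_mds = [0.20, 0.80, 0.15, 0.10, 0.40, 0.10]
labels = [1, 1, 0, 0, 1, 0]  # 1 = BMFS (AA or MDS), 0 = control

scores = combine_bmfs(p_aa, p_mds)
cutoff = cutoff_for_sensitivity(scores, labels, target=0.95)
sens, spec = sensitivity_specificity(scores, labels, cutoff)
print(f"cut-off={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In practice the cut-off would be fixed on the development data and then applied unchanged to the external validation set, which is why sensitivity on that set can drift away from the 95% target.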