BMC Cancer (Sep 2022)
Machine learning-based improvement of MDS-CBC score brings platelets into the limelight to optimize smear review in the hematology laboratory
Abstract
Background: Myelodysplastic syndromes (MDS) are clonal hematopoietic diseases of the elderly characterized by chronic cytopenias, ineffective and dysplastic hematopoiesis, recurrent genetic abnormalities and an increased risk of progression to acute myeloid leukemia. A challenge of routine laboratory complete blood counts (CBC) is to correctly identify MDS patients while avoiding excess smear reviews. To optimize smear review, the latest generations of hematology analyzers provide new cell population data (CPD) parameters with an increased ability to screen for MDS, among them the previously described MDS-CBC score, which is based on absolute neutrophil count (ANC), structural neutrophil dispersion (Ne-WX) and mean corpuscular volume (MCV). Ne-WX is increased in the presence of hypogranulated/degranulated neutrophils, a hallmark of dysplasia in the context of MDS or chronic myelomonocytic leukemia. Ne-WX and MCV are derived from leukocytes and red blood cells, respectively; the MDS-CBC score therefore does not include any platelet-derived CPD. We asked whether this score could be improved by adding the immature platelet fraction (IPF), a CPD used as a surrogate marker of dysplastic thrombopoiesis.

Methods: We studied a cohort of more than 500 individuals with cytopenias, including 168 MDS patients. First, we used Breiman’s random forests algorithm, a machine-learning approach, to identify the most relevant parameters for MDS prediction. We then designed Classification And Regression Trees (CART) to evaluate, using resampling, the effect of model tuning parameters on performance and to choose the “optimal” model across these parameters.

Results: Using the random forests algorithm, we identified Ne-WX and IPF as the strongest discriminatory predictors, explaining 37% and 33% of diagnoses, respectively. To obtain “simplified” trees that could be easily implemented in laboratory middleware, we designed CART combining the MDS-CBC score and IPF. Optimal results were obtained using an MDS-CBC score threshold of 0.23 and an IPF threshold of 3%.

Conclusions: We propose an extended MDS-CBC score, including CPD from the three myeloid lineages, to improve MDS diagnosis on routine laboratory CBCs and optimize smear reviews.
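
As an illustration of how a two-threshold rule of this kind might be expressed in laboratory middleware, the following is a minimal sketch only. The abstract reports the two thresholds (0.23 for the MDS-CBC score, 3% for IPF) but does not state how the tree combines them, nor the direction of each comparison. The "either value at or above its threshold" logic, the `CbcSample` type and the function name are therefore assumptions introduced for this example, not the authors' published tree. The MDS-CBC score is taken as a precomputed input, since its formula is not given here.

```python
from dataclasses import dataclass

# Thresholds as reported in the abstract.
MDS_CBC_THRESHOLD = 0.23
IPF_THRESHOLD_PERCENT = 3.0


@dataclass(frozen=True)
class CbcSample:
    """Hypothetical container for the two inputs of the rule."""
    mds_cbc_score: float  # precomputed MDS-CBC score (formula not shown here)
    ipf_percent: float    # immature platelet fraction, in percent


def flag_for_smear_review(sample: CbcSample) -> bool:
    """Return True if the sample should be sent for smear review.

    ASSUMPTION: a sample is flagged when either value is at or above
    its threshold. The actual tree structure may differ.
    """
    score_high = sample.mds_cbc_score >= MDS_CBC_THRESHOLD
    ipf_high = sample.ipf_percent >= IPF_THRESHOLD_PERCENT
    return score_high or ipf_high


if __name__ == "__main__":
    # Illustrative values only, not patient data.
    for s in (CbcSample(0.10, 1.5), CbcSample(0.30, 1.5), CbcSample(0.10, 4.2)):
        print(s, flag_for_smear_review(s))
```

Keeping the thresholds as named constants makes it straightforward to swap in a different combination rule once the full tree from the article is consulted.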
Keywords