Open Engineering (May 2021)
Utilization of K-nearest neighbor algorithm for classification of white blood cells in AML M4, M5, and M7
Abstract
Acute myeloid leukemia (AML) M4, M5, and M7 are leukemia subtypes that arise from myeloid cell derivatives, namely myeloblasts, monoblasts, and megakaryoblasts, which influence the identification of AML. These cells are further divided into more specific types (myeloblasts, promyelocytes, monoblasts, promonocytes, monocytes, and megakaryoblasts), which must be clearly identified in order to calculate their ratio in the blood. Therefore, this research aims to classify these cell types using the K-nearest neighbor (KNN) algorithm. Three distance metrics were tested, namely, Euclidean, Chebyshev, and Minkowski, each in both weighted and unweighted form. The features used as parameters are area, nucleus ratio, circularity, perimeter, mean, and standard deviation, and about 1,450 objects were used as training and testing data. In addition, K-fold cross validation was conducted to check that the classifier does not overfit. The results show that the unweighted Minkowski distance correctly classified about 240 of 290 objects at K = 19, the best result among the tested configurations. Therefore, the unweighted Minkowski distance was selected for further analysis. The accuracy, recall, and precision of KNN with the unweighted Minkowski distance, obtained from fivefold cross validation, are 80.552%, 44.145%, and 42.592%, respectively.
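The procedure summarized above (KNN voting under a chosen distance metric, optionally distance-weighted, evaluated by K-fold cross validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-feature data, the Minkowski order p = 3, the inverse-distance weighting scheme, and the interleaved fold split are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def minkowski(a, b, p=3.0):
    """Minkowski distance of order p (p=2 gives the Euclidean distance).

    The order p=3 is an assumed default; the abstract does not state it.
    """
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, dist=minkowski, weighted=False):
    """Classify `query` by a vote among its k nearest training points.

    Unweighted: each neighbor contributes one vote.
    Weighted (assumed scheme): each neighbor contributes 1 / distance.
    """
    scored = sorted(
        ((dist(features, query), label) for features, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for d, label in scored[:k]:
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


def k_fold_accuracy(X, y, k_neighbors, folds=5, **knn_kwargs):
    """Mean accuracy over `folds` folds; sample i is assigned to fold i % folds."""
    accuracies = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k_neighbors, **knn_kwargs) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


# Hypothetical toy data: two well-separated clusters standing in for two cell types.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["a"] * 5 + ["b"] * 5

if __name__ == "__main__":
    print(knn_predict(X, y, (0.2, 0.3), k=3))
    print(k_fold_accuracy(X, y, k_neighbors=3, folds=5))
```

In a study like this one, each row of `X` would hold the six extracted features (area, nucleus ratio, circularity, perimeter, mean, and standard deviation), ideally rescaled to comparable ranges so that no single feature dominates the distance. With about 1,450 objects, a fivefold split leaves 290 objects in each test fold.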
Keywords