Tehran University Medical Journal (May 2016)

Clustering of patients with anemia by data mining approach

  • Khadijeh Dolatshah,
  • Rassoul Noorossana,
  • Kamran Heidari,
  • Parya Soleimani,
  • Roohallah Ghasempour

Journal volume & issue
Vol. 74, no. 2
pp. 107 – 112

Abstract

Background: Anemia is the most common hematological disorder and occurs most often in women. Applying data mining to the large volumes of data in patient records can uncover knowledge that improves the quality of medical services. The goal of this study was to determine and evaluate the status of anemia using data mining algorithms.

Methods: In this applied study, laboratory and clinical data of women with anemia were examined. The data were gathered over one year in the laboratories of Imam Hossein and Shohada-ye Haft-e Tir Hospitals and comprise 690 records with 15 laboratory and clinical features of anemia. The patients were clustered with the k-medoids algorithm to discover hidden relationships and structures, and the Silhouette index was used to assess clustering quality.

Results: First, the patients were clustered using all features; this produced a weak cluster structure and was therefore judged unsuitable. Clustering was then repeated with different numbers of features. Red blood cell (RBC), mean corpuscular hemoglobin (MCH), ferritin, gastrointestinal cancer (GI cancer), gastrointestinal surgery (GI surgery), and gastrointestinal infection (GI infection) were identified as the most important patient features. Based on these features, the patients were segmented into three clusters. The average silhouette index was 80 percent, which indicates an acceptable clustering with a strong structure.

Conclusion: Clustering with all features was not suitable because of its weak structure, whereas clustering on the selected features gave a strong structure. The first cluster contains patients with mild iron deficiency anemia, the second contains patients with severe iron deficiency anemia, and the third contains patients whose anemia has other causes.
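
The clustering and evaluation steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the software, distance measure, or initialization used, so Euclidean distance, a deterministic farthest-first initialization, and the simple alternating (assign, then update) variant of k-medoids are all assumptions made here. The two-dimensional points are synthetic and stand in for standardized patient features; they are not study data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids. Returns (medoid_indices, labels)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Farthest-first initialization (an assumption, chosen for determinism):
    # start at point 0, then repeatedly add the point farthest from all medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(current):
        # Label each point with the position of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


def mean_silhouette(points, labels):
    """Average silhouette s(i) = (b - a) / max(a, b); requires at least two clusters."""
    n = len(points)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in other) / len(other)
            for other in ([j for j in range(n) if labels[j] == c]
                          for c in clusters if c != labels[i])
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


def synthetic_group(cx, cy):
    """Six synthetic points near (cx, cy); purely illustrative, not patient data."""
    return [(cx + 0.1 * i, cy + 0.1 * (i % 3)) for i in range(6)]


data = synthetic_group(1.0, 1.0) + synthetic_group(5.0, 5.0) + synthetic_group(9.0, 1.0)
medoids, labels = k_medoids(data, k=3)
score = mean_silhouette(data, labels)
print("medoid indices:", medoids)
print("labels:", labels)
print("mean silhouette:", round(score, 3))
```

The silhouette index ranges from -1 to 1, so a value reported as 80 percent corresponds to a mean of about 0.80 on this scale. Repeating the two calls for different feature subsets and comparing the resulting scores mirrors the feature-selection procedure the abstract describes, under the assumptions stated above.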

Keywords