Quantitative Comparison of Machine Learning Clustering Methods for Tuberculosis Data Analysis

Marlen Kossakov; Assel Mukasheva; Gani Balbayev; Syrym Seidazimov; Dinargul Mukammejanova; Madina Sydybayeva

doi:10.3390/engproc2024060020

Engineering Proceedings (Jan 2024)


Affiliations

Marlen Kossakov: Department of Information Technology, Non-Profit JSC “Almaty University of Power Engineering and Telecommunications Named after Gumarbek Daukeyev”, 050013 Almaty, Kazakhstan
Assel Mukasheva: School of Information Technology and Engineering, Kazakh-British Technical University, 050000 Almaty, Kazakhstan
Gani Balbayev: Academy of Logistics and Transport, 050012 Almaty, Kazakhstan
Syrym Seidazimov: Department of Information Technology, Non-Profit JSC “Almaty University of Power Engineering and Telecommunications Named after Gumarbek Daukeyev”, 050013 Almaty, Kazakhstan
Dinargul Mukammejanova: Faculty of Computer Technologies and Cyber Security, International University of Information Technology, 050000 Almaty, Kazakhstan
Madina Sydybayeva: Faculty of Computer Technologies and Cyber Security, International University of Information Technology, 050000 Almaty, Kazakhstan

Journal volume & issue: Vol. 60, no. 1, p. 20

Abstract


In many fields, machine learning (ML) has made data-driven decision making essential, providing insights that improve productivity and quality of life. Clustering, a fundamental ML approach, groups similar data points together. In tuberculosis (TB) research, clustering plays a critical role in identifying patient subgroups and customising treatment. While prior studies have recognised its utility, a comprehensive comparative analysis of multiple clustering methods applied to TB data is lacking. Using TB data, this study thoroughly assesses and contrasts four well-known ML clustering algorithms: spectral clustering, DBSCAN, hierarchical clustering, and k-means. Cluster quality is evaluated with quantitative measures, namely the silhouette score, the Davies–Bouldin index, and the Calinski–Harabasz index. The results provide quantitative insights that deepen the understanding of clustering and guide future research.
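The comparison described in the abstract (four algorithms scored by three internal validity indices) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the TB dataset is not reproduced, so synthetic data from `make_blobs` stand in for it, and the cluster count, `eps`, `min_samples`, and other hyperparameters are hypothetical choices. As a reading guide, a higher silhouette score (range -1 to 1) and a higher Calinski–Harabasz index indicate better-defined clusters, while a lower Davies–Bouldin index is better.

```python
from typing import Optional

import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a preprocessed TB feature matrix (assumption: the
# real TB dataset and its features are not available here).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)
X = StandardScaler().fit_transform(X)

# Hypothetical hyperparameters; in practice they would be tuned per dataset.
models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "hierarchical": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "spectral": SpectralClustering(n_clusters=3, random_state=42),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}


def evaluate(labels: np.ndarray, data: np.ndarray) -> Optional[dict]:
    """Return the three validity scores, or None if fewer than 2 clusters remain."""
    # DBSCAN marks noise points with label -1; this sketch excludes them before scoring.
    mask = labels != -1
    if len(np.unique(labels[mask])) < 2:
        return None
    kept_data, kept_labels = data[mask], labels[mask]
    return {
        "silhouette": silhouette_score(kept_data, kept_labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(kept_data, kept_labels),  # lower is better
        "calinski_harabasz": calinski_harabasz_score(kept_data, kept_labels),  # higher is better
    }


# Fit each model and score its labelling on the same data.
results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = evaluate(labels, X)

for name, scores in results.items():
    if scores is None:
        print(f"{name:>12}: fewer than 2 clusters found; scores undefined")
    else:
        print(f"{name:>12}: " + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

Excluding DBSCAN's noise points before scoring is one design choice; treating noise as a separate cluster is another, and it changes the resulting scores, so whichever convention is used should be reported alongside the results.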

Published in Engineering Proceedings

ISSN: 2673-4591 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General): Engineering machinery, tools, and implements
Website: https://www.mdpi.com/journal/engproc
