Engineering Proceedings (Jan 2024)
Quantitative Comparison of Machine Learning Clustering Methods for Tuberculosis Data Analysis
Abstract
Machine learning (ML) has made data-driven decision making essential in many fields, providing insights that improve productivity and quality of life. Clustering, a fundamental ML technique, groups similar data points. In tuberculosis (TB) research, clustering plays a critical role in identifying patient subgroups and customising treatment. While prior studies have recognised its utility, a comprehensive comparative analysis of multiple clustering methods applied to TB data is lacking. This study assesses and compares four well-known clustering algorithms on TB data: spectral clustering, DBSCAN, hierarchical clustering, and k-means. Cluster quality is evaluated with three quantitative measures: the silhouette score, the Davies–Bouldin index, and the Calinski–Harabasz index. The results offer quantitative insight into how these methods perform on TB data and can guide future research.
Keywords