The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization

Ida Bagus Gede Sarasvananda; Retantyo Wardoyo; Anny Kartika Sari

doi:10.22146/ijccs.45093

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) (Oct 2019)

Affiliations

Ida Bagus Gede Sarasvananda: Master Program of Computer Science, FMIPA UGM, Yogyakarta
Retantyo Wardoyo: Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
Anny Kartika Sari: Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta

DOI: https://doi.org/10.22146/ijccs.45093
Journal volume & issue: Vol. 13, no. 4
pp. 313 – 322

Abstract

The cost of a patient’s hospitalization can be estimated by clustering patients. K-means is one of the most widely used clustering algorithms. Because it is based on distance, however, K-means is weak at measuring the proximity of meaning, or semantics, between data items. To overcome this problem, semantic similarity can be used to measure the similarity between objects during clustering, so that semantic proximity can be calculated. This study clusters patient data with attention to the similarity of the patients’ diseases. ICD codes are used as the guide for determining a patient’s disease. The K-means method is combined with semantic similarity to measure the proximity of the patients’ ICD codes. The semantic similarity measures used in this study are Girardi, Leacock & Chodorow, Rada, and Jaccard similarity. Cluster quality is measured with the silhouette coefficient. The experimental results show that clustering with semantic similarity produces better-quality clusters than clustering without it. The best accuracy is 91.78% for the three semantic similarity methods, whereas without semantic similarity the best accuracy is 84.93%.
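
To make the idea of semantic proximity between ICD codes concrete, the sketch below shows one possible way to compute a Rada-style path distance, a Leacock & Chodorow-style similarity, and Jaccard similarity. It is an illustration under stated assumptions, not the authors' implementation: the hierarchy is approximated from the code string itself (root, first letter, three-character category, full code), the first letter only roughly stands in for an ICD chapter, and the names `ancestors`, `rada_distance`, `lch_similarity`, `jaccard`, and the `max_depth` parameter are all hypothetical. The Girardi measure is not shown.

```python
import math


def ancestors(code):
    """Path from an assumed root to the code, e.g. 'E11.9' -> ROOT, E, E11, E119.

    Assumption: the hierarchy is derived from code prefixes; a real ICD
    chapter structure would need a lookup table instead of the first letter.
    """
    c = code.replace(".", "").upper()
    return ["ROOT", c[:1]] + [c[:i] for i in range(3, len(c) + 1)]


def rada_distance(a, b):
    """Rada-style distance: number of edges on the shortest path between a and b."""
    pa, pb = ancestors(a), ancestors(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def lch_similarity(a, b, max_depth=4):
    """Leacock & Chodorow-style similarity: -log(path_length / (2 * max_depth)).

    Path length is counted in nodes (edges + 1) so identical codes do not
    produce log(0). max_depth=4 assumes codes of at most four characters.
    """
    return -math.log((rada_distance(a, b) + 1) / (2 * max_depth))


def jaccard(codes_a, codes_b):
    """Jaccard similarity between two sets of codes: |A & B| / |A | B|."""
    a, b = set(codes_a), set(codes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Two diabetes subcategories are closer to each other than to hypertension.
    print(rada_distance("E11.9", "E11.0"), rada_distance("E11.9", "I10"))
    print(lch_similarity("E11.9", "E11.0"), lch_similarity("E11.9", "I10"))
    print(jaccard({"E11.9", "I10"}, {"I10"}))
```

A measure like these could replace or complement Euclidean distance when assigning patients to clusters, which is the general idea the abstract describes. How the paper actually integrates the measures into K-means is not stated here.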

Published in IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

ISSN: 1978-1520 (Print); 2460-7258 (Online)
Publisher: Universitas Gadjah Mada
Country of publisher: Indonesia
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://jurnal.ugm.ac.id/ijccs
