Journal of Applied Informatics and Computing (Jul 2023)

Analysis of Elbow, Silhouette, Davies-Bouldin, Calinski-Harabasz, and Rand-Index Evaluation on K-Means Algorithm for Classifying Flood-Affected Areas in Jakarta

  • Ilham Firman Ashari,
  • Eko Dwi Nugroho,
  • Randi Baraku,
  • Ilham Novri Yanda,
  • Ridho Liwardana

DOI
https://doi.org/10.30871/jaic.v7i1.4947
Journal volume & issue
Vol. 7, no. 1
pp. 95 – 103

Abstract


Jakarta, the capital city of Indonesia, has a high population density and is frequently hit by floods. This study aims to classify flood-affected areas in Jakarta as severe, moderate, or low. Design/method/approach: The study applied the elbow, Silhouette, Davies-Bouldin, and Calinski-Harabasz methods to the K-means algorithm, and used the Rand index for evaluation. Based on Calinski-Harabasz, groupings with 3 and 6 clusters are the best. Based on the Davies-Bouldin index, K=6 has the smallest observed value, 0.2737. Based on the silhouette method, the best values in order were K=2, K=3, and K=6, with silhouette values of 0.866, 0.854, and 0.803. Based on the elbow method, the best K value was K=3. This was determined by observing the SSE plotted against the value of K: the largest decrease occurred at K=3, after which the curve began to flatten. The Rand index is a method used to compare several clustering results. A value of 90 or above indicates a very good result, a value between 80 and 90 indicates a good index, and a value below 80 indicates a bad index. The results show that three clusters is verified as the best grouping with a value of 1, followed by two clusters as a second alternative with a value of 0.9182. From these validation and evaluation methods it can be concluded that the best grouping uses 3 clusters. The resulting grouping yielded 75.4% in low areas, 21.1% in moderate areas, and 3.5% in severe areas.
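
As an illustration of the Rand index mentioned in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard definition of the index: the fraction of point pairs on which two labelings agree, that is, both place the pair in the same cluster or both place it in different clusters. The function name `rand_index` and the example label lists are hypothetical and are not taken from the study's data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the Rand index of two labelings of the same points.

    The result lies between 0 and 1; it is 1 when both labelings
    induce exactly the same partition, whatever the label names are.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two points are required")
    # A pair counts as an agreement when both labelings put the two
    # points together, or both labelings keep them apart.
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labels for six areas (illustrative only, not study data).
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]  # same partition, clusters merely renamed
print(rand_index(run_1, run_2))  # prints 1.0
```

Because the index depends only on pair co-membership, renaming the clusters does not change the score. For real data, scikit-learn provides `rand_score`, `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` in `sklearn.metrics`, which cover the indices named in the abstract.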

Keywords