Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Aug 2023)

Antlion Optimizer Algorithm Modification for Initial Centroid Determination in K-means Algorithm

  • Nanang Lestio Wibowo,
  • Moch Arief Soeleman,
  • Ahmad Zainul Fanani

DOI
https://doi.org/10.29207/resti.v7i4.4997
Journal volume & issue
Vol. 7, no. 4
pp. 870 – 883

Abstract


Clustering groups data and is widely used in data mining. K-means is one of the most popular clustering algorithms because it is easy to use and fast. The method partitions the data into k clusters by distance to the cluster centroids, and it chooses the initial centroids at random as the starting point for processing. A poor choice of initial centroids can lead to poor clustering and convergence to a local optimum. One way to improve k-means is therefore to determine the initial centroids with an optimization method. This study uses a modified Antlion Optimizer (ALO) to determine the initial centroids, as an alternative to random initialization, in order to obtain better clustering results. The proposed method improves clustering compared with standard k-means and k-means++. This is shown by evaluating the sum of intra-cluster distances (SICD) on five UCI datasets (iris, wine, glass, ecoli, and cancer): the proposed method obtained the best SICD value on each dataset. The methods were then ranked by their best SICD value on each dataset. The proposed method ranks first on the iris, wine, and cancer datasets, and shares first rank with k-means++ on the ecoli and glass datasets. By average rank, the proposed method is first overall, which indicates that it can improve clustering results and can serve as an alternative method for determining the initial cluster centers in k-means.
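The dependence on initial centroids described above can be illustrated with a minimal sketch. It is not the authors' modified ALO; it only shows how standard k-means iterations, started from two different sets of initial centroids, can converge to different SICD values. The abstract does not give the SICD formula, so the sketch assumes the sum of Euclidean distances from each point to its nearest centroid. The function names (`nearest`, `sicd`, `kmeans`) and the four toy points are hypothetical, chosen for illustration only.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def sicd(points, centroids):
    """Assumed SICD: sum of distances from each point to its nearest centroid."""
    return sum(math.dist(p, centroids[nearest(p, centroids)]) for p in points)


def kmeans(points, init_centroids, max_iter=100):
    """Standard k-means iterations starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# One initial centroid per pair: converges to the natural grouping.
good = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])

# Both initial centroids in the same pair: converges to a local optimum.
poor = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])

print(sicd(points, good))  # 2.0
print(sicd(points, poor))  # 20.0
```

An initialization strategy, whether k-means++ or an optimizer such as ALO, would supply the `init_centroids` argument in this sketch. Comparing the resulting SICD values across strategies is the kind of evaluation the abstract reports.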

Keywords