Knowledge Engineering and Data Science (Jun 2020)

Parallelization of Partitioning Around Medoids (PAM) in K-Medoids Clustering on GPU

  • Adhi Prahara,
  • Dewi Pramudi Ismi,
  • Ahmad Azhari

DOI
https://doi.org/10.17977/um018v3i12020p40-49
Journal volume & issue
Vol. 3, no. 1
pp. 40 – 49

Abstract

K-medoids clustering is a partitional clustering method. It offers better results when the data contain outliers, when an arbitrary distance metric is used, and when the mean or median does not exist within the data. However, k-medoids suffers from high computational complexity. Partitioning Around Medoids (PAM) was developed to improve k-medoids clustering. It consists of build and swap steps and uses the entire dataset to find the best potential medoids, and thus produces better medoids than other algorithms. This research proposes parallelizing PAM in k-medoids clustering on the GPU to reduce the computation time of the swap step. The parallelization scheme uses shared memory, a reduction algorithm, and an optimized thread block configuration to maximize occupancy. Based on the experimental results, the proposed parallelized PAM k-medoids is faster than the CPU and Matlab implementations and is efficient for large datasets.
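
For readers unfamiliar with the build and swap steps named in the abstract, the following is a minimal CPU-only sketch of a naive PAM in Python. It is not the authors' GPU implementation; the function names (`pam`, `total_cost`, `manhattan`), the toy data, and the choice of Manhattan distance are all assumptions made for illustration. The comment in the swap loop marks the per-candidate cost evaluations, which are independent of one another and are therefore the kind of work that a parallel scheme with a final minimum reduction could target.

```python
from itertools import product


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Naive PAM (illustrative): greedy build step, then swaps until no gain."""
    n = len(points)

    # Build: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates,
                   key=lambda i: total_cost(points, medoids + [i], dist))
        medoids.append(best)

    # Swap: try every (medoid slot, non-medoid) pair, keep the best improvement.
    cost = total_cost(points, medoids, dist)
    while True:
        best_cost, best_swap = cost, None
        for slot, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:slot] + [cand] + medoids[slot + 1:]
            # Each trial cost is independent of the others, so these
            # evaluations could run in parallel and be reduced to a minimum.
            trial_cost = total_cost(points, trial, dist)
            if trial_cost < best_cost:
                best_cost, best_swap = trial_cost, trial
        if best_swap is None:
            return sorted(medoids), cost
        medoids, cost = best_swap, best_cost


def manhattan(a, b):
    """Manhattan (L1) distance; any other metric could be passed instead."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy data (assumed for illustration): two well-separated groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, cost = pam(points, 2, manhattan)
print(medoids, cost)
```

On this toy input the sketch selects one medoid from each group. Because every swap iteration recomputes the full cost for each of the k(n - k) candidate swaps, the running time grows quickly with dataset size, which is the cost the abstract refers to.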