Model Selection Using K-Means Clustering Algorithm for the Symmetrical Segmentation of Remote Sensing Datasets

Ishfaq Ali; Atiq Ur Rehman; Dost Muhammad Khan; Zardad Khan; Muhammad Shafiq; Jin-Ghoo Choi

doi:10.3390/sym14061149

Symmetry (Jun 2022)

Affiliations

Ishfaq Ali: Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan
Atiq Ur Rehman: Department of Mathematics and Statistics, Faculty of Basic and Applied Sciences, International Islamic University, Islamabad 44000, Pakistan
Dost Muhammad Khan: Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan
Zardad Khan: Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan
Muhammad Shafiq: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea
Jin-Ghoo Choi: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea

DOI: https://doi.org/10.3390/sym14061149
Journal volume & issue: Vol. 14, no. 6
p. 1149

Abstract

The importance of unsupervised clustering methods is well established in the statistics and machine learning literature, and many sophisticated unsupervised classification techniques are now available to handle a growing number of datasets. Because it is simple and efficient on large datasets, the k-means clustering algorithm remains popular and widely used in the machine learning community. However, as with other clustering methods, it requires the number of clusters to be chosen in advance. The primary aim of this paper is to develop a novel, data-driven method for finding the optimum number of clusters, k. Taking the cluster symmetry property into account, the k-means algorithm is applied repeatedly over a range of k values within which the balanced optimum k is expected to lie. The selection is based on the uniqueness and symmetry of the centroid values of the resulting clusters: the final k is the value for which symmetry is observed. We evaluated the proposed algorithm on simulated datasets with controlled parameters and on real datasets taken from the UCI machine learning repository. We also evaluated the method on remote sensing tasks, such as monitoring deforestation and urbanization, using Sentinel-2B satellite images of the Islamabad region in Pakistan obtained from the United States Geological Survey. From the experimental results and the real data analysis, we conclude that the proposed algorithm achieves higher accuracy and lower root mean square error than the existing methods.
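The general idea of running k-means over a range of candidate k values and then picking one can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' method: the abstract does not define the centroid-symmetry criterion, so the mean silhouette width is swapped in as a stand-in selection score. The farthest-first initialization, the helper names (`sq_dist`, `farthest_first`, `assign`, `kmeans`, `silhouette`, `select_k`), and the toy three-blob dataset are all hypothetical choices made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization (an assumption): greedily pick far-apart points."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def silhouette(points, labels, k):
    """Mean silhouette width: a stand-in score, NOT the paper's symmetry criterion."""
    total = 0.0
    for i, p in enumerate(points):
        dists = [[] for _ in range(k)]
        for j, q in enumerate(points):
            if i != j:
                dists[labels[j]].append(sq_dist(p, q) ** 0.5)
        own = dists[labels[i]]
        if not own:  # singleton cluster contributes 0
            continue
        a = sum(own) / len(own)
        b = min((sum(d) / len(d) for c, d in enumerate(dists)
                 if c != labels[i] and d), default=a)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_k(points, k_values):
    """Run k-means for each candidate k and return the best-scoring k."""
    scores = {}
    for k in k_values:
        _, labels = kmeans(points, k)
        scores[k] = silhouette(points, labels, k)
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated 2-D Gaussian blobs.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in centers for _ in range(30)]

best_k, scores = select_k(points, range(2, 6))
print(best_k, scores)
```

Any other per-k score, including a criterion based on the centroid values as the abstract describes, could replace `silhouette` inside `select_k` without changing the surrounding loop.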

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
