Efficient algorithm for big data clustering on single machine

Rasim M. Alguliyev; Ramiz M. Aliguliyev; Lyudmila V. Sukhostat

doi:10.1049/trit.2019.0048

CAAI Transactions on Intelligence Technology (Nov 2019)

Affiliations

Rasim M. Alguliyev: Institute of Information Technology, Azerbaijan National Academy of Sciences
Ramiz M. Aliguliyev: Institute of Information Technology, Azerbaijan National Academy of Sciences
Lyudmila V. Sukhostat: Institute of Information Technology, Azerbaijan National Academy of Sciences

DOI: https://doi.org/10.1049/trit.2019.0048

Abstract

Big data analysis requires substantial computing power, which is not always available. It has therefore become necessary to develop new clustering algorithms capable of processing such data. This study proposes a new parallel clustering algorithm based on the k-means algorithm. It significantly reduces the exponential growth of computations. The proposed algorithm splits a dataset into batches while preserving the characteristics of the initial dataset and increasing the clustering speed. The idea is to determine cluster centroids for each batch and then to cluster those centroids in turn. Each data point is then assigned to the cluster with the nearest of the resulting centroids. Experiments on real large datasets evaluate the effectiveness of the proposed approach, which is compared with k-means and a modification of it. The experiments show that the proposed algorithm is a promising tool for clustering large datasets in comparison with the k-means algorithm.
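
The three steps named in the abstract (cluster each batch, cluster the batch centroids, assign every point to the nearest final centroid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how batches are formed, how many clusters each batch uses, or how the work is parallelised, so the random shuffle, the use of the same k in every batch, the plain Lloyd iteration, and the names kmeans, nearest_centroid and batch_kmeans are all assumptions made for illustration. The batches are processed sequentially here; since they are independent, they could be handed to separate workers.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Index of the closest centroid (squared Euclidean) for every point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    idx = rng.choice(len(points), size=k, replace=False)
    centroids = points[idx].astype(float)
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def batch_kmeans(data, k, n_batches, seed=0):
    """Hypothetical helper following the abstract's outline (assumed details)."""
    rng = np.random.default_rng(seed)
    # Assumption: a random shuffle lets each batch resemble the full dataset.
    order = rng.permutation(len(data))
    batches = np.array_split(data[order], n_batches)
    # Step 1: find k centroids in each batch (independent, so parallelisable).
    batch_centroids = np.vstack(
        [kmeans(batch, k, seed=seed + i) for i, batch in enumerate(batches)]
    )
    # Step 2: cluster the n_batches * k batch centroids into k final centroids.
    final_centroids = kmeans(batch_centroids, k, seed=seed)
    # Step 3: assign every original point to its nearest final centroid.
    labels = nearest_centroid(data, final_centroids)
    return final_centroids, labels


# Small synthetic example: two Gaussian blobs in two dimensions.
demo_rng = np.random.default_rng(42)
data = np.vstack([
    demo_rng.normal(0.0, 0.5, size=(200, 2)),
    demo_rng.normal(10.0, 0.5, size=(200, 2)),
])
centroids, labels = batch_kmeans(data, k=2, n_batches=4)
```

In this sketch the second k-means call sees only n_batches * k points rather than the whole dataset, which is where a saving over running k-means on all the data at once would come from.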

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
