Big data clustering with varied density based on MapReduce

Safanaz Heidari; Mahmood Alborzi; Reza Radfar; Mohammad Ali Afsharkazemi; Ali Rajabzadeh Ghatari

doi:10.1186/s40537-019-0236-x

Journal of Big Data (Aug 2019)

Affiliations

Safanaz Heidari: Department of Information Technology Management, Science and Research Branch, Islamic Azad University
Mahmood Alborzi: Department of Information Technology Management, Science and Research Branch, Islamic Azad University
Reza Radfar: Department of Information Technology Management, Science and Research Branch, Islamic Azad University
Mohammad Ali Afsharkazemi: Department of Industrial Management, Central Tehran Branch, Islamic Azad University
Ali Rajabzadeh Ghatari: Department of Management, Tarbiat Modares University

Journal volume & issue: Vol. 6, no. 1
pp. 1 – 16

Abstract

The DBSCAN algorithm is a widely used density-based clustering algorithm whose most important feature is its ability to detect clusters of arbitrary and varied shape and to identify noise. Nevertheless, the algorithm faces several challenges, including its failure to find clusters of varied density. At the same time, with the rapid development of the information age, large amounts of data are produced every day, more than a single machine can process on its own; new technologies are therefore required to store this data and extract information from it. Data whose volume is beyond the capabilities of existing software is called big data. In this paper, we introduce a new algorithm for clustering big data with varied density using MapReduce on a Hadoop platform. The main idea of this research is to use local density to determine the density of each point, a strategy that avoids connecting clusters of different densities. The proposed algorithm is implemented and compared with other algorithms that use the MapReduce paradigm, and it shows the best capability for clustering varied-density data and the best scalability.
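The abstract does not spell out the algorithm, so the following Python sketch only illustrates the problem it targets and one hypothetical way a per-point local density can keep clusters of different densities apart. The functions `dbscan`, `knn`, `local_density` and `density_aware_clusters`, the k-nearest-neighbour density estimate (inverse mean distance to the k nearest neighbours), and the `max_ratio` threshold are all assumptions made for illustration. They are not the authors' method, and the sketch runs on a single machine rather than on Hadoop/MapReduce.

```python
import math
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with a single global eps; label -1 marks noise."""
    labels = [None] * len(points)
    cid = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cid += 1
        labels[i] = cid
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # only core points keep expanding
                queue.extend(j_neighbours)
    return labels


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i."""
    others = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return [j for _, j in others[:k]]


def local_density(points, k):
    """Hypothetical local density: inverse of the mean distance to the k nearest neighbours."""
    dens = []
    for i in range(len(points)):
        mean_d = sum(math.dist(points[i], points[j]) for j in knn(points, i, k)) / k
        dens.append(1.0 / (mean_d + 1e-12))  # small constant guards against duplicate points
    return dens


def density_aware_clusters(points, k=2, max_ratio=2.0, min_size=2):
    """Grow clusters along kNN links, but only between points of similar local density."""
    dens = local_density(points, k)
    nbrs = [knn(points, i, k) for i in range(len(points))]
    labels = [None] * len(points)
    cid = -1
    # Seed from the densest unlabelled point so dense regions are grown first.
    for seed in sorted(range(len(points)), key=lambda i: -dens[i]):
        if labels[seed] is not None:
            continue
        cid += 1
        labels[seed] = cid
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in nbrs[p]:
                ratio = max(dens[p], dens[q]) / min(dens[p], dens[q])
                if labels[q] is None and ratio <= max_ratio:
                    labels[q] = cid
                    queue.append(q)
    # Clusters smaller than min_size are reported as noise (-1).
    sizes = {c: labels.count(c) for c in set(labels)}
    return [c if sizes[c] >= min_size else -1 for c in labels]


if __name__ == "__main__":
    # A dense group (spacing 0.1) next to a sparse group (spacing 1.0).
    pts = [(x, 0.0) for x in (0.0, 0.1, 0.2, 0.3, 0.4)] + \
          [(x, 0.0) for x in (1.4, 2.4, 3.4, 4.4, 5.4)]
    print(dbscan(pts, eps=1.05, min_pts=3))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(dbscan(pts, eps=0.15, min_pts=3))  # [0, 0, 0, 0, 0, -1, -1, -1, -1, -1]
    print(density_aware_clusters(pts))       # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

On this toy data a single global `eps` either chains the dense and sparse groups into one cluster (large `eps`) or labels the sparse group as noise (small `eps`), while the density-ratio rule keeps the two groups apart. In a MapReduce setting, one plausible arrangement (again an assumption, not taken from the paper) is for mappers to partition the points and reducers to compute local densities and partial clusters per partition, with a final step merging clusters that cross partition boundaries; that merge step is not shown here.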

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com