Comparison of Distributed K-Means and Distributed Fuzzy C-Means Algorithms for Text Clustering

I Made Artha Agastya; Teguh Bharata Adji; Noor Akhmad Setiawan

doi:10.21924/cst.2.1.2017.46

Communications in Science and Technology (Jun 2017)

Affiliations

I Made Artha Agastya: Department of Electrical Engineering and Information Technology, Faculty of Engineering, Gadjah Mada University
Teguh Bharata Adji: Department of Electrical Engineering and Information Technology, Faculty of Engineering, Gadjah Mada University
Noor Akhmad Setiawan: Department of Electrical Engineering and Information Technology, Faculty of Engineering, Gadjah Mada University

DOI: https://doi.org/10.21924/cst.2.1.2017.46
Journal volume & issue: Vol. 2, no. 1
pp. 11 – 17

Abstract

As data volumes grow, text clustering has been moved onto distributed systems. Popular algorithms such as K-Means (KM) and Fuzzy C-Means (FCM) have been combined with the MapReduce programming model in the Hadoop environment so that they can be distributed and parallelized. However, no study has yet compared the performance of Distributed KM (DKM) and Distributed FCM (DFCM) when both use the Tanimoto Distance Measure (TDM). This comparison matters because TDM is scale invariant while still discriminating between collinear vectors. This work compares DKM combined with TDM (DKM-T) and DFCM combined with TDM (DFCM-T) to evaluate the performance of both algorithms. The results show that DFCM-T achieves better intra-cluster and inter-cluster densities than DKM-T. DFCM-T also has a lower processing time than DKM-T when 4 or 8 nodes are used. DFCM-T and DKM-T clustered 1,400,000 text files in 16.18 and 9.74 minutes, respectively, although preprocessing took hours.
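The abstract contrasts hard assignment (DKM) with soft assignment (DFCM) under the Tanimoto distance. The following is a minimal single-process sketch, not the authors' Hadoop implementation. It makes several assumptions. First, it uses the common Tanimoto formulation 1 - a·b / (|a|² + |b|² - a·b), the form used by, e.g., Apache Mahout. Second, it sets the fuzziness exponent to m = 2. Third, it simulates the map and reduce phases with plain Python functions. All function names are hypothetical. The demo checks the two TDM properties the abstract cites: collinear vectors keep a non-zero distance, and scaling both vectors by the same factor leaves the distance unchanged.

```python
from collections import defaultdict


def tanimoto_distance(a, b):
    """1 - a.b / (|a|^2 + |b|^2 - a.b); assumed formulation, not taken from the paper."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is 0 only when both vectors are zero; treat them as identical.
    return 1.0 - dot / denom if denom > 0 else 0.0


def nearest_centroid(doc, centroids):
    """Hard (K-Means) assignment: index of the closest centroid."""
    dists = [tanimoto_distance(doc, c) for c in centroids]
    return dists.index(min(dists))


def fcm_memberships(doc, centroids, m=2.0):
    """Soft (Fuzzy C-Means) memberships u_j = 1 / sum_k (d_j / d_k)^(2/(m-1))."""
    dists = [tanimoto_distance(doc, c) for c in centroids]
    if 0.0 in dists:  # document coincides with a centroid: crisp membership
        hit = dists.index(0.0)
        return [1.0 if j == hit else 0.0 for j in range(len(dists))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def reduce_centroids(emitted, dim):
    """Reduce phase: weighted mean of the documents emitted for each cluster."""
    sums = defaultdict(lambda: [0.0] * dim)
    weights = defaultdict(float)
    for j, w, doc in emitted:
        weights[j] += w
        for i, x in enumerate(doc):
            sums[j][i] += w * x
    return {j: [s / weights[j] for s in sums[j]] for j in sums if weights[j] > 0}


def dkm_iteration(docs, centroids):
    # Map phase: each document is emitted once, with weight 1, to its nearest cluster.
    emitted = [(nearest_centroid(d, centroids), 1.0, d) for d in docs]
    return reduce_centroids(emitted, len(centroids[0]))


def dfcm_iteration(docs, centroids, m=2.0):
    # Map phase: each document is emitted to every cluster with weight u_j ** m.
    emitted = [(j, u ** m, d)
               for d in docs
               for j, u in enumerate(fcm_memberships(d, centroids, m))]
    return reduce_centroids(emitted, len(centroids[0]))


if __name__ == "__main__":
    # Collinear vectors still get a non-zero distance (cosine distance would be 0).
    print(tanimoto_distance([1.0, 1.0], [2.0, 2.0]))
    # Scaling both vectors by the same factor leaves the distance unchanged.
    print(tanimoto_distance([3.0, 1.0], [1.0, 2.0]),
          tanimoto_distance([6.0, 2.0], [2.0, 4.0]))
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    start = [[1.0, 0.0], [0.0, 1.0]]
    print(dkm_iteration(docs, start))
    print(dfcm_iteration(docs, start))
```

The design difference is visible in the map phase. In this sketch, DKM emits each document once with weight 1. DFCM emits each document to every cluster with weight u**m. Clusters that receive no documents are simply dropped from the returned centroid map.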

Published in Communications in Science and Technology

ISSN: 2502-9258 (Print); 2502-9266 (Online)
Publisher: Komunitas Ilmuwan dan Profesional Muslim Indonesia
Country of publisher: Indonesia
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: http://cst.kipmi.or.id/
