Data Science Journal (Apr 2009)
An Improved Correlation-Based Algorithm with Discretization for Attribute Reduction in Data Clustering
Abstract
Attribute reduction aims to reduce the dimensionality of large-scale data without losing useful information, and it is an important topic in knowledge discovery, data clustering, and classification. In this paper, we address the problem that continuous attributes must be made discrete before they can be used in clustering or classification algorithms. We propose a new attribute reduction algorithm based on a correlation model combined with data discretization, designed to select continuous attributes from a very large set of attributes. The proposed algorithm, named FCBF+, is an extended version of the Fast Correlation-Based Filter (FCBF) algorithm. FCBF+ first discretizes the continuous attributes efficiently and then selects the relevant attributes from the full attribute set. Performance is evaluated by comparing the clustering accuracy obtained with all features against the accuracy obtained with the reduced feature set selected by FCBF+. The results show that FCBF+ improves the clustering accuracy of various clustering algorithms.
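To make the two-stage pipeline concrete (discretize, then filter by correlation), here is a minimal Python sketch. It is not the FCBF+ method itself. The abstract does not name the discretization scheme, so equal-width binning is used as a placeholder. The selection stage follows the standard FCBF filter: it ranks features by symmetrical uncertainty SU(F, C) with a class label C and then removes redundant features. Because of this, the sketch assumes class labels are available, even though the abstract frames the task around clustering. All function names and the parameters `n_bins` and `delta` are hypothetical.

```python
import math
from collections import Counter

def equal_width_discretize(values, n_bins=5):
    """Hypothetical placeholder discretizer: equal-width binning into 0..n_bins-1.
    (Assumption: the paper's actual FCBF+ discretization is not specified here.)"""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    ig = hx + hy - hxy                # information gain (mutual information)
    return 2.0 * ig / (hx + hy)

def fcbf_select(features, labels, delta=0.0):
    """Standard FCBF-style filter (assumes labels exist).
    features: dict name -> list of already-discretized values."""
    # Relevance: keep features with SU(F, C) > delta, ranked by SU descending.
    su_c = {f: symmetrical_uncertainty(v, labels) for f, v in features.items()}
    ranked = sorted((f for f in features if su_c[f] > delta),
                    key=lambda f: -su_c[f])
    selected = []
    while ranked:
        fp = ranked.pop(0)  # most relevant remaining feature
        selected.append(fp)
        # Redundancy: drop Fq when SU(Fp, Fq) >= SU(Fq, C),
        # i.e. Fp predicts Fq at least as well as the class does.
        ranked = [fq for fq in ranked
                  if symmetrical_uncertainty(features[fp], features[fq]) < su_c[fq]]
    return selected
```

For example, discretizing a continuous column that tracks the label, together with an exact copy of it and a weakly related noise column, and then calling `fcbf_select` with `delta=0.1` keeps only the first informative column. The copy is removed as redundant and the noise column falls below the relevance threshold. Separating discretization from selection lets any binning scheme be plugged in ahead of the correlation filter.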
Keywords