Journal of Engineering Science and Technology (Nov 2016)

REVIEW ON FEATURE SELECTION TECHNIQUES AND ITS IMPACT FOR EFFECTIVE DATA CLASSIFICATION USING UCI MACHINE LEARNING REPOSITORY DATASET

  • AMARNATH B.,
  • S. APPAVU ALIAS BALAMURUGAN

Journal volume & issue
Vol. 11, no. 11
pp. 1639 – 1646

Abstract

The goal of feature selection is to remove redundant and irrelevant features. The feature subset selection problem is that of finding a subset of the original features of a dataset such that an induction algorithm run on data containing only the selected features generates a classifier with the highest possible accuracy. High-dimensional data can contain many irrelevant and redundant features, which may greatly degrade the performance of learning algorithms. The performance of different feature selectors, such as CFS, Chi-Square, Information Gain, Gain Ratio, OneR, and Symmetrical Uncertainty, was evaluated with two popular classification algorithms, Decision Tree (DT) and the Naive Bayesian method (NB). Both the DT and NB classifiers showed a significant improvement in performance after the number of irrelevant and redundant features was reduced by means of the different feature ranking methods.
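
To make the idea of entropy-based feature ranking concrete, the following is a minimal sketch of three of the scores named above (Information Gain, Gain Ratio, and Symmetrical Uncertainty) for discrete features. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `rank_features` helper, and the four-row toy dataset are all hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(labels, feature):
    """H(labels | feature) for two equal-length discrete sequences."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(labels, feature):
    """Reduction in class entropy obtained by knowing the feature."""
    return entropy(labels) - conditional_entropy(labels, feature)


def gain_ratio(labels, feature):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(labels, feature) / split_info if split_info > 0 else 0.0


def symmetrical_uncertainty(labels, feature):
    """2 * IG / (H(class) + H(feature)), which lies in [0, 1]."""
    denom = entropy(labels) + entropy(feature)
    return 2.0 * information_gain(labels, feature) / denom if denom > 0 else 0.0


def rank_features(columns, labels, score=information_gain):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: score(labels, col) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one feature determines the class, one is independent of it.
labels = ["yes", "yes", "no", "no"]
columns = {
    "relevant": ["a", "a", "b", "b"],
    "irrelevant": ["x", "y", "x", "y"],
}
ranking = rank_features(columns, labels)
```

A filter-style selector would keep only the top-ranked features from `ranking` before training the classifier. Passing `score=gain_ratio` or `score=symmetrical_uncertainty` to `rank_features` switches the ranking criterion without changing the rest of the sketch.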

Keywords