Alexandria Engineering Journal (Mar 2023)

Relevance-diversity algorithm for feature selection and modified Bayes for prediction

  • M. Shaheen,
  • N. Naheed,
  • A. Ahsan

Journal volume & issue
Vol. 66
pp. 329 – 342

Abstract

Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. In these datasets, some features have a negligible connection with other features, and some may be insignificant because their presence does not affect the results of big data analytics. Big data analytics algorithms generate better classification models when supplied with a dataset consisting of relevant, important and informative features. Features can therefore be classified as important or unimportant. To select the important features, different filtration techniques are used. These techniques filter features on different bases, such as information gain, information dispersion and Gini index, and have a few drawbacks that are reviewed in this paper. The first contribution of this paper is a new feature selection technique named “Relevance-diversity algorithm”, which selects important features based on two measures, relevance and diversity, in order to keep the number of selected features as low as possible and to reduce the search time used in feature selection. The second contribution is a new supervised classification algorithm based on Naive Bayes classification. The naive assumption, i.e. feature independence, is discarded from the Naive Bayes classification algorithm: the features are considered to be dependent on each other and their combined impact on the class value is evaluated. The newly proposed classification algorithm is then applied to the features selected through the relevance-diversity based feature selection technique. The Weather, Tic-Tac-Toe, Lenses, Balance-scale and CarEval datasets are used to evaluate both techniques. The results of the proposed feature selection method are compared with existing methods, and the results of Modified-Bayes are compared with the existing Naive Bayes algorithm. Analysis revealed that the proposed method performed better in terms of the number of features, accuracy and time complexity.
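
To illustrate what discarding the independence assumption can mean in practice, the following minimal sketch contrasts the standard Naive Bayes score, P(c) multiplied by the per-feature likelihoods P(x_i | c), with a score that estimates the joint likelihood P(x_1, ..., x_k | c) directly from counts. It is not the Modified-Bayes algorithm of the paper, whose details the abstract does not give; the function names, the toy data and the absence of smoothing are all assumptions made for illustration.

```python
from collections import Counter


def naive_bayes_scores(rows, labels, query):
    """Score each class as P(c) * prod_i P(x_i | c) (independence assumed)."""
    class_counts = Counter(labels)
    n = len(labels)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n  # prior P(c)
        for i, value in enumerate(query):
            matches = sum(
                1 for row, y in zip(rows, labels) if y == c and row[i] == value
            )
            score *= matches / n_c  # per-feature likelihood P(x_i | c)
        scores[c] = score
    return scores


def joint_bayes_scores(rows, labels, query):
    """Score each class as P(c) * P(x_1..x_k | c), using joint counts.

    Illustrative only: no smoothing, so unseen combinations score zero.
    """
    class_counts = Counter(labels)
    joint_counts = Counter((tuple(row), y) for row, y in zip(rows, labels))
    n = len(labels)
    return {
        c: (n_c / n) * (joint_counts[(tuple(query), c)] / n_c)
        for c, n_c in class_counts.items()
    }


# Hypothetical toy data: the class depends only on the feature combination.
rows = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
labels = ["same", "diff", "diff", "same"]

print(naive_bayes_scores(rows, labels, ("a", "a")))
print(joint_bayes_scores(rows, labels, ("a", "a")))
```

On this toy data each feature alone carries no information about the class, so the naive scores tie, whereas the joint estimate separates the two classes. The price is that joint counts become sparse quickly as the number of features grows, which is one reason a compact feature subset matters before such a classifier is applied.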

Keywords