Machine Learning and Knowledge Extraction (Apr 2024)

Effective Data Reduction Using Discriminative Feature Selection Based on Principal Component Analysis

  • Faith Nwokoma,
  • Justin Foreman,
  • Cajetan M. Akujuobi

DOI
https://doi.org/10.3390/make6020037
Journal volume & issue
Vol. 6, no. 2
pp. 789 – 799

Abstract

Effective data reduction must retain as much of the informative content of the data under examination as possible. Feature selection is a common default for dimensionality reduction, as this method usually retains the relevant features of a dataset. In this study, we used unsupervised learning to discover the top-k discriminative features in a large multivariate IoT dataset. We used the statistics of principal component analysis (PCA) to filter the relevant features based on their ranks along the principal directions while also considering the coefficients of the components. The number of principal components selected determined the number of features to be selected in the singular value decomposition (SVD) process. Several experiments were conducted on different benchmark datasets, and the effectiveness of the proposed method was evaluated based on the reconstruction error. The results were then verified by applying the algorithm to a large IoT dataset, and its performance, measured by accuracy and reconstruction error, was compared with the results for the benchmark datasets. The evaluation was consistent with the benchmark results, which showed high accuracy and low reconstruction error.
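The general idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The scoring rule (absolute loadings weighted by explained-variance ratio), the 0.95 variance threshold, the least-squares definition of reconstruction error, the synthetic data, and the function names `select_features_by_pca` and `reconstruction_error` are all assumptions made for illustration.

```python
import numpy as np


def select_features_by_pca(X, variance_threshold=0.95):
    """Return (sorted indices of selected features, k).

    Assumed scoring rule (not taken from the paper): a feature's score is the
    sum of its absolute loadings on the first k principal components, each
    weighted by that component's explained-variance ratio.
    """
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # explained-variance ratios
    # k = smallest number of components whose cumulative ratio reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_threshold)) + 1
    k = min(k, len(s))
    scores = (np.abs(Vt[:k]) * explained[:k, None]).sum(axis=0)
    # Keep as many features as components were retained.
    selected = np.sort(np.argsort(scores)[::-1][:k])
    return selected, k


def reconstruction_error(X, selected):
    """Relative Frobenius error when every centered feature is rebuilt
    by least squares from the selected features only (assumed definition)."""
    Xc = X - X.mean(axis=0)
    S = Xc[:, selected]
    coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)
    return np.linalg.norm(Xc - S @ coef) / np.linalg.norm(Xc)


# Synthetic example: six features driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
W = np.array([[3.0, 0.0, 1.0, 0.5, 0.2, 0.0],
              [0.0, 2.5, 0.3, 0.5, 0.0, 0.2]])
X = latent @ W + 0.01 * rng.normal(size=(500, 6))

selected, k = select_features_by_pca(X)
err = reconstruction_error(X, selected)
print("components kept:", k)
print("selected feature indices:", selected.tolist())
print("relative reconstruction error:", err)
```

Tying the number of selected features to the number of retained components mirrors the abstract's description. Other loading-based scores would fit the same skeleton.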

Keywords