Applied Sciences (Aug 2024)

Orthogonal Matrix-Autoencoder-Based Encoding Method for Unordered Multi-Categorical Variables with Application to Neural Network Target Prediction Problems

  • Yiying Wang,
  • Jinghua Li,
  • Boxin Yang,
  • Dening Song,
  • Lei Zhou

DOI
https://doi.org/10.3390/app14177466
Journal volume & issue
Vol. 14, no. 17
p. 7466

Abstract

Read online

Neural network models, such as BP, LSTM, etc., support only numerical inputs, so data preprocessing needs to be carried out on the categorical variables to convert them into numerical data. For unordered multi-categorical variables, existing encoding methods may produce dimensional catastrophes and may also introduce additional order misrepresentation and distance bias in neural network computation. To solve the above problems, this paper proposes an unordered multi-categorical variable encoding method O-AE using orthogonal matrix for encoding and encoding representation learning and dimensionality reduction via an autoencoder. Bayesian optimization is used for hyperparameter optimization of the autoencoder. Finally, seven experiments were designed with the basic O-AE, Bayesian optimization of the hyperparameters of the autoencoder for O-AE, and other encoding methods to encode unordered multi-categorical variables in five datasets, and they were input into a BP neural network to carry out target prediction experiments. The results show that the experiments using O-AE and O-AE-b have better prediction results, proving that the method proposed in this paper is highly feasible and applicable and can be an optional method for the data processing of unordered multi-categorical variables.

Keywords