Electronics Letters (Oct 2024)
An evolutionary algorithm‐based classification method for high‐dimensional imbalanced mixed data with missing information
Abstract
Data volumes keep growing rapidly, and the majority of the data is high‐dimensional and imbalanced, which makes it hard to classify. Missing data often occur in software, which further aggravates the difficulty of classification. To address the classification of high‐dimensional, imbalanced, mixed‐variable data with missing values, a novel method based on particle swarm optimization is developed. It has three original components: multiple feature selection, mixed attribute imputation, and quantum oversampling. Multiple feature selection uses a two‐stage strategy to obtain stable, relevant features. Mixed attribute imputation separates features into continuous and discrete features and fills missing values with a different model for each type. Quantum oversampling chooses instances to balance the data based on the quantum operator. Furthermore, particle swarm optimization is employed to optimize the parameters of these components to obtain better classification results. Six representative classification datasets, six typical algorithms, and four measures are used to conduct exhaustive experiments, and the results indicate that the proposed method is superior to the comparison algorithms.
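As a rough illustration of how particle swarm optimization can tune component parameters, the following is a minimal sketch of a generic global-best PSO in plain Python. It is not the authors' implementation: the function `pso_minimize`, its default coefficients, and the objective `toy_loss` (a stand-in for a classification error evaluated at two hypothetical parameters) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 30,
    n_iters: int = 200,
    inertia: float = 0.7,   # assumed inertia weight
    c1: float = 1.5,        # assumed cognitive coefficient
    c2: float = 1.5,        # assumed social coefficient
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Generic global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params: Sequence[float]) -> float:
    # Hypothetical stand-in for a classification error; minimum at (0.3, 0.6).
    return (params[0] - 0.3) ** 2 + (params[1] - 0.6) ** 2


best_params, best_loss = pso_minimize(toy_loss, [(0.0, 1.0), (0.0, 1.0)])
```

In a setting like the one described in the abstract, the objective would presumably run the feature selection, imputation, and oversampling steps with the candidate parameters and return a validation error; here it is replaced by a simple quadratic so the sketch stays self-contained.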
Keywords