IEEE Access (Jan 2022)
An Integrated Novel Framework for Coping Missing Values Imputation and Classification
Abstract
This work presents an integrated framework for imputing missing values and predicting the class labels of unseen samples by combining the strengths of the rule-based inductive decision tree (DT) and the Support Vector Machine (SVM) classifier (DT-SVM). The decision tree is used to impute missing values in datasets containing both categorical and numerical attributes. For comparison, several other popular and simple imputation techniques, namely drop, mean, median, mode, and k-nearest neighbor (kNN), are also applied. The imputed datasets are then classified using SVM. The performance of the proposed DT-SVM framework has been compared with Drop-SVM, Mean-SVM, Median-SVM, Mode-SVM, and kNN-SVM, and DT-SVM is found to outperform the others. Further, a new variant of kNN, named approximated kNN (A-kNN), is proposed to overcome some shortcomings of canonical kNN when learning from a training set imputed by DT. Unlike canonical kNN, A-kNN does not scan the entire training set; instead, it processes only a set of representative instances from the training data to identify the nearest neighbor. These representative instances are found using a class centroid approach. The effectiveness of A-kNN, in terms of both accuracy and computational time, is examined by comparison with canonical kNN. The computational time of A-kNN is found to be drastically lower than that of canonical kNN without compromising classification accuracy.
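The A-kNN idea described above (search only representative instances chosen via class centroids, rather than the whole training set) can be illustrated with a small sketch. The abstract does not specify how representatives are selected from the centroids, so the rule used here is an assumption: for each class, compute the centroid and keep the `per_class` training instances closest to it. The function names (`select_representatives`, `a_knn_predict`) and the parameter `per_class` are hypothetical and are not taken from the paper. Inputs are assumed to be already imputed and fully numeric.

```python
import math
from collections import Counter, defaultdict


def centroid(rows):
    """Component-wise mean of a list of equal-length numeric vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_representatives(X, y, per_class):
    """ASSUMED selection rule (not from the paper): for each class, keep the
    `per_class` training instances closest (Euclidean) to that class's centroid."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    reps_X, reps_y = [], []
    for label, rows in by_class.items():
        c = centroid(rows)
        # Stable sort: ties keep their original training-set order.
        nearest = sorted(rows, key=lambda r: math.dist(r, c))[:per_class]
        reps_X.extend(nearest)
        reps_y.extend([label] * len(nearest))
    return reps_X, reps_y


def a_knn_predict(reps_X, reps_y, query, k=3):
    """Majority vote among the k nearest representatives only,
    instead of scanning the full training set as canonical kNN does."""
    ranked = sorted(zip(reps_X, reps_y), key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    y = ["a", "a", "a", "b", "b", "b"]
    reps_X, reps_y = select_representatives(X, y, per_class=2)
    print(a_knn_predict(reps_X, reps_y, [1.1, 1.0], k=3))  # prints "a"
```

Under this assumption, each query costs distance computations proportional to (number of classes × `per_class`) rather than to the full training-set size. That is the source of the speed-up, while the vote remains local to the query.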
Keywords