IEEE Access (Jan 2024)
High Accuracy COVID-19 Prediction Using Optimized Union Ensemble Feature Selection Approach
Abstract
Recently, the world has been dealing with a severe outbreak of COVID-19. The rapid transmission of the virus causes mild to severe cases of cough, fever, body aches, organ failure, and death. An increasing number of patients, limited diagnostic options, and extended waiting periods for test results all put pressure on healthcare systems, which in turn increases the spread of the virus. A concise and accurate automatic diagnosis is crucial for identifying infected patients at an early stage. This paper proposes a machine learning (ML)-based predictive framework to identify COVID-19 cases from clinical data using an optimized union ensemble feature selection (OUEFS) approach. The OUEFS is based on the union ensemble of the feature subsets obtained through a rigorous feature selection (FS) process. It also involves performance optimization of the ML classifiers. Initially, the OUEFS identified key features from a publicly accessible COVID-19 dataset using FS methods such as Mutual Information Feature Selection (MIFS), Recursive Feature Elimination (RFE), and RidgeCV. The most important features from each method were selected using a top-k thresholding technique. The selected feature subsets were then integrated using a union ensemble approach, from which an optimal combination of features with enhanced predictive power was derived. This composite feature set was subsequently used for model training and evaluation. Classification was conducted using ML algorithms such as linear SVM, gradient boosting (GB), logistic regression (LR), and AdaBoost to compare their effectiveness on individual and combined feature subsets. We also conducted Genetic Algorithm (GA)-based hyperparameter optimization (HPO), which further refined the training process and enhanced the accuracy of the proposed approach. Experimental results show that the union ensemble of the MIFS and RidgeCV FS techniques, combined with the AdaBoost classifier and GA-based HPO, achieved 96.30% accuracy. Our optimized union ensemble approach demonstrated superior performance over previous ensemble-based approaches for predicting COVID-19, thus offering a robust tool for early and efficient diagnosis without requiring hospital visits.
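The top-k thresholding and union ensemble steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names, the score values, the value of k, and the helper names `top_k` and `union_ensemble` are all hypothetical, and ranking RidgeCV coefficients by absolute value is an assumption. In practice the scores might come from an FS library, for example mutual information estimates for MIFS and fitted coefficients for RidgeCV.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest absolute scores."""
    # Assumption: magnitude is what matters, so signed coefficients
    # (e.g. from a ridge model) are ranked by absolute value.
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:k]


def union_ensemble(subsets: List[List[str]]) -> List[str]:
    """Merge feature subsets into one list without duplicates,
    keeping the order in which features are first seen."""
    seen: Set[str] = set()
    merged: List[str] = []
    for subset in subsets:
        for name in subset:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical importance scores; placeholder values, not results from the paper.
mifs_scores = {"fever": 0.30, "cough": 0.22, "age": 0.05, "headache": 0.12}
ridge_scores = {"fever": 0.80, "cough": -0.10, "age": -0.45, "headache": 0.20}

K = 2  # hypothetical top-k threshold
mifs_top = top_k(mifs_scores, K)
ridge_top = top_k(ridge_scores, K)

# The union keeps every feature that at least one FS method ranked in its top k.
selected = union_ensemble([mifs_top, ridge_top])
print(selected)
```

A union, unlike an intersection, retains a feature that only one method considers important, so the combined set is at least as large as each individual subset. The resulting feature list would then be used to train and compare the classifiers.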
Keywords