Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem

Hayder Mohammed Qasim; Oguz Ata; Mohammad Azam Ansari; Mohammad N. Alomary; Saad Alghamdi; Mazen Almehmadi

doi:10.3390/medicina57111217

Medicina (Nov 2021)

Affiliations

Hayder Mohammed Qasim: Department of Electrical and Computer Engineering, Institute of Science, Altinbas University, Istanbul 34218, Turkey
Oguz Ata: Department of Electrical and Computer Engineering, Institute of Science, Altinbas University, Istanbul 34218, Turkey
Mohammad Azam Ansari: Department of Epidemic Disease Research, Institute for Research & Medical Consultations (IRMC), Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia
Mohammad N. Alomary: National Centre for Biotechnology, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia
Saad Alghamdi: Laboratory Medicine Department, Faculty of Applied Medical Sciences, Umm Al-Qura University, Makkah 24382, Saudi Arabia
Mazen Almehmadi: Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Taif University, Taif 21944, Saudi Arabia

DOI: https://doi.org/10.3390/medicina57111217
Journal volume & issue: Vol. 57, no. 11, p. 1217

Abstract

Background and Objectives: Recently, many studies have focused on the early detection of Parkinson’s disease (PD). This disease belongs to a group of neurological disorders that directly affect brain cells and influence movement, hearing, and various cognitive functions. Medical datasets often have unequally distributed classes, which biases the classification of patients. We propose a hybrid feature selection framework that can deal with imbalanced datasets such as the PD dataset. The framework uses the SMOTE algorithm to handle the class imbalance, and uses Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) to remove contradictions among the features in the dataset and decrease the processing time. Materials and Methods: PD acoustic datasets and the characteristics of control subjects were used to construct classification models such as Bagging, K-nearest neighbour (KNN), multilayer perceptron, and the support vector machine (SVM). In the preprocessing stage, the synthetic minority over-sampling technique (SMOTE) was used together with two feature selection methods, RFE and PCA. The PD dataset has a large difference between the numbers of infected and uninfected patients, which causes the classification bias problem. Therefore, SMOTE was used to resolve this problem. Results: For model evaluation, the train–test split technique was used for the experiment. All the models were tuned with grid search. The SVM model showed the highest accuracy, 98.2%, and the KNN model exhibited the highest specificity, 99%. Conclusions: The proposed method was compared with current methods for detecting Parkinson’s disease and with methods for other medical diseases. Our developed system could treat data bias and reach a high prediction performance for PD, which can be beneficial for health organizations to properly prioritize assets.
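
The oversampling idea behind SMOTE can be illustrated with a short, standard-library-only Python sketch. This is not the authors' implementation, which the abstract does not describe, and the function name `smote`, its parameters, and the toy data are assumptions made purely for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. RFE, PCA, and the classifiers are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each new sample is an interpolation between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data (not the PD dataset): 8 majority vs 3 minority samples.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate just enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied only to the training portion of a train–test split, so that synthetic samples derived from test data do not influence the model being evaluated. Whether the study followed that order is not stated in the abstract.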

Published in Medicina

ISSN: 1010-660X (Print); 1648-9144 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Medicine (General)
Website: https://www.mdpi.com/journal/medicina
