Computer Methods and Programs in Biomedicine Update (Jan 2023)
Improving SVM performance for type II diabetes prediction with an improved non-linear kernel: Insights from the PIMA dataset
Abstract
Type 2 diabetes is a chronic metabolic disease that affects a significant portion of the world's population. Prediction of this disease using machine learning (ML) algorithms has gained substantial attention due to its potential for early detection and effective intervention. The support vector machine (SVM), one of the most powerful ML algorithms, has proven effective in a variety of classification tasks, including diabetes prediction. However, the choice of kernel function has a substantial effect on the performance of SVM classifiers. This paper proposes an improved non-linear kernel for the SVM model to enhance Type 2 diabetes classification. The new kernel combines the radial basis function (RBF) kernel and the RBF city block kernel, enabling the SVM to learn complex decision boundaries and adapt to the intricacies of the PIMA dataset. The PIMA dataset contains various clinical and demographic features of individuals. To preserve the integrity of the dataset, we replace missing values and outliers with the median. We tackle the class imbalance issue with a robust synthetic over-sampling approach. A comparative analysis against several existing kernel functions shows that the proposed approach is superior across various prediction evaluation metrics. Our recommended integrated kernel model also showed improved performance (ACC = 85.5, Recall = 87.0, Precision = 83.4, F1 score = 85.2, and AUC = 85.5) when compared to other approaches in the literature. The results of this study indicate that the proposed non-linear kernel in SVM outperforms existing kernel functions for predicting Type 2 diabetes using the PIMA dataset. Furthermore, a simulation study is carried out to verify that the proposed kernel in SVM is robust and performs well. The improved accuracy and robustness of the model suggest its potential utility in clinical settings, aiding in the early identification and management of individuals at risk of developing diabetes.
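As a rough illustration of how an RBF kernel and an RBF city block kernel might be integrated into a single kernel, the following Python sketch may help. It rests on assumptions that the abstract does not state: the two kernels are merged as a convex combination, they share one bandwidth `gamma`, and the function names and the `weight` parameter are hypothetical. It should not be read as the exact kernel proposed in the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def cityblock_rbf_kernel(x, y, gamma=1.0):
    """RBF-style kernel on the city block (L1) distance: exp(-gamma * L1 distance)."""
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1_dist)


def combined_kernel(x, y, gamma=1.0, weight=0.5):
    """ASSUMPTION: convex combination of the two kernels.

    The abstract does not say how the kernels are combined; a weighted sum
    with 0 <= weight <= 1 is used here because it keeps the result a valid
    (positive semi-definite) kernel.
    """
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * cityblock_rbf_kernel(x, y, gamma))


def gram_matrix(rows, gamma=1.0, weight=0.5):
    """Pairwise kernel values for a list of feature vectors."""
    return [[combined_kernel(a, b, gamma, weight) for b in rows] for a in rows]
```

A Gram matrix built this way could be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Features would normally be standardized first, since both distances are sensitive to feature scale.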