UHD Journal of Science and Technology (Oct 2022)
Rough Set-Based Feature Selection for Predicting Diabetes Using Logistic Regression with Stochastic Gradient Descent Algorithm
Abstract
Disease prediction and decision-making play an important role in medical diagnosis. Research has shown that the cost of disease prediction and diagnosis can be reduced by applying interdisciplinary approaches. Interdisciplinary researchers have shown that machine learning and data mining techniques from computer science have high potential in disease prediction and diagnosis. In this research, a new approach is proposed to predict diabetes in patients. The approach uses stochastic gradient descent, a machine learning technique, to perform logistic regression on a dataset. The dataset contains eight original variables (features) collected from patients before they were diagnosed with diabetes. These features are used as input values in the proposed approach to predict diabetes in the patients. To examine the effect of choosing the right variables when making predictions, five variables are selected from the dataset based on rough set theory (RST). The proposed approach is then applied again, this time to the selected features only. The results of both applications are documented and compared as part of the evaluation of the approach. They show that the proposed approach predicts diabetes more accurately when RST is used to select the variables. This paper contributes to the ongoing efforts to find innovative ways to improve the prediction of diabetes in patients.
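The two-stage procedure summarized above (logistic regression trained with stochastic gradient descent on all eight features, then again on five selected features) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the data are synthetic, generated by the hypothetical helper make_synthetic; the five column indices in selected_cols are placeholders rather than the output of a rough set reduct; and the learning rate, epoch count, and train/test split are arbitrary. It does not reproduce the paper's experiments or results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(X, y, lr=0.1, epochs=50, seed=0):
    # Logistic regression fitted by SGD: one weight update per sample.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def accuracy(w, b, X, y):
    # Fraction of samples whose thresholded probability matches the label.
    hits = 0
    for x, t in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        hits += int((1 if p >= 0.5 else 0) == t)
    return hits / len(y)


def select(X, cols):
    # Keep only the chosen feature columns.
    return [[row[c] for c in cols] for row in X]


def make_synthetic(n=400, seed=1):
    # Hypothetical stand-in data: 8 features, label driven by the first 5.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(8)]
        score = (1.5 * row[0] - 1.0 * row[1] + 0.8 * row[2]
                 + 0.5 * row[3] - 0.7 * row[4])
        y.append(1 if score + rng.gauss(0, 0.5) > 0 else 0)
        X.append(row)
    return X, y


X, y = make_synthetic()
split = 300  # arbitrary train/test split

# Stage 1: all eight features.
w_all, b_all = train_sgd(X[:split], y[:split])
acc_all = accuracy(w_all, b_all, X[split:], y[split:])

# Stage 2: a five-feature subset (placeholder indices, not an RST reduct).
selected_cols = [0, 1, 2, 3, 4]
X_sel = select(X, selected_cols)
w_sel, b_sel = train_sgd(X_sel[:split], y[:split])
acc_sel = accuracy(w_sel, b_sel, X_sel[split:], y[split:])

print("accuracy, all features:", acc_all)
print("accuracy, selected features:", acc_sel)
```

On real patient data, selected_cols would be replaced by the attributes retained by the rough set reduction, and the two accuracies would be compared on a held-out set in the same way.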
Keywords