Indonesian Journal of Data and Science (Dec 2024)
Grid Search Hyperparameter Analysis in Optimizing The Decision Tree Method for Diabetes Prediction
Abstract
Diabetes is a growing global health problem, especially in Indonesia, driven by unhealthy lifestyles, poor diets, and genetic factors. Early detection of diabetes risk is crucial to prevent serious complications, and machine learning offers innovative predictive solutions. This research develops a diabetes risk prediction model using the Decision Tree algorithm, with hyperparameters optimized through the Grid Search technique. The methodology includes the collection of patient medical data with key attributes such as glucose levels, blood pressure, skin health, insulin, body mass index (BMI), diabetes pedigree, age, and health history. Hyperparameter tuning varies three key parameters: the maximum tree depth (max_depth), the minimum number of samples required to split a node (min_samples_split), and the minimum number of samples required at a leaf node (min_samples_leaf). Grid Search systematically evaluates every combination of these values to find the configuration that yields the best model performance. The research process comprises data preprocessing, splitting the dataset into training and testing sets, model training, and evaluation using accuracy, the confusion matrix, and the ROC curve with its AUC. The baseline model reached an accuracy of 76%, which rose to 81% after hyperparameter optimization with Grid Search. The visualization of the decision tree shows that glucose levels and BMI contribute most to predicting diabetes risk. This research demonstrates the potential of machine learning in supporting the early detection of diabetes, with the Decision Tree algorithm showing promising predictive capabilities. Nevertheless, further research with larger datasets and the integration of other algorithms is recommended to improve the accuracy and generalization of the model.
The main contribution of this research is the development of a machine learning-based approach that can assist medical personnel in screening for diabetes risk more efficiently and accurately.
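The Grid Search procedure described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The data are synthetic, the two features ("glucose" and "bmi") and their coefficients are hypothetical, and each candidate is scored on a single held-out validation split rather than by cross-validation, to keep the example short. Only the three hyperparameter names come from the abstract.

```python
import itertools
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, min_samples_leaf):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            # Reject splits that would leave a leaf with too few samples.
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(X, y, depth, params):
    """Grow a tree recursively; a leaf is the majority class (0 or 1)."""
    majority = int(sum(y) * 2 >= len(y))
    if (depth >= params["max_depth"]
            or len(y) < params["min_samples_split"]
            or len(set(y)) == 1):
        return majority
    split = best_split(X, y, params["min_samples_leaf"])
    if split is None:
        return majority
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build([X[i] for i in li], [y[i] for i in li], depth + 1, params),
            build([X[i] for i in ri], [y[i] for i in ri], depth + 1, params))

def predict(node, row):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def accuracy(tree, X, y):
    return sum(predict(tree, r) == label for r, label in zip(X, y)) / len(y)

def grid_search(grid, X_tr, y_tr, X_val, y_val):
    """Try every combination in `grid`; keep the best validation accuracy."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = accuracy(build(X_tr, y_tr, 0, params), X_val, y_val)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def make_data(n, rng):
    """Synthetic stand-in for patient records (hypothetical coefficients)."""
    X, y = [], []
    for _ in range(n):
        glucose, bmi = rng.gauss(120, 30), rng.gauss(32, 7)
        risk = 0.03 * (glucose - 120) + 0.1 * (bmi - 32) + rng.gauss(0, 0.5)
        X.append([glucose, bmi])
        y.append(int(risk > 0))
    return X, y

rng = random.Random(0)
X, y = make_data(240, rng)
X_tr, y_tr = X[:140], y[:140]          # training set
X_val, y_val = X[140:190], y[140:190]  # used only to compare grid points
X_te, y_te = X[190:], y[190:]          # held out for the final estimate

grid = {
    "max_depth": [2, 4, 6],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}
best_params, val_acc = grid_search(grid, X_tr, y_tr, X_val, y_val)
test_acc = accuracy(build(X_tr, y_tr, 0, best_params), X_te, y_te)
print(best_params, round(val_acc, 3), round(test_acc, 3))
```

The grid above yields 3 × 2 × 2 = 12 candidate trees, and the cost grows multiplicatively with each added hyperparameter value. In library form, scikit-learn's `GridSearchCV` combined with `DecisionTreeClassifier` performs the same exhaustive search with cross-validated scoring.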
Keywords