Machine Learning-Based Approach for Predicting Diabetes Employing Socio-Demographic Characteristics
Md. Ashikur Rahman,
Lway Faisal Abdulrazak,
Md. Mamun Ali,
Imran Mahmud,
Kawsar Ahmed,
Francis M. Bui
Affiliations
Md. Ashikur Rahman
Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka 1216, Bangladesh
Lway Faisal Abdulrazak
Department of Computer Science, Cihan University Sulaimaniya, Kurdistan Region, Sulaimaniya 46001, Iraq
Md. Mamun Ali
Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka 1216, Bangladesh
Imran Mahmud
Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka 1216, Bangladesh
Kawsar Ahmed
Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka 1216, Bangladesh
Francis M. Bui
Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK S7N 5A9, Canada
Diabetes is one of the fatal diseases that play a vital role in the growth of other diseases in the human body. From a clinical perspective, the most significant approach to mitigating the effects of diabetes is early-stage control and management, with the aim of a potential cure. However, lack of awareness and expensive clinical tests are the primary reasons why clinical diagnosis and preventive measures are neglected in lower-income countries like Bangladesh, Pakistan, and India. From this perspective, this study aims to build an automated machine learning (ML) model, which will predict diabetes at an early stage using socio-demographic characteristics rather than clinical attributes, due to the fact that clinical features are not always accessible to all people from lower-income countries. To find the best fit of the supervised ML classifier of the model, we applied six classification algorithms and found that RF outperformed with an accuracy of 99.36%. In addition, the most significant risk factors were found based on the SHAP value by all the applied classifiers. This study reveals that polyuria, polydipsia, and delayed healing are the most significant risk factors for developing diabetes. The findings indicate that the proposed model is highly capable of predicting diabetes in the early stages.