Jurnal Natural (Oct 2023)

Application of SHAP on CatBoost classification for identification of variables characterizing food insecurity occurrences in Aceh Province households

  • MUHAMMAD SUBIANTO,
  • INA YATUL ULYA,
  • EVI RAMADHANI,
  • BAGUS SARTONO,
  • ALFIAN FUTUHUL HADI

DOI
https://doi.org/10.24815/jn.v23i3.33548
Journal volume & issue
Vol. 23, no. 3
pp. 230 – 244

Abstract

Classification is the process of building a model that distinguishes between classes of data. The model predicts the class of testing data from patterns or relationships learned from training data. One algorithm used to build classification models is Categorical Boosting (CatBoost). In general, however, the resulting models are difficult to interpret. Methods such as SHAP (SHapley Additive exPlanations) make complex classification models easier to interpret. SHAP explains individual predictions and is based on the game-theoretically optimal Shapley values. In this study, a SHAP variable importance analysis was conducted on a CatBoost classification model to identify the variables that characterize occurrences of food insecurity in households. The data were obtained from the March 2021 Survei Sosial Ekonomi Nasional (Susenas) for Aceh Province, sourced from Badan Pusat Statistik (BPS), and comprise 13,126 observations. Of the four classification models evaluated on the testing data, the best had accuracy, sensitivity, specificity, and AUC values of 0.703, 0.349, 0.798, and 0.637, respectively. The SHAP variable importance analysis further showed that the number of household members who smoke, education of the household head, wall type, drinking water source, and decent sanitation contributed significantly to the occurrences of food insecurity in households in Aceh Province in 2021.
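
The Shapley values that SHAP builds on can be illustrated with a minimal Python sketch. It is not the study's code: the function names (`shapley_values`, `toy_value`), the three feature labels, and every number in it are hypothetical and chosen only for illustration. The sketch averages each feature's marginal contribution over all feature orderings, which is the exact Shapley definition; SHAP applied to tree models such as CatBoost relies on faster algorithms to reach the same kind of attribution.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            # Marginal contribution of f when it joins the current coalition.
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


# Hypothetical feature labels; they only echo variable names from the abstract.
FEATURES = ["smokers", "education", "sanitation"]


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known.

    The numbers are invented for illustration and are not results of the study.
    """
    score = 0.0
    if "smokers" in coalition:
        score += 0.10
    if "education" in coalition:
        score += 0.06
    if "smokers" in coalition and "education" in coalition:
        score += 0.04  # interaction, split equally between the two features
    return score


phi = shapley_values(FEATURES, toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

In this toy setting the interaction term is shared equally, so `smokers` receives about 0.12, `education` about 0.08, and `sanitation` 0. The contributions add up to the difference between the value with all features known and the value with none, which is the additivity property that gives SHAP its name.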

Keywords