ISPRS International Journal of Geo-Information (Dec 2023)
Machine-Learning-Based Forest Classification and Regression (FCR) for Spatial Prediction of Liver Fluke <i>Opisthorchis viverrini</i> (<i>OV</i>) Infection in Small Sub-Watersheds
Abstract
Infection with the liver fluke (Opisthorchis viverrini) is partly due to the suitability of habitats in sub-basin areas, which allows the intermediate host to persist in the watershed system in all seasons. Spatial monitoring of the fluke at the small-basin scale is important because it enables analysis of the factors that influence infection. A spatial mathematical model was weighted by nine spatial factors: X1 (index of land-use types), X2 (index of soil drainage properties), X3 (distance index from the road network), X4 (distance index from surface water resources), X5 (distance index from the flow accumulation lines), X6 (index of average surface temperature), X7 (average surface moisture index), X8 (average normalized difference vegetation index), and X9 (average soil-adjusted vegetation index). The analysis was divided into two steps: (1) at the sub-basin boundary level, an ordinary least squares (OLS) model was used to select the spatial criteria for liver flukes, that is, to analyze the factors related to human liver fluke infection in each sub-watershed; and (2) at the infection-risk location level, machine-learning-based forest classification and regression (FCR) was used to predict infection-risk locations along stream lines. The analysis produced four prototype models, each using a different set of independent variables. Model 1 and Model 2 achieved the highest AUC (0.964). The variables that most influenced infection risk were the distance to stream lines and the distance to water bodies, whereas the NDMI and NDVI factors had little effect on accuracy. This FCR machine-learning approach can be applied to the analysis of infection-risk areas at the sub-basin level, but the independent variables must first be screened with a preliminary mathematical model weighted to the spatial units in order to obtain the most accurate predictions.
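The two-step workflow summarized above (OLS screening of candidate factors, then model comparison by AUC) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact OLS selection rule, so the sketch standardizes every factor and the outcome, fits OLS by solving the normal equations, and keeps factors whose absolute standardized coefficient meets a hypothetical `threshold`. The forest classification and regression step itself is not reproduced. `roc_auc` only shows how an AUC, the metric used to compare the prototype models, can be computed from any classifier's risk scores, for example a random forest's predicted probabilities. The names `ols_fit`, `screen_factors`, and `roc_auc` are illustrative, and the demo data are synthetic.

```python
import statistics
from typing import Dict, List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] for the least-squares fit y ~ b0 + sum(bj * xj).

    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones, by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix; drop collinear factors")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for k in range(p):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def _zscore(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and population standard deviation 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("a constant factor cannot be standardized")
    return [(v - mu) / sd for v in values]


def screen_factors(
    factors: Dict[str, Sequence[float]], y: Sequence[float], threshold: float = 0.1
) -> Dict[str, float]:
    """Keep factors whose standardized OLS coefficient satisfies |beta| >= threshold.

    Hypothetical selection rule for illustration only; significance tests on
    the coefficients are a common alternative.
    """
    names = list(factors)
    rows = [list(r) for r in zip(*(_zscore(factors[n]) for n in names))]
    betas = ols_fit(rows, _zscore(y))[1:]  # drop the intercept
    return {n: b for n, b in zip(names, betas) if abs(b) >= threshold}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic toy data (not from the study): y depends only on factor_a.
    factors = {"factor_a": [1, 2, 3, 4, 5, 6], "factor_b": [3, 1, 4, 1, 5, 9]}
    y = [3, 5, 7, 9, 11, 13]
    print(screen_factors(factors, y))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With the synthetic data, only `factor_a` survives screening, with a standardized coefficient of 1.0 up to floating-point rounding. Solving the normal equations directly is adequate for a handful of factors, but it is less numerically stable than a QR-based solver. A real analysis would more likely use a statistics library that also reports standard errors and p-values for each coefficient.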
Keywords