Applied Sciences (Apr 2023)
Spatial Autocorrelation Incorporated Machine Learning Model for Geotechnical Subsurface Modeling
Abstract
Machine learning models for spatial prediction have been applied in various types of research. However, spatial relation has not been fully considered in modeling, since the Cartesian coordinates of the observed points are directly employed as the location information for machine learning features. This study presents a machine learning modeling process which incorporates spatial autocorrelation for geotechnical subsurface modeling. A new set of features called the Euclidean distance field (EDF) was generated based on the distance between the query points and the observed boreholes in order to incorporate spatial autocorrelation into the machine learning model. Principal component analysis (PCA) was performed to reduce the increasing dimensionality of the dataset caused by the EDF features. Optimized machine learning models based on several popular algorithms (Support Vector Machine, Gaussian Process Regression, Artificial Neural Network, and k-Nearest Neighbor) were employed for predicting several geotechnical information as the targets. The results showed that the optimized machine learning models constructed with the EDF modeling approach generate a slightly lower Root Mean Square Error (RMSE) score compared to the model with the direct XY coordinate approach by 0.041, 0.046, 1.302, and 1.561 for ground surface elevation, groundwater level, SPT-N value, and percent finer than 0.075 mm sieve, respectively. Both modeling approaches performed well for USCS-based soil classification with the EDF model having slightly improved classification accuracy by 0.72%. Furthermore, the model can perform balance multiclass classification as indicated by the >95% precision, recall, f1-score, and balanced accuracy score. These results indicate that spatial autocorrelation has a noticeable effect. Hence, it needs to be considered to improve the overall performance of spatial machine learning modeling. Comparison of geotechnical subsurface predictions generated based on different machine learning algorithms showed that the selection of the best-performing model based only on the lowest prediction error is not appropriate for spatial prediction modeling. Therefore, thorough analysis of the predicted data by visualization is necessary in the selection process for spatial prediction modeling.
Keywords