Soils and Foundations (Dec 2024)
Review and comparison of machine learning methods in developing optimal models for predicting geotechnical properties with consideration of feature selection
Abstract
Geotechnical properties, such as cohesion, pile drivability, and rock strength, are among the most important and indispensable inputs for the design or analysis of geotechnical/geological engineering projects. Conventionally, these properties are obtained from laboratory experiments on well-prepared samples or from well-designed in-situ experiments. Although direct measurements are generally accurate, they are often time-consuming and laborious, and acquiring numerous measurements is often not feasible. This is especially true for medium- or small-sized projects. Alternatively, the properties of interest can be predicted from readily available indices using machine learning (ML) methods, which have been increasingly applied to geotechnical engineering in recent years. Although ML methods perform reasonably well in predicting target geotechnical properties, all features subjectively considered relevant are often taken as inputs to the developed model. However, not all features contribute equally to the prediction. Including irrelevant indices in an ML model increases model complexity, makes the results harder to interpret, and risks degrading the model’s generalization ability. Although these points are well recognized in the literature, only a few studies have carried out feature selection when applying ML methods to geotechnical/geological engineering. This paper aims to address this gap by offering a comprehensive review and comparison of commonly used ML methods, with consideration of various methods for feature selection. Selecting the features relevant to the problem at hand also agrees well with the spirit of the “data first practice central agenda” in data-centric geotechnics. Both simulated and real-life datasets are used to compare the performance of the various ML methods in feature selection and prediction. Results show that fully Bayesian-Gaussian process regression (fB-GPR) outperforms the other ML models.