Results in Engineering (Dec 2024)
Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models
Abstract
Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. In recent years, machine learning (ML) models have increasingly supplanted these traditional methods. However, small and imbalanced datasets remain a major obstacle to building ML models that produce accurate results. Moreover, ML models trained on a dataset of pipelines with specific material grades generalize poorly, so they do not perform well on other pipeline types. In this work, finite element analysis (FEA) was first used to generate a dataset. Then, a new approach to improving ML model generalization for burst pressure prediction is proposed: combining publicly available datasets of different pipeline specifications. In this combined dataset, some pipeline types have many data samples and others have few, which causes a class imbalance issue. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate prediction accuracy and generalization on pipelines of various material grades. Results show that all the selected ML models achieved R-squared values above 0.95 on the balanced data, higher than on the imbalanced dataset. These results indicate that SMOTE-based augmentation is an effective way to correct dataset imbalance and improve ML prediction of burst pressure in oil and gas pipelines.
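As a rough illustration of the augmentation step described above, the following NumPy sketch shows the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest neighbours in the same class. This is a minimal sketch, not the authors' implementation. The function name `smote_oversample`, the column layout, and the generated numbers are hypothetical, and it assumes that the burst pressure is carried as an extra column so that it is interpolated together with the input features.

```python
import numpy as np


def smote_oversample(X, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a sampled row
    and one of its k nearest neighbours (the core SMOTE step).

    X holds the rows of ONE under-represented class and needs >= 2 rows.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # cannot have more neighbours than other rows

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base row and one of its neighbours for every synthetic row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random factor in [0, 1) along the segment.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[picked] - X[base])


# Hypothetical data, for illustration only. Assumed columns:
# [diameter_mm, wall_thickness_mm, defect_depth_mm, burst_pressure_MPa]
rng = np.random.default_rng(42)
majority = rng.normal([600, 10, 3, 20], [20, 1, 0.5, 2], size=(40, 4))
minority = rng.normal([300, 8, 2, 25], [10, 0.5, 0.3, 1.5], size=(8, 4))

# Generate enough synthetic rows for the minority class to match the majority.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = np.vstack([minority, synthetic])
print(balanced_minority.shape)  # (40, 4)
```

Because every synthetic row is a convex combination of two real rows, it always stays inside the range spanned by the original minority samples. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version like this one.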