Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost

Jiangtao Sun; Wei Dang; Fengqin Wang; Haikuan Nie; Xiaoliang Wei; Pei Li; Shaohua Zhang; Yubo Feng; Fei Li

doi:10.3390/en16104159

Energies (May 2023)

Affiliations

Jiangtao Sun: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Wei Dang: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Fengqin Wang: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Haikuan Nie: Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China
Xiaoliang Wei: Exploration and Development Institute of Shengli Oilfield Company, SINOPEC, Dongying 257000, China
Pei Li: Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China
Shaohua Zhang: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Yubo Feng: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Fei Li: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China

DOI: https://doi.org/10.3390/en16104159
Journal volume & issue: Vol. 16, no. 10, p. 4159

Abstract

The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and sweet spots of shale oil/gas. Traditional methods of determining the TOC content, such as geochemical experiments and empirical mathematical regression, are either costly and inefficient or inaccurate and not universally applicable. In this study, we propose three machine learning models, random forest (RF), support vector regression (SVR), and XGBoost, to predict the TOC content from well logs, and the performance of each model is compared with that of the traditional empirical methods. First, the decision tree algorithm is used to identify the optimal set of well logs from a total of 15. Then, 816 data points of well logs and TOC content collected from five different shale formations are used to train and test the three models. Finally, the accuracy of the three models is validated by predicting the unknown TOC content of a shale oil well. The results show that the RF model provides the best prediction of the TOC content, with R² = 0.915, MSE = 0.108, and MAE = 0.252, followed by XGBoost, while SVR gives the lowest predictive accuracy. Nevertheless, all three machine learning models outperform traditional empirical methods such as the Schmoker gamma-ray log method, the multiple linear regression method, and the ΔlgR method. Overall, the proposed machine learning models are powerful tools for predicting the TOC content of shale and for improving oil/gas exploration efficiency in different formations and basins.
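
The abstract ranks the models by three scores: the coefficient of determination, the mean squared error (MSE), and the mean absolute error (MAE). As a minimal sketch of how such scores are commonly computed from measured and predicted TOC values, the following uses only the Python standard library. The function name `regression_scores` and the sample values are hypothetical and chosen for illustration; they are not the authors' code or data.

```python
from statistics import fmean


def regression_scores(measured, predicted):
    """Return (R2, MSE, MAE) for paired measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    # Residual = measured value minus model prediction.
    residuals = [m - p for m, p in zip(measured, predicted)]

    mse = fmean(r * r for r in residuals)   # mean squared error
    mae = fmean(abs(r) for r in residuals)  # mean absolute error

    # R2 = 1 - (residual sum of squares / total sum of squares).
    mean_measured = fmean(measured)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    ss_res = sum(r * r for r in residuals)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return r2, mse, mae


# Hypothetical TOC values in wt%, invented for illustration only.
measured_toc = [1.0, 2.0, 3.0, 4.0]
predicted_toc = [1.5, 2.0, 2.5, 4.0]
r2, mse, mae = regression_scores(measured_toc, predicted_toc)
```

A higher coefficient of determination together with lower MSE and MAE indicates a closer fit, which is the basis on which the abstract places RF ahead of XGBoost and SVR.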

Published in Energies

ISSN: 1996-1073 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/energies
