An empirical study of large-scale data-driven full waveform inversion

Peng Jin; Yinan Feng; Shihang Feng; Hanchen Wang; Yinpeng Chen; Benjamin Consolvo; Zicheng Liu; Youzuo Lin

doi:10.1038/s41598-024-68573-7

Scientific Reports (Aug 2024)

Affiliations

Peng Jin: Earth and Environmental Sciences Division, Los Alamos National Laboratory
Yinan Feng: Earth and Environmental Sciences Division, Los Alamos National Laboratory
Shihang Feng: Earth and Environmental Sciences Division, Los Alamos National Laboratory
Hanchen Wang: Earth and Environmental Sciences Division, Los Alamos National Laboratory
Yinpeng Chen: Google Research
Benjamin Consolvo: Intel Corporation
Zicheng Liu: Microsoft
Youzuo Lin: School of Data Science and Society, The University of North Carolina at Chapel Hill

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 10

Abstract

This paper investigates the impact of big data on deep learning models for the full waveform inversion (FWI) problem. Big data is well known to boost the performance of deep learning models in many tasks, but its effectiveness has not been validated for FWI. To address this gap, we present an empirical study of how deep learning models for FWI behave when trained on OpenFWI, a recently published collection of large-scale, multi-structural, synthetic datasets. In particular, we train and evaluate FWI models on a combination of 10 2D subsets of OpenFWI that together contain 470K pairs of seismic data and velocity maps. Our experiments show that training on the combined dataset yields average improvements of 13.03% in MAE, 7.19% in MSE and 1.87% in SSIM compared with training on each subset separately, and average improvements of 28.60%, 21.55% and 8.22%, respectively, in the leave-one-out generalization test. We further show that model capacity must scale with data size to obtain the best improvement: our largest model yields average improvements of 20.06%, 13.39% and 0.72% over the smallest one.
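The abstract reports results as average percentage improvements in MAE, MSE and SSIM, but it does not state the exact formula. The following is a minimal Python sketch that assumes improvement is measured relative to the baseline score (for example, the per-subset model or the smallest model). It also assumes the sign is flipped for SSIM, since higher SSIM is better, and that the per-subset improvements are then averaged. The helper names (`mae`, `mse`, `relative_improvement`, `average_improvement`) are hypothetical and are not taken from the paper or from the OpenFWI code.

```python
from itertools import chain
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # a 2D velocity map as nested rows


def _flatten(grid: Grid) -> List[float]:
    """Flatten a 2D map row by row into a single list of values."""
    return list(chain.from_iterable(grid))


def mae(pred: Grid, target: Grid) -> float:
    """Mean absolute error between predicted and ground-truth velocity maps."""
    p, t = _flatten(pred), _flatten(target)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


def mse(pred: Grid, target: Grid) -> float:
    """Mean squared error between predicted and ground-truth velocity maps."""
    p, t = _flatten(pred), _flatten(target)
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def relative_improvement(baseline: float, new: float,
                         higher_is_better: bool = False) -> float:
    """Percentage improvement of `new` over `baseline`.

    Assumption: for error metrics (MAE, MSE) a lower value is better;
    for SSIM pass higher_is_better=True.
    """
    if higher_is_better:
        return 100.0 * (new - baseline) / baseline
    return 100.0 * (baseline - new) / baseline


def average_improvement(baselines: Sequence[float], news: Sequence[float],
                        higher_is_better: bool = False) -> float:
    """Mean of per-subset improvements (assumed aggregation scheme)."""
    vals = [relative_improvement(b, n, higher_is_better)
            for b, n in zip(baselines, news)]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(relative_improvement(0.10, 0.087))                      # MAE drop, about 13.0
    print(relative_improvement(0.80, 0.88, higher_is_better=True))  # SSIM gain, about 10.0
```

Under this reading, a model whose MAE falls from 0.10 to 0.087 counts as a 13% improvement. Averaging such per-subset values produces a summary figure of the kind the abstract reports.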

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/

About the journal