Earthquake Science (Apr 2024)

CREDIT-X1local: A reference dataset for machine learning seismology from ChinArray in Southwest China

  • Lu Li
  • Weitao Wang
  • Ziye Yu
  • Yini Chen

Journal volume & issue
Vol. 37, no. 2
pp. 139 – 157

Abstract


High-quality datasets are critical for developing advanced machine-learning algorithms in seismology. Here, we present an earthquake dataset based on the ChinArray Phase I records (X1). ChinArray Phase I comprised 355 portable broadband seismic stations deployed in the southern part of the north-south seismic zone (20° N–32° N, 95° E–110° E) from 2011 to 2013. CREDIT-X1local, the first release of the ChinArray Reference Earthquake Dataset for Innovative Techniques (CREDIT), contains comprehensive information on the 105,455 local events that occurred in this zone during the array deployment, stored in a single HDF5 file. The original three-component waveforms, sampled at 100 Hz, are organized by event and include all stations within an epicentral distance of 1,000 km; each waveform record is at least 200 s long. Two types of phase labels are provided. The first comprises manually picked labels for 5,999 events with magnitudes ≥ 2.0, totaling 66,507 Pg, 42,310 Sg, 12,823 Pn, and 546 Sn phases. The second comprises automatically labeled phases for 105,442 events with magnitudes from −1.6 to 7.6; these phases were picked with a recurrent neural network phase picker and screened against the corresponding travel-time curves, yielding 1,179,808 Pg, 884,281 Sg, 176,089 Pn, and 22,986 Sn phases. In addition, first-motion polarities are included for 31,273 Pg phases. Because event and station locations are provided, deep-learning networks can be trained and validated for both conventional phase picking and phase association. CREDIT-X1local is the first million-scale dataset constructed from a dense seismic array and is designed to support multi-station deep-learning methods, high-precision focal mechanism inversion, and seismic tomography studies. Furthermore, owing to the high seismicity of the southern north-south seismic zone in China, the dataset has great potential for future scientific discoveries.
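The abstract states that the automatically picked phases were screened using travel-time curves. The sketch below shows one simple way such a screening step could work; it is an illustration under stated assumptions, not the authors' procedure. It assumes a hypothetical linear travel-time model t = t0 + Δ/v for each phase, with placeholder intercepts, apparent velocities, and a 2 s tolerance chosen only for illustration, and it assumes each pick is stored as a sample index counted from the event origin time at the dataset's 100 Hz sampling rate. The names `Pick`, `screen_picks`, and `TRAVEL_TIME_MODEL` are hypothetical and do not describe the actual CREDIT-X1local HDF5 layout.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
SAMPLING_RATE_HZ = 100.0    # sampling rate stated for the dataset
MAX_EPICENTRAL_KM = 1000.0  # station cutoff stated for the dataset
TOLERANCE_S = 2.0           # hypothetical screening tolerance

# Hypothetical linear travel-time model: t = intercept + distance / velocity.
# Values are illustrative placeholders, not curves fitted to the study region.
TRAVEL_TIME_MODEL = {
    "Pg": (0.0, 6.0),   # (intercept in s, apparent velocity in km/s)
    "Sg": (0.0, 3.5),
    "Pn": (7.0, 8.0),
    "Sn": (12.0, 4.6),
}


@dataclass
class Pick:
    station: str
    phase: str
    sample: int  # assumption: sample index counted from the event origin time


def epicentral_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a spherical Earth (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def screen_picks(event, stations, picks):
    """Keep picks whose arrival time lies within TOLERANCE_S of the model prediction.

    event: (lat, lon) of the epicenter; stations: {code: (lat, lon)}.
    Picks for unknown phases or stations, or for stations beyond
    MAX_EPICENTRAL_KM, are dropped.
    """
    kept = []
    for pick in picks:
        if pick.phase not in TRAVEL_TIME_MODEL or pick.station not in stations:
            continue
        st_lat, st_lon = stations[pick.station]
        dist = epicentral_distance_km(event[0], event[1], st_lat, st_lon)
        if dist > MAX_EPICENTRAL_KM:
            continue
        intercept, velocity = TRAVEL_TIME_MODEL[pick.phase]
        predicted_s = intercept + dist / velocity
        observed_s = pick.sample / SAMPLING_RATE_HZ
        if abs(observed_s - predicted_s) <= TOLERANCE_S:
            kept.append(pick)
    return kept


if __name__ == "__main__":
    event = (25.0, 100.0)
    stations = {"A": (25.0, 101.0), "FAR": (25.0, 112.0)}
    picks = [
        Pick("A", "Pg", 1680),     # ~16.8 s, close to the Pg prediction
        Pick("A", "Pg", 3000),     # 30 s, an outlier for Pg
        Pick("A", "Sg", 2880),     # ~28.8 s, close to the Sg prediction
        Pick("FAR", "Pg", 20000),  # station beyond the 1,000 km cutoff
    ]
    for p in screen_picks(event, stations, picks):
        print(p)
```

With station A about 101 km from the epicenter, the Pg pick at 16.8 s and the Sg pick at 28.8 s pass, the Pg pick at 30 s is rejected as inconsistent with the model, and station FAR (about 1,209 km away) is excluded by the 1,000 km cutoff. A more realistic version would replace the linear model with regional travel-time curves, for example ones computed from a 1-D velocity model.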

Keywords