Geoscientific Model Development (May 2021)

The GPU version of LASG/IAP Climate System Ocean Model version 3 (LICOM3) under the heterogeneous-compute interface for portability (HIP) framework and its large-scale application

  • P. Wang,
  • J. Jiang,
  • P. Lin,
  • M. Ding,
  • J. Wei,
  • F. Zhang,
  • L. Zhao,
  • Y. Li,
  • Z. Yu,
  • W. Zheng,
  • Y. Yu,
  • X. Chi,
  • H. Liu

DOI
https://doi.org/10.5194/gmd-14-2781-2021
Journal volume & issue
Vol. 14
pp. 2781 – 2799

Abstract

A high-resolution (1/20°) global ocean general circulation model with a graphics processing unit (GPU) implementation is developed from the LASG/IAP Climate System Ocean Model version 3 (LICOM3) under the heterogeneous-compute interface for portability (HIP) framework. Both the dynamic core and the physics package of LICOM3 are ported to the GPU, and three-dimensional parallelization (with the domain also partitioned in the vertical direction) is applied. With 384 AMD GPUs, the HIP version of LICOM3 (LICOM3-HIP) runs 42 times faster than the CPU version on the same number of CPU cores. LICOM3-HIP has excellent scalability: it still achieves a speedup of more than 4 on 9216 GPUs relative to 384 GPUs. In this phase, we successfully tested the 1/20° LICOM3-HIP on 6550 nodes with 26 200 GPUs; at this scale, the model reached approximately 2.72 simulated years per day (SYPD). Keeping almost all computation inside the GPUs reduced the time spent on data transfer between CPUs and GPUs, which underpins the high performance. In addition, a 14-year spin-up integration was performed with surface forcing following the protocol of phase 2 of the Ocean Model Intercomparison Project (OMIP-2), and the preliminary results were evaluated. The results differ little from those of the CPU version. Further comparison with observations and with lower-resolution LICOM3 results suggests that the 1/20° LICOM3-HIP can reproduce the observations and resolves many smaller-scale features, such as submesoscale eddies and frontal-scale structures.
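The scaling figures in the abstract can be put in perspective with a little arithmetic. The sketch below assumes strong scaling, meaning the 1/20° problem size stays fixed, and treats the ratio of GPU counts as the ideal speedup. The function names are hypothetical. The efficiency it reports is relative to the 384-GPU run, not to a single GPU. Because the abstract gives the speedup as "more than 4", the resulting efficiency of about 16.7% is a lower bound. SYPD is simulated years per wall-clock day, so 2.72 SYPD corresponds to roughly 8.8 hours of wall-clock time per simulated year.

```python
"""Back-of-the-envelope arithmetic on the scaling figures quoted in the abstract.

Assumptions: strong scaling (fixed problem size), with the ideal speedup taken
as the ratio of device counts. All helper names are illustrative only.
"""

HOURS_PER_DAY = 24.0


def ideal_speedup(base_devices: int, scaled_devices: int) -> float:
    """Speedup expected if runtime fell exactly in proportion to device count."""
    return scaled_devices / base_devices


def parallel_efficiency(measured_speedup: float, base_devices: int, scaled_devices: int) -> float:
    """Fraction of the ideal strong-scaling speedup that is actually achieved."""
    return measured_speedup / ideal_speedup(base_devices, scaled_devices)


def wallclock_hours_per_simulated_year(sypd: float) -> float:
    """SYPD = simulated years per wall-clock day, so one year costs 24 / SYPD hours."""
    return HOURS_PER_DAY / sypd


if __name__ == "__main__":
    # Abstract: speedup of "more than 4" when going from 384 to 9216 GPUs,
    # so using exactly 4.0 gives a lower bound on the efficiency.
    ideal = ideal_speedup(384, 9216)
    eff = parallel_efficiency(4.0, 384, 9216)
    print(f"ideal speedup: {ideal:.0f}x")            # → ideal speedup: 24x
    print(f"efficiency at speedup 4: {eff:.1%}")      # → efficiency at speedup 4: 16.7%
    # Abstract: approximately 2.72 SYPD on 26 200 GPUs.
    hours = wallclock_hours_per_simulated_year(2.72)
    print(f"hours per simulated year: {hours:.2f}")   # → hours per simulated year: 8.82
```

Dividing by the ideal speedup rather than quoting the raw factor makes clear how much of the 24-fold increase in GPUs shows up as reduced runtime.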