Geoscientific Model Development (Sep 2024)

Refactoring the elastic–viscous–plastic solver from the sea ice model CICE v6.5.1 for improved performance

  • T. A. S. Rasmussen,
  • J. Poulsen,
  • M. H. Ribergaard,
  • R. Sasanka,
  • A. P. Craig,
  • E. C. Hunke,
  • S. Rethmeier

DOI
https://doi.org/10.5194/gmd-17-6529-2024
Journal volume & issue
Vol. 17
pp. 6529 – 6544

Abstract

This study focuses on the performance of the elastic–viscous–plastic (EVP) dynamical solver within the sea ice model CICE v6.5.1. The study has been conducted in two steps. First, the standard EVP solver was extracted from CICE for experiments with refactored versions, which are used for performance testing. Second, one refactored version was integrated and tested in the full CICE model to demonstrate that the new algorithms do not significantly impact the physical results. The study reveals two dominant bottlenecks, namely (1) the number of Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) synchronization points required for halo exchanges during each time step, combined with the irregular domain of active sea ice points, and (2) the lack of single-instruction, multiple-data (SIMD) code generation. The standard EVP solver has been refactored based on two generic patterns. The first pattern exposes how general finite differences on masked multi-dimensional arrays can be expressed in order to produce significantly better code generation by changing the memory access pattern from random access to direct access. The second pattern takes an alternative approach to handling static grid properties. The measured single-core performance improvement is more than a factor of 5 compared to the standard implementation. The refactored implementation scales strongly on an Intel® Xeon® Scalable Processors series node until the available bandwidth of the node is fully used. For the Intel® Xeon® CPU Max series, there is sufficient bandwidth to allow the strong scaling to continue across all the cores on the node, resulting in a single-node improvement factor of 35 over the standard implementation. This study also demonstrates improved performance on GPUs.
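
The first refactoring pattern can be illustrated with a small sketch. It is not taken from the CICE source (which is written in Fortran), and every name in it (`active_index_lists`, `diff_indirect`, `compact`, `diff_direct`, the `east` neighbour array) is hypothetical. It assumes one plausible reading of the pattern: active points are packed once into a contiguous 1D array, and neighbour positions are precomputed, so that the main loop runs with unit stride over the packed data instead of looking up an (i, j) pair for every point. The boundary rule used here (an inactive or missing east neighbour yields a zero difference) is also an assumption made only to keep the example small.

```python
def active_index_lists(mask):
    """Return (idx_i, idx_j) index lists of the active points, row by row."""
    idx_i, idx_j = [], []
    for j, row in enumerate(mask):
        for i, active in enumerate(row):
            if active:
                idx_i.append(i)
                idx_j.append(j)
    return idx_i, idx_j


def diff_indirect(field, mask):
    """East-minus-centre difference, visiting the 2D array through index lists."""
    idx_i, idx_j = active_index_lists(mask)
    ni = len(field[0])
    out = []
    for k in range(len(idx_i)):
        i, j = idx_i[k], idx_j[k]  # address of each point is only known at run time
        has_east = i + 1 < ni and mask[j][i + 1]
        east_value = field[j][i + 1] if has_east else field[j][i]
        out.append(east_value - field[j][i])
    return out


def compact(field, mask):
    """Pack active points into a 1D list and precompute east-neighbour positions.

    This step only needs repeating when the mask changes.
    """
    idx_i, idx_j = active_index_lists(mask)
    points = list(zip(idx_i, idx_j))
    position = {point: k for k, point in enumerate(points)}
    values = [field[j][i] for i, j in points]
    # A point without an active east neighbour refers to itself (zero difference).
    east = [position.get((i + 1, j), k) for k, (i, j) in enumerate(points)]
    return values, east


def diff_direct(values, east):
    """Same difference, looping with unit stride over the packed 1D data."""
    return [values[east[k]] - values[k] for k in range(len(values))]


field = [[1.0, 2.0, 4.0, 7.0],
         [0.0, 3.0, 5.0, 9.0]]
mask = [[True, True, False, True],
        [False, True, True, True]]

values, east = compact(field, mask)
print(diff_indirect(field, mask))
print(diff_direct(values, east))
```

Both functions return the same differences for the active points. Python itself will not show the speed-up; the sketch only shows the change in data layout that, in a compiled language, gives the compiler a contiguous loop it can vectorize.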