IEEE Access (Jan 2024)

MSCH: Microbatch-Based Selective Activation Checkpointing With Recomputation Hidden for Efficient Training of LLM Models

  • Cheng Zhang,
  • Minjun Yu,
  • Li Yu,
  • Pengyu Cong,
  • Yuchao Yan,
  • Jie Bao,
  • Jian Jiang,
  • Xiaozheng Wang,
  • Xiaolong Ye,
  • Tao Tang,
  • Liang Xiao

DOI
https://doi.org/10.1109/ACCESS.2024.3456788
Journal volume & issue
Vol. 12
pp. 178460 – 178475

Abstract


Activation checkpointing is a widely used technique for reducing GPU memory consumption during model training. While it conserves memory, it introduces additional computational load. Existing solutions such as selective activation checkpointing (SAC) and microbatch-based selective recomputation (MSC) are not always applicable or effective at improving training efficiency. In this paper, we propose a novel method called microbatch-based selective activation checkpointing with recomputation hidden (MSCH). MSCH makes more flexible and effective use of the GPU memory that remains after full activation checkpointing is deployed, minimizing the need to recompute activations at the microbatch level. In addition, we are the first to identify the challenging "bottleneck" effect and "misalignment" phenomenon in pipeline parallelism scheduling. To address this, we design a novel multi-stage microbatch recomputation schedule that hides the activation recomputation of each stage behind the "bottleneck" stage, thereby effectively improving model training efficiency. Our code is available at https://github.com/CSlearnerZM/MSCH-DeepSpeed.
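The sketch below illustrates the baseline idea the abstract builds on: selective activation checkpointing, where only some layers discard and later recompute their activations while the rest keep theirs in memory. It is a minimal, hedged example using PyTorch's torch.utils.checkpoint API; the Block module, the SelectiveCheckpointModel class, and the choice of which layers to recompute are illustrative assumptions, not the paper's MSCH policy or its pipeline-parallel recomputation-hiding schedule.

```python
# Minimal sketch of selective activation checkpointing (SAC), assuming PyTorch.
# The layer structure and the "checkpoint odd-indexed layers" policy are
# hypothetical; MSCH's microbatch-level selection and pipeline scheduling
# are not reproduced here.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """A toy transformer-style block standing in for one model layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))


class SelectiveCheckpointModel(nn.Module):
    """Checkpoints only the layers whose index is in `recompute_layers`.

    Checkpointed layers drop their activations in the forward pass and
    recompute them during backward (saving memory, costing compute);
    the remaining layers keep their activations and pay no recomputation.
    """

    def __init__(self, dim: int, num_layers: int, recompute_layers: set):
        super().__init__()
        self.layers = nn.ModuleList(Block(dim) for _ in range(num_layers))
        self.recompute_layers = recompute_layers

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i in self.recompute_layers:
                # Activations are not stored; this layer's forward reruns in backward.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x


if __name__ == "__main__":
    model = SelectiveCheckpointModel(dim=256, num_layers=8, recompute_layers={1, 3, 5, 7})
    batch = torch.randn(4, 128, 256, requires_grad=True)
    loss = model(batch).sum()
    loss.backward()  # checkpointed layers recompute their forward pass here
```

In this baseline, the recomputation cost lands squarely on the backward pass; the paper's contribution is to choose what to checkpoint at microbatch granularity and to overlap the recomputation with the pipeline's "bottleneck" stage so that it is hidden rather than added to the critical path.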

Keywords