IEEE Access (Jan 2024)

MSCH: Microbatch-Based Selective Activation Checkpointing With Recomputation Hidden for Efficient Training of LLM Models

  • Cheng Zhang,
  • Minjun Yu,
  • Li Yu,
  • Pengyu Cong,
  • Yuchao Yan,
  • Jie Bao,
  • Jian Jiang,
  • Xiaozheng Wang,
  • Xiaolong Ye,
  • Tao Tang,
  • Liang Xiao

DOI
https://doi.org/10.1109/ACCESS.2024.3456788
Journal volume & issue
Vol. 12
pp. 178460 – 178475

Abstract


Activation checkpointing is a widely used technique for reducing GPU memory consumption during model training. While it conserves memory, it introduces additional computational load. Existing solutions such as selective activation checkpointing (SAC) and microbatch-based selective recomputation (MSC) are not always applicable or effective at improving training efficiency. In this paper, we propose a novel method called microbatch-based selective activation checkpointing with recomputation hidden (MSCH). MSCH makes more flexible and effective use of the GPU memory that remains after full activation checkpointing is deployed, minimizing the need to recompute activations at the microbatch level. In addition, we are the first to identify the challenging "bottleneck" effect and "misalignment" phenomenon in pipeline parallelism scheduling. To address this, we design a novel multi-stage microbatch recomputation schedule that hides the activation recomputation of each stage behind the "bottleneck" stage, thereby effectively improving model training efficiency. Our code is available at https://github.com/CSlearnerZM/MSCH-DeepSpeed.
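The sketch below illustrates the baseline idea the abstract builds on: selective activation checkpointing, where only some layers discard and later recompute their activations while the rest keep theirs in memory. It is a minimal, hedged example using PyTorch's torch.utils.checkpoint API; the Block module, the SelectiveCheckpointModel class, and the choice of which layers to recompute are illustrative assumptions, not the paper's MSCH policy or its pipeline-parallel recomputation-hiding schedule.

```python
# Minimal sketch of selective activation checkpointing (SAC), assuming PyTorch.
# The layer structure and the "checkpoint odd-indexed layers" policy are
# hypothetical; MSCH's microbatch-level selection and pipeline scheduling
# are not reproduced here.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """A toy transformer-style block standing in for one model layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))


class SelectiveCheckpointModel(nn.Module):
    """Checkpoints only the layers whose index is in `recompute_layers`.

    Checkpointed layers drop their activations in the forward pass and
    recompute them during backward (saving memory, costing compute);
    the remaining layers keep their activations and pay no recomputation.
    """

    def __init__(self, dim: int, num_layers: int, recompute_layers: set):
        super().__init__()
        self.layers = nn.ModuleList(Block(dim) for _ in range(num_layers))
        self.recompute_layers = recompute_layers

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i in self.recompute_layers:
                # Activations are not stored; this layer's forward reruns in backward.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x


if __name__ == "__main__":
    model = SelectiveCheckpointModel(dim=256, num_layers=8, recompute_layers={1, 3, 5, 7})
    batch = torch.randn(4, 128, 256, requires_grad=True)
    loss = model(batch).sum()
    loss.backward()  # checkpointed layers recompute their forward pass here
```

In this baseline, the recomputation cost lands squarely on the backward pass; the paper's contribution is to choose what to checkpoint at microbatch granularity and to overlap the recomputation with the pipeline's "bottleneck" stage so that it is hidden rather than added to the critical path.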

Keywords