大数据 (Big Data Research), Sep 2024

Application of distributed techniques in large language model training and inference

  • ZHENG Weimin

Abstract

In recent years, artificial intelligence has been widely applied across many fields, and the "pre-training and fine-tuning" paradigm of large language models (LLMs) has become the latest paradigm of artificial intelligence. Distributed techniques underpin every stage of the LLM lifecycle. In the data acquisition stage, a file system called "SuperFS" was developed to address the storage of massive numbers of small files, meeting the requirements of low latency and scalability. In the data preprocessing stage, an efficient big data processing engine called "Chukonu" was developed to address the high overhead of reading data from distributed file systems. In the model training stage, a distributed checkpoint strategy was proposed to address the poor read and write performance of checkpoint files, greatly improving their read and write speed. In the model inference stage, a high-throughput inference scheme called "FastDecode" and an LLM inference architecture called "Mooncake" were developed to address the challenge that the KVCache poses to storage systems. These applications of distributed technology enable LLMs to fully utilize computing resources, accelerate training, and benefit the development of artificial intelligence.
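The abstract does not detail the checkpoint strategy itself, but the general idea behind distributed checkpointing is that each worker writes only its own shard of the training state in parallel, rather than funneling the full state through a single rank and writing one huge file. A minimal PyTorch sketch of this pattern (function and path names are illustrative, not from the paper; assumes the process group has been initialized, e.g. via torchrun):

```python
import os
import torch
import torch.distributed as dist

def save_sharded_checkpoint(model, optimizer, step, ckpt_dir):
    """Each rank writes only its own shard of the training state.

    N ranks writing N smaller files in parallel spreads the I/O
    across the cluster and scales with the number of workers.
    """
    rank = dist.get_rank()
    os.makedirs(ckpt_dir, exist_ok=True)
    shard_path = os.path.join(ckpt_dir, f"step{step}-rank{rank}.pt")
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),          # this rank's parameters
            "optimizer": optimizer.state_dict(),  # and optimizer state
        },
        shard_path,
    )
    dist.barrier()  # all shards on disk before the checkpoint counts as done

def load_sharded_checkpoint(model, optimizer, step, ckpt_dir):
    rank = dist.get_rank()
    shard_path = os.path.join(ckpt_dir, f"step{step}-rank{rank}.pt")
    state = torch.load(shard_path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```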
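To see why the KVCache strains the storage system during inference: every transformer layer caches a key and a value tensor per token, so the cache grows linearly with both batch size and sequence length. A back-of-the-envelope estimate (figures are illustrative, roughly matching a 7B-parameter model; not taken from the paper):

```python
def kv_cache_bytes(layers, hidden, seq_len, batch, bytes_per_value=2):
    # 2 tensors (K and V) per layer, each of shape [batch, seq_len, hidden],
    # stored in fp16 (2 bytes per value).
    return 2 * layers * batch * seq_len * hidden * bytes_per_value

# Roughly a 7B-class model: 32 layers, hidden size 4096, fp16.
size = kv_cache_bytes(layers=32, hidden=4096, seq_len=4096, batch=8)
print(f"{size / 2**30:.1f} GiB")  # 16.0 GiB
```

Just 8 concurrent 4K-token requests already consume about 16 GiB, comparable to the fp16 weights of the model itself, which is why schemes like FastDecode and Mooncake treat KVCache placement as a first-class storage problem.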

Keywords