AIPerf: Automated Machine Learning as an AI-HPC Benchmark

Zhixiang Ren; Yongheng Liu; Tianhui Shi; Lei Xie; Yue Zhou; Jidong Zhai; Youhui Zhang; Yunquan Zhang; Wenguang Chen

doi:10.26599/BDMA.2021.9020004

Big Data Mining and Analytics (Sep 2021)

Affiliations

Zhixiang Ren: Peng Cheng National Laboratory, Shenzhen 518000, China
Yongheng Liu: Peng Cheng National Laboratory, Shenzhen 518000, China
Tianhui Shi: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Lei Xie: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Yue Zhou: Peng Cheng National Laboratory, Shenzhen 518000, China
Jidong Zhai: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Youhui Zhang: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Yunquan Zhang: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, China
Wenguang Chen: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Journal volume & issue: Vol. 4, no. 3
pp. 208 – 220

Abstract

The plethora of complex Artificial Intelligence (AI) algorithms and the available High-Performance Computing (HPC) power stimulate the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect AI computing power and input/output performance because it lacks a representative AI workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite built on automated machine learning, which not only represents real AI scenarios but is also auto-adaptively scalable to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and leave room for optimization on diverse systems with customizable configurations. We use Operations Per Second (OPS), measured with an analytical and systematic approach, as the major metric to quantify AI performance. We perform evaluations on various systems to verify the benchmark's stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 processors (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale to and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
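As a rough illustration of the OPS metric and the weak-scalability claim, the sketch below assumes that OPS is obtained by summing an analytically counted number of operations over all AutoML trials and dividing by the elapsed wall-clock time, and that weak scalability is judged by how well per-device throughput holds up as the machine grows. The `Trial` dataclass and the functions `aggregate_ops`, `per_device_ops`, and `weak_scaling_efficiency` are hypothetical names for illustration, not part of the AIPerf code base.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Trial:
    """One AutoML trial: a candidate network trained on some number of samples.

    Hypothetical structure for illustration; not taken from the AIPerf code base.
    """
    ops_per_sample: float    # assumed analytic op count (forward + backward) per sample
    samples_processed: int   # number of training samples the trial consumed


def aggregate_ops(trials: Iterable[Trial], wall_clock_seconds: float) -> float:
    """Total analytically counted operations divided by elapsed wall-clock time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    total_ops = sum(t.ops_per_sample * t.samples_processed for t in trials)
    return total_ops / wall_clock_seconds


def per_device_ops(total_ops_per_second: float, devices: int) -> float:
    """Average throughput of a single accelerator."""
    if devices <= 0:
        raise ValueError("devices must be positive")
    return total_ops_per_second / devices


def weak_scaling_efficiency(ops_small: float, devices_small: int,
                            ops_large: float, devices_large: int) -> float:
    """Per-device throughput at the larger scale relative to the smaller scale.

    1.0 means perfectly linear weak scaling; only meaningful for runs on the same hardware.
    """
    return per_device_ops(ops_large, devices_large) / per_device_ops(ops_small, devices_small)


if __name__ == "__main__":
    # Figures reported in the abstract; the hardware differs, so compare per-device only.
    t4 = per_device_ops(56.1e12, 32)
    ascend = per_device_ops(194.53e15, 4096)
    print(f"Tesla T4:   {t4 / 1e12:.2f} Tera-OPS per device")
    print(f"Ascend 910: {ascend / 1e12:.2f} Tera-OPS per device")
```

Applied to the two reported results, this gives about 1.75 Tera-OPS per T4 and about 47.49 Tera-OPS per Ascend 910. Because the two runs use different accelerators, they should not be passed to `weak_scaling_efficiency` together; that ratio only makes sense for runs of the same system at different node counts.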

Published in Big Data Mining and Analytics

ISSN: 2096-0654 (Print); 2097-406X (Online)
Publisher: Tsinghua University Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8254253