MultiPoint: Enabling scalable pre-silicon performance evaluation for multi-task workloads

Chenji Han; Xinyu Li; Feng Xue; Weitong Wang; Yuxuan Wu; Wenxiang Wang; Fuxin Zhang

BenchCouncil Transactions on Benchmarks, Standards and Evaluations (Sep 2024)

Affiliations

Chenji Han: SKLP, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Corresponding author at: University of Chinese Academy of Sciences, Beijing, China.
Xinyu Li: SKLP, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Feng Xue: SKLP, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Weitong Wang: SKLP, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yuxuan Wu: Loongson Technology, Beijing, China
Wenxiang Wang: University of Chinese Academy of Sciences, Beijing, China; Loongson Technology, Beijing, China
Fuxin Zhang: SKLP, Institute of Computing Technology, CAS, Beijing, China

Journal volume & issue: Vol. 4, no. 3, p. 100189

Abstract

With the number of cores integrated in a single processor growing and cloud computing developing rapidly, performance evaluation of multi-core systems is increasingly crucial. It is typically conducted by executing multi-task workloads, exemplified by SPEC CPU Rate, to measure metrics such as system throughput. Several sampling-based methods have been developed for pre-silicon performance evaluation of such workloads. However, these methods capture multi-task checkpoints directly, which scales poorly because of significant storage and time overheads. Enabling more scalable performance evaluation therefore remains a critical problem.

In this work, we propose MultiPoint to enable scalable pre-silicon performance evaluation for multi-task workloads. In the multi-task workloads of interest, each task executes independently without inter-task communication. MultiPoint therefore constructs the required multi-task checkpoints by recovering multiple single-task checkpoints across different cores, and guarantees their smooth execution through address remapping and shuffling. We implemented MultiPoint on the Emulator Accelerator and assessed its evaluation accuracy against the post-silicon Loongson 3A6000 processor. Using SPEC CPU 2017 as the benchmark, MultiPoint achieved estimation errors of 6.20%, 5.45%, and 6.99% for Rate 2, Rate 4, and Rate 8, respectively. This accuracy is comparable to that of direct multi-task checkpointing, while MultiPoint is more scalable, with 86.0% lower storage overhead and 93.7% lower time overhead.
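The idea of building a multi-task checkpoint from independent single-task checkpoints can be illustrated with a toy model. The sketch below is not the authors' implementation: the `Checkpoint` class, the `remap` and `compose` helpers, and the page-level layout are all hypothetical, chosen only to show why address remapping is needed when several single-task images that were captured in isolation must share one physical memory. It does not model the shuffling step, whose details the abstract does not give.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical single-task checkpoint: a name plus its physical pages."""
    name: str
    pages: dict  # physical page number -> page contents (bytes)


def remap(ckpt, base_page):
    """Relocate a checkpoint's pages to a contiguous range starting at base_page.

    Returns (old_page -> new_page mapping, relocated pages).
    """
    mapping = {}
    relocated = {}
    for offset, old in enumerate(sorted(ckpt.pages)):
        new = base_page + offset
        mapping[old] = new
        relocated[new] = ckpt.pages[old]
    return mapping, relocated


def compose(ckpts):
    """Place one single-task checkpoint per core in disjoint physical ranges."""
    memory = {}      # combined physical memory of the multi-task checkpoint
    per_core = []    # (core id, task name, page mapping) for each core
    next_base = 0
    for core, ckpt in enumerate(ckpts):
        mapping, relocated = remap(ckpt, next_base)
        # Independent tasks must never share a physical page after remapping.
        assert memory.keys().isdisjoint(relocated)
        memory.update(relocated)
        per_core.append((core, ckpt.name, mapping))
        next_base += len(ckpt.pages)
    return memory, per_core


# Two tasks captured separately happen to use the same physical pages.
task_a = Checkpoint("task_a", {0x10: b"a0", 0x20: b"a1"})
task_b = Checkpoint("task_b", {0x10: b"b0", 0x20: b"b1"})
memory, per_core = compose([task_a, task_b])
```

In this toy setting the two tasks collide on pages `0x10` and `0x20` before remapping and occupy four distinct pages afterwards. Because the tasks do not communicate, the same single-task checkpoints could in principle be reused to assemble Rate 2, Rate 4, or Rate 8 configurations instead of capturing each configuration separately.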

Published in BenchCouncil Transactions on Benchmarks, Standards and Evaluations

ISSN: 2772-4859 (Online)
Publisher: KeAi Communications Co. Ltd.
Country of publisher: China
LCC subjects: Science; Technology: Engineering (General). Civil engineering (General)
Website: https://www.keaipublishing.com/en/journals/benchcouncil-transactions-on-benchmarks-standards-and-evaluations/
