IEEE Access (Jan 2022)

Demand MemCpy: Overlapping of Computation and Data Transfer for Heterogeneous Computing

  • Donghun Jeong,
  • Jihun Park,
  • Jungrae Kim

DOI
https://doi.org/10.1109/ACCESS.2022.3195271
Journal volume & issue
Vol. 10
pp. 79925 – 79938

Abstract

Heterogeneous computing relies on collaboration among different types of processors on shared data. In systems with discrete accelerators (e.g., GP-GPUs), sharing data requires transferring large amounts of data between CPU and accelerator memories, which can significantly increase the end-to-end execution time. This paper proposes a novel mechanism called Demand MemCpy (DMC) to hide these data-sharing overheads. DMC copies data from host memory to accelerator memory on demand at page granularity. It uses a hardware-only mechanism to fetch a requested page with short latency and a background pre-copy to fetch related pages in advance. Our evaluation shows that DMC reduces the end-to-end execution time of GP-GPU applications by 25.4% on average by overlapping computation with data transfer and by not transferring unused pages.
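
The idea of demand-driven, page-granular copying can be illustrated with a small software toy model. This is only a sketch and not the paper's hardware mechanism: the function name `simulate_demand_copy`, the `precopy_window` parameter, and the policy of treating the next sequential pages as "related" are all assumptions made for illustration.

```python
def simulate_demand_copy(total_pages, access_trace, precopy_window=2):
    """Toy model of demand copy with pre-copy (hypothetical policy).

    Returns the set of pages that end up in accelerator memory and
    the number of demand fetches, i.e. first touches of absent pages.
    """
    resident = set()
    demand_fetches = 0
    for page in access_trace:
        if page not in resident:
            # Demand fetch: copy the accessed page on first touch.
            resident.add(page)
            demand_fetches += 1
            # Pre-copy (assumed policy): also bring in the next few
            # sequential pages, clipped to the end of the buffer.
            last = min(page + 1 + precopy_window, total_pages)
            for nxt in range(page + 1, last):
                resident.add(nxt)
    return resident, demand_fetches


total_pages = 16
trace = [0, 1, 2, 3, 8, 9, 3, 0]
resident, demand_fetches = simulate_demand_copy(total_pages, trace)
print(f"pages copied: {len(resident)} of {total_pages}")
print(f"demand fetches: {demand_fetches}")
```

In this toy trace, an up-front bulk copy would move all 16 pages, whereas the demand-driven model moves only the pages that are touched or pre-copied. The model ignores timing entirely, so it shows only the "unused pages are not transferred" effect, not the overlap of computation with transfer.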

Keywords