IEEE Access (Jan 2024)
Utilizing Machine Learning Techniques for Worst-Case Execution Time Estimation on GPU Architectures
Abstract
The massive parallelism provided by Graphics Processing Units (GPUs) to accelerate compute-intensive tasks makes them attractive for Real-Time Systems such as autonomous vehicles. Such systems run heavy Machine Learning (ML) and Computer Vision applications that demand the computing power of GPUs. However, such systems also need a guarantee of timing predictability: the Worst-Case Execution Time (WCET) of each application must be estimated tightly and safely so that the application can be scheduled before its deadline, avoiding catastrophic consequences. As more applications use GPUs, running many applications simultaneously on the same GPU becomes necessary. To provide predictable performance when applications run in parallel, scheduling must be WCET-aware, which GPUs do not fully support in a multitasking environment. NVIDIA recently introduced a feature called the Multi-Process Service (MPS), which allows different applications to run concurrently within the same CUDA context by partitioning the compute resources of the GPU. Using this feature, we can measure the interference from co-running GPU applications to estimate WCET. In this paper, we propose a novel technique to estimate the WCET of a GPU kernel using an ML approach. Our approach operates on the application's source code, and the model is trained on a large data set. The approach is flexible and can be applied to different GPU-sharing mechanisms. We execute the victim and enemy GPU kernels in parallel so that the enemy causes maximum interference, from which we estimate the WCET of the victim kernel. Enemy kernels are chosen to maximize slowdown by contending for the resources the victim kernel uses. We compare our implementation with state-of-the-art approaches to show its effectiveness.
In most cases, our ML approach reduces estimation time by 99%, since inference takes only seconds to predict the WCET; the resource consumption required to estimate WCET is also minimal compared to traditional approaches, because the application need not execute on the GPU for hours. Although our approach does not offer safety guarantees because of its empirical nature, we observed that predicted WCETs are always higher than any observed execution times for all benchmarks, and the maximum overestimation factor observed is 11x.
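For background, MPS compute partitioning of the kind described above is typically configured through the MPS control daemon and a documented environment variable; a minimal sketch, assuming a Volta-or-newer GPU and using `victim_app` and `enemy_app` as hypothetical benchmark binaries:

```shell
# Start the MPS control daemon (requires a Volta+ GPU and suitable permissions).
nvidia-cuda-mps-control -d

# Cap each client at a fraction of the GPU's SM threads so the victim and
# enemy kernels run concurrently on partitioned compute resources.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 ./victim_app &
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 ./enemy_app &
wait

# Stop the MPS control daemon once interference measurements are complete.
echo quit | nvidia-cuda-mps-control
```

With this setup, the victim's execution times can be sampled under enemy interference to collect the training and validation measurements the abstract refers to.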
Keywords