Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU

Evgeny Ponomarev; Sergey Matveev; Ivan Oseledets; Valery Glukhov

doi:10.3390/computers10080104

Computers (Aug 2021)


Affiliations

Evgeny Ponomarev: Skolkovo Institute of Science and Technology, 143026 Moscow, Russia
Sergey Matveev: Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, 119991 Moscow, Russia
Ivan Oseledets: Skolkovo Institute of Science and Technology, 143026 Moscow, Russia
Valery Glukhov: Noah’s Ark Lab., Huawei Technologies, 121614 Moscow, Russia

DOI: https://doi.org/10.3390/computers10080104
Journal volume & issue: Vol. 10, no. 8, p. 104

Abstract


Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. The number of FLOPs is commonly used as a proxy for neural network latency, but it may not be the best choice. To approximate latency more accurately, the research community uses lookup tables of all possible layers to calculate inference time on a mobile CPU, which requires only a small number of experiments. Unfortunately, this method is not directly applicable on a mobile GPU, where it shows low precision. In this work, we treat latency approximation on a mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we develop tools that provide a convenient way to conduct large-scale experiments on different target devices, with a focus on mobile GPUs. After the dataset has been evaluated, one can train a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Benchmark 101 dataset for two different mobile GPUs.
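
The two estimation strategies named in the abstract can be contrasted with a minimal sketch. It is not the LETI implementation: the layer names, the latency values, the "measured" numbers, and the function names lookup_table_latency and fit_linear are all hypothetical, chosen only for illustration. The first function sums independent per-layer latencies, as a lookup table would; the second fits a one-variable least-squares line that maps a lookup-table estimate to a measured latency, standing in for the regression model trained on experimental data.

```python
# Hypothetical per-layer latencies in milliseconds (made-up values, not measurements).
LAYER_LATENCY_MS = {
    "conv3x3": 1.20,
    "conv1x1": 0.40,
    "maxpool3x3": 0.25,
}


def lookup_table_latency(layers):
    """Lookup-table estimate: sum of independent per-layer latencies."""
    return sum(LAYER_LATENCY_MS[name] for name in layers)


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical networks, each described as a list of layer types.
networks = [
    ["conv3x3", "conv1x1"],
    ["conv3x3", "conv3x3", "maxpool3x3"],
    ["conv1x1", "conv1x1", "conv1x1", "maxpool3x3"],
    ["conv3x3", "conv3x3", "conv3x3", "conv1x1"],
]

# Hypothetical "measured" latencies on a target device (illustrative only).
measured_ms = [2.9, 4.6, 3.1, 6.3]

# Train the regression on (lookup-table estimate, measured latency) pairs.
estimates = [lookup_table_latency(net) for net in networks]
slope, intercept = fit_linear(estimates, measured_ms)


def predict_latency(layers):
    """Corrected prediction: regression applied to the lookup-table estimate."""
    return slope * lookup_table_latency(layers) + intercept
```

A single-feature linear fit is used here only to keep the sketch short. A model trained on real device measurements would likely need richer features that describe the network architecture.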

Published in Computers

ISSN: 2073-431X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computers
