Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP

Simone Francescato; Stefano Giagu; Federica Riti; Graziella Russo; Luigi Sabetta; Federico Tortonesi

doi:10.1140/epjc/s10052-021-09770-w

European Physical Journal C: Particles and Fields (Nov 2021)

Affiliations

Simone Francescato: Department of Physics, Harvard University
Stefano Giagu: Department of Physics, Sapienza University and INFN Sezione di Roma
Federica Riti: Department of Physics, ETH Zürich
Graziella Russo: Department of Physics, Sapienza University and INFN Sezione di Roma
Luigi Sabetta: Department of Physics, Sapienza University and INFN Sezione di Roma
Federico Tortonesi: Department of Physics, Sapienza University and INFN Sezione di Roma

DOI: https://doi.org/10.1140/epjc/s10052-021-09770-w
Journal volume & issue: Vol. 81, no. 11, pp. 1–10

Abstract

Resource utilization plays a crucial role in the successful implementation of fast real-time inference for deep neural networks (DNNs) and convolutional neural networks (CNNs) on the latest generation of hardware accelerators (FPGAs, SoCs, ACAPs, GPUs). To meet the needs of the triggers under development for the upgraded LHC detectors, we have developed a multi-stage compression approach that combines conventional compression strategies (pruning and quantization), which reduce the memory footprint of the model, with knowledge transfer techniques, which streamline the DNNs, simplify the synthesis phase in the FPGA firmware, and improve explainability. We present the developed methodologies and the results of their implementation in a working engineering pipeline used as a pre-processing stage for high-level synthesis tools (HLS4ML, Xilinx Vivado HLS, etc.). We show how ultra-light deep neural networks can be built in practice by applying the method to a realistic HEP use case: a toy simulation of one of the triggers planned for the HL-LHC.

Published in European Physical Journal C: Particles and Fields

ISSN: 1434-6044 (Print); 1434-6052 (Online)
Publisher: SpringerOpen
Country of publisher: Germany
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics: Nuclear and particle physics. Atomic energy. Radioactivity
Website: http://www.springer.com/physics/particle+and+nuclear+physics/journal/10052
