Soft Error Resilience of Deep Residual Networks for Object Recognition

Younis Ibrahim; Haibin Wang; Man Bai; Zhi Liu; Jianan Wang; Zhiming Yang; Zhengming Chen

doi:10.1109/ACCESS.2020.2968129

IEEE Access (Jan 2020)

Affiliations

Younis Ibrahim: College of IoT Engineering, Hohai University–Changzhou, Changzhou, China
Haibin Wang: College of IoT Engineering, Hohai University–Changzhou, Changzhou, China
Man Bai: College of IoT Engineering, Hohai University–Changzhou, Changzhou, China
Zhi Liu: College of IoT Engineering, Hohai University–Changzhou, Changzhou, China
Jianan Wang: National Key Laboratory of Analog Integrated Circuits, Chongqing, China
Zhiming Yang: Harbin Institute of Technology, Harbin, China
Zhengming Chen: College of IoT Engineering, Hohai University–Changzhou, Changzhou, China

DOI: https://doi.org/10.1109/ACCESS.2020.2968129
Journal volume & issue: Vol. 8, pp. 19490–19503

Abstract

Convolutional Neural Networks (CNNs) have gained considerable attention in object recognition, and in object classification in particular. When implemented on Graphics Processing Units (GPUs), deeper networks are more accurate than shallow ones. Residual Networks (ResNets) are among the deepest CNN architectures and are used in various fields, including safety-critical ones. GPUs have proven to be the major accelerator for CNN models. However, modern GPUs are prone to radiation-induced soft errors, which is a serious issue in safety-compliant systems. In this work, we analyze the reliability of ResNet on GPUs and propose an approach to improve it. We first analyze three popular ResNet models, namely ResNet-50, ResNet-101, and ResNet-152, using NVIDIA's fault injector, SASSIFI. We perform an in-depth analysis of the models from the perspective of layer and kernel vulnerability. Then, we experimentally show the vulnerability of the ResNet models and identify their most vulnerable portions. Finally, we validate our solution, a selective-hardening technique that hardens only the kernels worth hardening in order to avoid unnecessary overhead. Our strategy is shown to mask up to 93.38% of the injected errors with a performance overhead of less than 5.35%. Furthermore, the percentage of errors causing misclassifications can be reduced from 4.2% to 0.104%, thereby significantly improving the model's reliability.
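As a rough illustration of what a soft error can do to a single value, the following Python sketch inverts one bit of a number stored as an IEEE 754 32-bit float. It is not SASSIFI and not the authors' injection or hardening method; the function name `flip_bit` and the example weight 0.5 are assumptions made purely for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as an IEEE 754 float32, with one bit inverted.

    Hypothetical helper for illustration only: bit 0 is the least
    significant mantissa bit and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5  # assumed example value, e.g. one network weight

low_bit = flip_bit(weight, 0)    # tiny change in the mantissa
high_bit = flip_bit(weight, 30)  # top exponent bit: value becomes 2**127
sign_bit = flip_bit(weight, 31)  # sign bit: value becomes -0.5

print(low_bit, high_bit, sign_bit)
```

In this sketch, a flip in a low mantissa bit barely changes the value, while a flip in the top exponent bit turns 0.5 into 2**127. This difference in impact between individual faults is one intuition for why hardening can be applied selectively rather than uniformly.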

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
