AppliedMath (Oct 2022)
On the Universally Optimal Activation Function for a Class of Residual Neural Networks
Abstract
While non-linear activation functions play a vital role in artificial neural networks, it is generally unclear how the non-linearity improves the quality of function approximations. In this paper, we present a theoretical framework to rigorously analyze the performance gain from using non-linear activation functions in a class of residual neural networks (ResNets). In particular, we show that when the input features of the ResNet are uniformly chosen and mutually orthogonal, generating the ResNet output with non-linear activation functions outperforms, on average, generating it with linear activation functions, and the performance gain can be computed explicitly. Moreover, we show that when the activation functions are chosen as polynomials whose degree is much smaller than the dimension of the input features, the optimal activation functions can be expressed precisely in terms of Hermite polynomials. This demonstrates the role of Hermite polynomials in function approximation with ResNets.
Keywords