Machine Learning and Knowledge Extraction (Nov 2022)

Evidence-Based Regularization for Neural Networks

  • Giuseppe Nuti,
  • Andreea-Ingrid Cross,
  • Philipp Rindler

DOI
https://doi.org/10.3390/make4040051
Journal volume & issue
Vol. 4, no. 4
pp. 1011 – 1023

Abstract

Numerous approaches address over-fitting in neural networks: imposing a penalty on the parameters of the network (L1, L2, etc.); changing the network stochastically (drop-out, Gaussian noise, etc.); or transforming the input data (batch normalization, etc.). In contrast, we aim to ensure that a minimum amount of supporting evidence is present when fitting the model parameters to the training data. At the single-neuron level, this is equivalent to ensuring that both sides of the separating hyperplane (for a standard artificial neuron) contain a minimum number of data points, noting that these points need not belong to the same class for the inner layers. We first benchmark this approach on the standard Fashion-MNIST dataset, comparing it to various regularization techniques. Interestingly, we note that by nudging each neuron to divide, at least in part, its input data, the resulting networks make use of every neuron, avoiding hyperplanes that lie entirely on one side of their input data (which is equivalent to passing a constant to the next layer). To illustrate this point, we study the prevalence of saturated nodes throughout training, showing that neurons are activated more frequently and earlier in training when using this regularization approach. A direct consequence of the improved neuron activation is that deep networks become easier to train. This is crucially important when the network topology is not known a priori and fitting often remains stuck in a suboptimal local minimum. We demonstrate this property by training networks of increasing depth (and constant width); most regularization approaches result in increasingly frequent training failures (over different random seeds), whilst the proposed evidence-based regularization significantly outperforms them in its ability to train deep networks.
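
To make the core idea concrete, the sketch below shows one possible way to encode such an evidence requirement as a differentiable penalty in PyTorch: for each neuron, softly count how many batch samples fall on either side of its hyperplane and penalize neurons whose smaller side holds too few points. The function name `evidence_penalty` and the parameters `min_evidence` and `temperature` are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of an "evidence" penalty (illustrative, assuming a PyTorch setting;
# the paper's exact formulation may differ).
import torch

def evidence_penalty(pre_activations: torch.Tensor,
                     min_evidence: int = 8,
                     temperature: float = 1.0) -> torch.Tensor:
    """pre_activations: (batch, n_neurons) tensor of z = Wx + b values."""
    # Soft indicator that a sample lies on the positive side of each neuron's hyperplane.
    positive_side = torch.sigmoid(pre_activations / temperature)
    # Soft counts of points on each side, per neuron.
    n_pos = positive_side.sum(dim=0)
    n_neg = (1.0 - positive_side).sum(dim=0)
    # Penalize neurons whose weaker side holds fewer than `min_evidence` points.
    shortfall = torch.relu(min_evidence - torch.minimum(n_pos, n_neg))
    return shortfall.mean()
```

Added to the task loss with a small coefficient, a penalty of this kind nudges each neuron to actually split its inputs rather than saturate with all data on one side, in the spirit of the abstract's description.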

Keywords