IEEE Access (Jan 2024)

Bayesian Sparsification for Deep Neural Networks With Bayesian Model Reduction

  • Dimitrije Markovic,
  • Karl J. Friston,
  • Stefan J. Kiebel

DOI
https://doi.org/10.1109/ACCESS.2024.3417219
Journal volume & issue
Vol. 12, pp. 88231–88242

Abstract


Deep learning’s immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning is a crucial approach in this regard, facilitating the design of models that are both computationally efficient and competitive in performance across various deep learning applications. The state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with an approximate inference scheme based on stochastic variational inference. However, model inversion of the full generative model is exceptionally demanding computationally, especially when compared to standard deep learning of point estimates. In this context, we advocate the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR allows post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the advantages of the BMR method relative to established approaches based on hierarchical horseshoe priors over model weights. We illustrate the potential of BMR across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
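The post-hoc pruning idea described in the abstract can be illustrated with a minimal sketch. Assuming a mean-field Gaussian posterior N(μ, σ²) over each weight under a zero-mean Gaussian prior N(0, σ_p²), the BMR change in log model evidence from shrinking a weight's prior to a spike at zero has the closed form ΔF = ½ ln(σ_p² / σ²) − μ² / (2σ²), and a weight with ΔF ≥ 0 can be removed without lowering the evidence. The function names, the prior variance, and the toy posterior below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def bmr_delta_f(mu, sigma2, prior_var=1.0):
    """Change in log model evidence from reducing a weight's Gaussian prior
    N(0, prior_var) to a spike at zero, given a mean-field Gaussian
    posterior N(mu, sigma2).  This is the limit of the general Gaussian
    BMR expression as the reduced prior variance goes to zero:
        dF = 0.5 * ln(prior_var / sigma2) - mu**2 / (2 * sigma2)
    dF >= 0 means the evidence does not decrease when the weight is
    clamped to zero, so the weight is redundant and can be pruned."""
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    return 0.5 * np.log(prior_var / sigma2) - mu**2 / (2.0 * sigma2)

def prune_mask(mu, sigma2, prior_var=1.0):
    """Boolean mask over weights: True where the weight survives pruning."""
    return bmr_delta_f(mu, sigma2, prior_var) < 0.0

# Toy posterior over four weights: two near zero, two clearly non-zero.
mu = np.array([0.01, -0.02, 1.5, -2.0])
sigma2 = np.array([0.1, 0.1, 0.1, 0.1])
print(prune_mask(mu, sigma2))  # near-zero weights are marked for removal
```

The appeal, relative to fitting a hierarchical shrinkage model, is that this decision uses only posterior moments already available from a single non-hierarchical fit, so pruning costs one closed-form evaluation per weight.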

Keywords