Applied Sciences (Jul 2023)

Amortized Bayesian Meta-Learning with Accelerated Gradient Descent Steps

  • Zhewei Zhang,
  • Xuejing Li,
  • Shengjin Wang

DOI
https://doi.org/10.3390/app13158653
Journal volume & issue
Vol. 13, no. 15
p. 8653

Abstract

Recent meta-learning models often learn priors from observed tasks using a network optimized via stochastic gradient descent (SGD), which usually requires many training steps to converge. In this paper, we propose an accelerated Bayesian meta-learning structure with a stochastic inference network (ABML-SIN). The proposed model aims to improve the speed and efficiency of the Bayesian meta-learning training procedure. Current meta-learning approaches rarely converge within a few gradient descent steps, owing to the small number of training samples. We therefore introduce an accelerated gradient descent learning network based on a teacher–student architecture to learn the meta-latent variable θt for task t. With this amortized fast inference network, the meta-learner learns the task-specific latent θt within a few training steps, which improves its learning speed. To refine the latent variables generated by the transductive amortization network of the meta-learner, the SIN, followed by a conventional SGD-optimized network, is introduced as the student–teacher network that updates the parameters online. The SIN extracts the local latent variables and accelerates the convergence of the meta-learning network. Our experiments on simulated data demonstrate that the proposed method generalizes and scales to unseen samples, and it produces competitive or superior uncertainty estimates on few-shot learning tasks on two widely adopted 2D datasets with fewer training epochs than state-of-the-art meta-learning approaches. Furthermore, the parameters generated by the SIN act as perturbations on the latent weights, increasing the likelihood of accelerating the meta-learner's training. Extensive qualitative experiments show that our method performs well across different meta-learning tasks in both simulated and real-world settings.
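
A minimal sketch of the amortized-plus-refinement idea described in the abstract, assuming a PyTorch implementation: an inference network (the student) maps a task's support set to an initial task latent θt in a single forward pass, and a handful of SGD steps (standing in for the SGD-optimized teacher refinement) adjust the latent on the task loss. The network sizes, the toy sinusoid regression task, and helper names such as infer_task_latent are illustrative assumptions, not the authors' ABML-SIN code.

# Minimal sketch (not the authors' ABML-SIN implementation): an amortized
# inference network predicts a task-specific latent theta_t from a support
# set in one forward pass, and a few SGD steps refine it on the task loss.
# All sizes, the 1D regression task, and the step count are assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 16

class AmortizedInferenceNet(nn.Module):
    """Student: maps a support set (x, y) to an initial task latent."""
    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, x_support, y_support):
        # Encode each (x, y) pair, then average over the support set.
        pairs = torch.cat([x_support, y_support], dim=-1)
        return self.encoder(pairs).mean(dim=0)

class TaskDecoder(nn.Module):
    """Predicts y from x, conditioned on the task latent theta_t."""
    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x, theta_t):
        z = theta_t.expand(x.shape[0], -1)
        return self.net(torch.cat([x, z], dim=-1))

def infer_task_latent(student, decoder, x_s, y_s, refine_steps=3, lr=0.1):
    """Amortized initialization followed by a few SGD refinement steps."""
    theta_t = student(x_s, y_s).detach().requires_grad_(True)
    for _ in range(refine_steps):
        loss = nn.functional.mse_loss(decoder(x_s, theta_t), y_s)
        grad, = torch.autograd.grad(loss, theta_t)
        theta_t = (theta_t - lr * grad).detach().requires_grad_(True)
    return theta_t

# Toy usage on a single sinusoid regression task (illustrative only).
student, decoder = AmortizedInferenceNet(), TaskDecoder()
x_s = torch.linspace(-1, 1, 5).unsqueeze(-1)
y_s = torch.sin(3 * x_s)
theta_t = infer_task_latent(student, decoder, x_s, y_s)
print(decoder(x_s, theta_t).shape)  # torch.Size([5, 1])

The key design point this sketch illustrates is that the per-task adaptation cost is dominated by one forward pass plus a few gradient steps on a low-dimensional latent, rather than many SGD steps over the full model parameters.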

Keywords