IEEE Access (Jan 2024)

BERT-NAR-BERT: A Non-Autoregressive Pre-Trained Sequence-to-Sequence Model Leveraging BERT Checkpoints

  • Mohammad Golam Sohrab
  • Masaki Asada
  • Matiss Rikters
  • Makoto Miwa

DOI
https://doi.org/10.1109/ACCESS.2023.3346952
Journal volume & issue
Vol. 12
pp. 23–33

Abstract


We introduce BERT-NAR-BERT (BnB), a pre-trained non-autoregressive sequence-to-sequence model that employs BERT as the backbone for both the encoder and the decoder, targeting natural language understanding and generation tasks. During pre-training and fine-tuning of BERT-NAR-BERT, we address the challenge of controlling the output length of BnB by adopting two approaches: length classification and connectionist temporal classification (CTC). We evaluate the model on the standard natural language understanding benchmark GLUE and on three generation tasks: abstractive summarization, question generation, and machine translation. Our results show substantial improvements in inference speed (on average 10x faster) with only a small loss in output quality compared to our direct autoregressive baseline, the BERT2BERT model. Our code is publicly released on GitHub (https://github.com/aistairc/BERT-NAR-BERT) under the Apache 2.0 License.
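To illustrate the two length-control strategies named in the abstract (length classification and CTC), below is a minimal PyTorch sketch. The module names, tensor sizes, and upsampling factor are illustrative assumptions rather than the authors' released implementation; the actual code is in the linked GitHub repository.

```python
# Sketch of the two output-length strategies for a non-autoregressive decoder.
# All names, sizes, and the upsampling factor are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, VOCAB, MAX_LEN, UPSAMPLE = 768, 30522, 128, 2  # assumed sizes

class LengthClassifier(nn.Module):
    """Predicts the target length as a class from mean-pooled encoder states."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, MAX_LEN)

    def forward(self, encoder_states, target_lengths=None):
        pooled = encoder_states.mean(dim=1)          # (batch, HIDDEN)
        logits = self.proj(pooled)                   # (batch, MAX_LEN)
        loss = None
        if target_lengths is not None:
            loss = F.cross_entropy(logits, target_lengths)
        return logits.argmax(dim=-1), loss

def ctc_length_loss(decoder_logits, targets, target_lengths, blank_id=0):
    """CTC loss over an upsampled decoder output: the decoder emits
    UPSAMPLE * source_length positions, and CTC marginalises over
    blank/repeat alignments onto the shorter target sequence."""
    log_probs = decoder_logits.log_softmax(-1).transpose(0, 1)  # (T, batch, VOCAB)
    input_lengths = torch.full(
        (decoder_logits.size(0),), decoder_logits.size(1), dtype=torch.long
    )
    return F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                      blank=blank_id, zero_infinity=True)

if __name__ == "__main__":
    batch, src_len, tgt_len = 2, 10, 8
    enc = torch.randn(batch, src_len, HIDDEN)

    # 1) Length classification: predict target length, supervise with cross-entropy.
    lc = LengthClassifier()
    pred_len, lc_loss = lc(enc, torch.tensor([tgt_len, tgt_len]))

    # 2) CTC: score an upsampled non-autoregressive output against the target.
    dec_logits = torch.randn(batch, src_len * UPSAMPLE, VOCAB)
    targets = torch.randint(1, VOCAB, (batch, tgt_len))
    ctc = ctc_length_loss(dec_logits, targets, torch.tensor([tgt_len, tgt_len]))
    print(pred_len.tolist(), lc_loss.item(), ctc.item())
```

Either strategy decouples output length from step-by-step decoding, which is what allows all target positions to be generated in parallel.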

Keywords