IEEE Access (Jan 2024)

LNLF-BERT: Transformer for Long Document Classification With Multiple Attention Levels

  • Linh Manh Pham,
  • Hoang Cao The

DOI
https://doi.org/10.1109/ACCESS.2024.3492102
Journal volume & issue
Vol. 12
pp. 165348–165358

Abstract


Transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), cannot process long sequences because their self-attention operation scales quadratically with the sequence length. To remedy this, we introduce Look Near and Look Far BERT (LNLF-BERT), which applies a two-level self-attention mechanism at the sentence and document levels and can handle document classification over thousands of tokens. The LNLF-BERT mechanism retains some of the benefits of full self-attention at each level while reducing complexity by avoiding full self-attention over the whole document. Our theoretical analysis shows that the LNLF-BERT mechanism is an approximator of the full self-attention model. We pretrain LNLF-BERT from scratch and fine-tune it on downstream tasks. We also conduct experiments to demonstrate the feasibility of LNLF-BERT for long-text processing. Moreover, LNLF-BERT effectively balances local and global attention, allowing for efficient document-level understanding. Compared with other long-sequence models such as Longformer and BigBird, LNLF-BERT shows competitive performance in both accuracy and computational efficiency. The architecture scales to various downstream tasks, making it adaptable to different applications in natural language processing.
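The two-level idea described in the abstract can be sketched as follows. This is an illustrative NumPy toy, not the authors' implementation: the chunking scheme, mean-pooling of sentence summaries, and additive broadcast of the document-level context are all assumptions made for clarity. With S sentences of L tokens each, the cost is roughly O(S·L² + S²) rather than O((S·L)²) for full self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over one sequence.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def two_level_attention(tokens, sent_len):
    """Hypothetical two-level ('look near' / 'look far') attention sketch."""
    n, d = tokens.shape
    # Level 1 ("look near"): full self-attention within each sentence chunk.
    sents = tokens.reshape(n // sent_len, sent_len, d)
    local = np.stack([attention(s, s, s) for s in sents])
    # One summary vector per sentence (mean pooling is an assumption here).
    summaries = local.mean(axis=1)
    # Level 2 ("look far"): full self-attention across sentence summaries.
    doc_ctx = attention(summaries, summaries, summaries)
    # Broadcast each sentence's document-level context back to its tokens.
    out = local + doc_ctx[:, None, :]
    return out.reshape(n, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))          # 16 tokens = 4 sentences of 4 tokens
y = two_level_attention(x, sent_len=4)
print(y.shape)                        # (16, 8)
```

Each token attends fully to its own sentence, and sentences attend fully to each other through their summaries, which mirrors the paper's claim of keeping some benefits of full self-attention at each level without paying the quadratic cost over the whole document.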

Keywords