IEEE Access (Jan 2023)
RoBERTa-CoA: RoBERTa-Based Effective Finetuning Method Using Co-Attention
Abstract
In the field of natural language processing (NLP), artificial intelligence (AI) technology has been used to solve various problems, such as text classification, similarity measurement, chatbots, machine translation, and machine reading comprehension. Deep learning, in which machines learn patterns directly from data, has enabled significant advances on complex, rule-intensive NLP tasks. Machine reading comprehension is an NLP task in which a machine must understand a question and a paragraph and find the answer within the paragraph. In 2019, bidirectional encoder representations from transformers (BERT) and the robustly optimized BERT pretraining approach (RoBERTa) were introduced; both are pretrained and then fine-tuned for downstream tasks, and they led to significant advances. RoBERTa outperformed BERT in training speed and accuracy by increasing the pretraining data and batch sizes, employing dynamic masking, and eliminating the next sentence prediction task. For machine reading comprehension, RoBERTa takes the question and the paragraph together as a single input. However, this simultaneous input method suffers from the attention separate representation (ASP) problem, in which the attention between the question and the paragraph spreads widely rather than concentrating on keywords. This study proposes two methods to address the ASP problem. First, the existing question–paragraph input format is changed to three independent inputs (the question, the paragraph, and the question–paragraph pair), and the corresponding RoBERTa outputs are concatenated. Second, the concatenated matrix is transformed into two matrices, and a machine reading comprehension algorithm using co-attention is proposed. An ablation study was conducted to evaluate and analyze the model’s performance, comprehension, and design efficiency. According to the experimental results, the proposed method improved the exact match (EM) score by 0.9% and the F1 score by 1.0% compared with existing methods. Consequently, attention concentration and co-attention enhanced the learning performance, and the proposed model outperformed existing models.
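To make the co-attention step concrete, the following is a minimal sketch, not the authors' implementation: it assumes a question matrix Q and a paragraph matrix P (standing in for the two matrices obtained from the concatenated RoBERTa outputs), a hidden size of 768, and a generic scaled co-attention computed in PyTorch; the function name co_attention and all shapes are illustrative assumptions.

# Minimal co-attention sketch (illustrative; not the paper's exact architecture).
import torch
import torch.nn.functional as F

def co_attention(Q, P):
    # Q: (batch, m, d) question token representations
    # P: (batch, n, d) paragraph token representations
    d = Q.size(-1)
    # Affinity between every question token and every paragraph token.
    affinity = torch.bmm(Q, P.transpose(1, 2)) / d ** 0.5            # (batch, m, n)
    # Question-to-paragraph attention: each question token attends over the paragraph.
    q2p = torch.bmm(F.softmax(affinity, dim=-1), P)                  # (batch, m, d)
    # Paragraph-to-question attention: each paragraph token attends over the question.
    p2q = torch.bmm(F.softmax(affinity.transpose(1, 2), dim=-1), Q)  # (batch, n, d)
    return q2p, p2q

# Example usage with random tensors standing in for RoBERTa outputs.
Q = torch.randn(2, 16, 768)    # batch of 2 questions, 16 tokens each
P = torch.randn(2, 128, 768)   # batch of 2 paragraphs, 128 tokens each
q2p, p2q = co_attention(Q, P)

The two attended representations can then be combined with the original matrices and passed to an answer-span prediction head; the exact combination used in RoBERTa-CoA is described in the body of the paper.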
Keywords