Scientific Reports (Jul 2024)

Evaluating the performance of multilingual models in answer extraction and question generation

  • Antonio Moreno-Cediel,
  • Jesus-Angel del-Hoyo-Gabaldon,
  • Eva Garcia-Lopez,
  • Antonio Garcia-Cabot,
  • David de-Fitero-Dominguez

DOI
https://doi.org/10.1038/s41598-024-66472-5
Journal volume & issue
Vol. 14, no. 1
pp. 1–17

Abstract

Multiple-choice test generation is one of the most complex NLP problems, especially in languages other than English, where prior research is scarce. A review of the literature shows that earlier approaches, such as rule-based systems and early neural networks, have given way to a recent architecture, the Transformer, in the tasks of Answer Extraction (AE) and Question Generation (QG). This study therefore centres on searching for and developing better models for the AE and QG tasks in Spanish, using an answer-aware methodology. For this purpose, three multilingual models (mT5-base, mT0-base and BLOOMZ-560M) were fine-tuned on three different datasets: a Spanish translation of the SQuAD dataset; SQAC, a dataset originally in Spanish; and their union (SQuAD + SQAC), which yields slightly better results. Regarding the models, the performance of mT5-base was compared with that of two newer models, mT0-base and BLOOMZ-560M. These models had already been fine-tuned for multiple tasks in the literature, including AE and QG, but, in general, the best results are obtained by the mT5 models trained in our study on the SQuAD + SQAC dataset, while some other good results come from mT5 models trained only on the SQAC dataset. For evaluation, the widely used BLEU-1 to BLEU-4, METEOR and ROUGE-L metrics were computed, on which mT5 outperforms several similar research works. In addition, CIDEr, SARI, GLEU, WER and cosine similarity were calculated to provide a benchmark within the AE and QG problems for future work.
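To make the answer-aware methodology described in the abstract concrete, the sketch below shows how a multilingual T5 model can be prompted with an answer and its context to generate a question, and how a generated question can be scored with ROUGE-L. It is a minimal illustrative sketch, not the authors' code: the `answer: ... context: ...` input format is an assumed convention, and the public `google/mt5-base` checkpoint stands in for the study's fine-tuned weights (without the SQuAD + SQAC fine-tuning, the base model will not produce usable questions).

```python
# Minimal sketch of answer-aware question generation (QG) with a
# multilingual T5 model via Hugging Face Transformers. The prompt format
# and checkpoint are assumptions for illustration; the study fine-tunes
# mT5-base on SQuAD + SQAC before generating questions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import evaluate

model_name = "google/mt5-base"  # stand-in for the study's fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "El Quijote fue escrito por Miguel de Cervantes en 1605."
answer = "Miguel de Cervantes"

# Answer-aware input: the target answer is supplied alongside the context,
# so the model knows which span the generated question should ask about.
source = f"answer: {answer} context: {context}"
inputs = tokenizer(source, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated)

# Score the generated question against a reference question with ROUGE-L,
# one of the metrics reported in the paper.
rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=[generated],
                       references=["¿Quién escribió El Quijote?"])
print(scores["rougeL"])
```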