JOIV: International Journal on Informatics Visualization (Aug 2022)

Verification of a Dataset for Korean Machine Reading Comprehension with Numerical Discrete Reasoning over Paragraphs

  • Gyeongmin Kim,
  • Jaechoon Jo

DOI
https://doi.org/10.30630/joiv.6.2-2.1120
Journal volume & issue
Vol. 6, no. 2-2
pp. 587 – 592

Abstract


Numerical reasoning in machine reading comprehension (MRC) has achieved significant performance improvements over the past few years. However, because this research has been restricted to a few specific languages, low-resource languages have been overlooked, and MRC studies on such languages remain limited. In addition, methods that rely only on information extracted from spans within a paragraph are limited in answering questions that require actual reasoning. To overcome these shortcomings, this study constructs a dataset for training Korean question answering (QA) models that not only answer with spans from the passage but also perform numerical reasoning over passages and questions, and it verifies the dataset's efficacy by training models on it. We recruited eight annotators to tag the ground-truth labels; they annotated 920, 115, and 115 passages for the train, dev, and test sets, respectively. To reduce the inaccuracies and errors that humans introduce during data construction, we created a simple yet sophisticated automatic inter-annotation tool built on the widely used KoBERT and KoELECTRA models. We defined four general conditions and six conditions that humans must inspect, and we fine-tuned pre-trained language models with a numerically aware architecture. Specifically, we fine-tuned KoELECTRA and NumNet+ with KoELECTRA; in experiments with identical hyperparameter settings, NumNet+ with KoELECTRA outperformed the other models by more than 1.3 points. Our research contributes to Korean MRC research and offers insight into the potential of MRC models capable of numerical reasoning.
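The abstract contrasts answering with a passage span against answering by numerical reasoning. As a toy illustration only (not the authors' annotation tool or model), the sketch below shows the kind of arithmetic answer such a question needs: numbers are pulled from a Korean passage and combined with a sign in {+1, -1, 0} per number, similar in spirit to the signed-number arithmetic output used by NAQANet-style models such as NumNet+. The regex, the helper names `extract_numbers` and `signed_sum`, and the example passage are hypothetical; in a trained model the signs would be predicted rather than supplied by hand.

```python
import re

# Hypothetical pattern: plain integers or decimals (thousands separators removed first).
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(passage: str) -> list[float]:
    """Return every number in the passage, in order of appearance."""
    cleaned = passage.replace(",", "")  # "1,200" -> "1200"
    return [float(m.group()) for m in NUMBER_RE.finditer(cleaned)]


def signed_sum(numbers: list[float], signs: list[int]) -> float:
    """Combine numbers with one sign each: +1 add, -1 subtract, 0 ignore."""
    if len(numbers) != len(signs):
        raise ValueError("need exactly one sign per number")
    if any(s not in (-1, 0, 1) for s in signs):
        raise ValueError("signs must be -1, 0, or +1")
    return sum(n * s for n, s in zip(numbers, signs))


if __name__ == "__main__":
    # Hypothetical passage: "In 2019 there were 1,200 visitors; in 2020 the number fell to 850."
    passage = "2019년 관객은 1,200명이었고 2020년에는 850명으로 줄었다."
    nums = extract_numbers(passage)  # [2019.0, 1200.0, 2020.0, 850.0]
    # Question: "How many fewer visitors were there in 2020 than in 2019?"
    print(signed_sum(nums, [0, 1, 0, -1]))  # 350.0
```

No single span of the passage contains the answer 350, which is why a span-only extractor cannot answer this question while a numerically aware model can.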

Keywords