IEEE Access (Jan 2022)

AttractionDetailsQA: An Attraction Details Focused on Chinese Question Answering Dataset

  • Weiming Huang,
  • Shiting Xu,
  • Wang Yuhan,
  • Jin Fan,
  • Qingling Chang

DOI
https://doi.org/10.1109/ACCESS.2022.3181188
Journal volume & issue
Vol. 10
pp. 86215 – 86221

Abstract

With the growing number of domestic tourists and the spread of digital upgrades at attractions, it is crucial to develop a question-answering (QA) system about attraction details. However, there is little work on attraction QA, and the main bottleneck is the lack of available datasets. While previous QA datasets, such as CNN/Daily Mail and NewsQA, usually focus on the news domain, we present the first large-scale dataset for QA over attraction details. To ensure that the collected data are useful, we gather them only from public travel information websites. Unlike other QA datasets such as SQuAD, which are labeled manually, we built the dataset by combining manual annotation with annotation by a question-answer pair generation (QAG) model. The resulting dataset covers 2,808 attractions with a total of 18,245 QA pairs and includes seven types of attraction details: location, time, component, area, layout, rating, and character. The dataset is available at https://github.com/wyman130/AttractionDetailsQA. Because QAG has received little study in the context of attraction details, we experimented with several QAG models on this dataset and established benchmark results. This provides a basis for subsequent improvements to the dataset and for further research on QAG in attraction details.

Keywords