DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension

Sun, Kai; Yu, Dian; Chen, Jianshu; Yu, Dong; Choi, Yejin; Cardie, Claire

doi:10.1162/tacl_a_00264

Transactions of the Association for Computational Linguistics (Nov 2019)

Journal volume & issue: Vol. 7
pp. 217 – 231

Abstract

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find that, at best, they only barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at https://dataset.org/dream/.
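To make "surface information" concrete, here is a minimal sketch of a lexical-overlap, multiple-choice baseline. The abstract does not say which rule-based approach was used; this sketch assumes a simplified sliding-window scorer with unweighted token counts, and is not the authors' implementation. The function names (`tokenize`, `sliding_window_score`, `answer`) and the example dialogue are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and keep word tokens (letters, digits, apostrophes); punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def sliding_window_score(passage: List[str], target: Set[str]) -> int:
    # Slide a window of size |target| over the passage and return the largest
    # number of window tokens that occur in the target set (unweighted counts,
    # an assumed simplification of a sliding-window matcher).
    size = max(len(target), 1)
    best = 0
    for start in range(max(len(passage) - size + 1, 1)):
        window = passage[start:start + size]
        best = max(best, sum(1 for tok in window if tok in target))
    return best


def answer(dialogue: List[str], question: str, choices: List[str]) -> int:
    # Score each choice by overlap of (question + choice) tokens with the dialogue
    # and return the index of the best choice; ties go to the earliest choice.
    passage = tokenize(" ".join(dialogue))
    question_tokens = set(tokenize(question))
    scores = [
        sliding_window_score(passage, question_tokens | set(tokenize(choice)))
        for choice in choices
    ]
    return max(range(len(choices)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Hypothetical dialogue in which the second speaker rejects the first proposal.
    dialogue = [
        "M: Shall we take the bus to the museum?",
        "W: No, the subway is faster. Let's take the subway.",
    ]
    choices = ["By bus.", "By subway.", "By taxi."]
    print(choices[answer(dialogue, "How will the speakers go to the museum?", choices)])
```

On this hypothetical exchange, "By bus." and "By subway." receive the same overlap score, so the scorer falls back to the first choice and prints `By bus.`, which is wrong: token overlap cannot tell which proposal the speakers accepted. That kind of non-extractive, multi-turn inference is exactly what such a baseline misses.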

Published in Transactions of the Association for Computational Linguistics

ISSN: 2307-387X (Online)
Publisher: The MIT Press
Country of publisher: United States
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://direct.mit.edu/tacl

About the journal