Augmented and challenging datasets with multi-step reasoning and multi-span questions for Chinese judicial reading comprehension

Qingye Meng; Ziyue Wang; Hang Chen; Xianzhen Luo; Baoxin Wang; Zhipeng Chen; Yiming Cui; Dayong Wu; Zhigang Chen; Shijin Wang

AI Open (Jan 2022)

Affiliations

Qingye Meng: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China; Corresponding author.
Ziyue Wang: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China
Hang Chen: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China
Xianzhen Luo: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China
Baoxin Wang: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China; Research Center for SCIR, Harbin Institute of Technology, Harbin, China
Zhipeng Chen: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China
Yiming Cui: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China; Research Center for SCIR, Harbin Institute of Technology, Harbin, China
Dayong Wu: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China
Zhigang Chen: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China
Shijin Wang: State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, China; iFLYTEK AI Research (Hebei), LangFang, China

Journal volume & issue: Vol. 3, pp. 193–199

Abstract

Existing judicial reading comprehension datasets are relatively simple: the answers to their questions can be obtained through single-step reasoning. However, legal documents in real-world scenarios are complex, which makes it difficult to infer correct answers by single-step reasoning alone. To address this issue, we increase the difficulty of the questions in the Chinese Judicial Reading Comprehension (CJRC) dataset and propose two augmented versions, CJRC2.0 and CJRC3.0. These datasets are derived from Chinese judicial judgment documents in different fields and are annotated by judicial professionals. Compared with CJRC, the two datasets cover more types of judgment documents, and their questions are more challenging to answer. For CJRC2.0, we retain only complex questions that require multi-step reasoning, and we additionally provide supporting facts for the answers. For CJRC3.0, we introduce a new question type, the multi-span question, which must be answered by extracting and combining multiple spans from the document. We implement two strong baselines to evaluate the difficulty of the proposed datasets. The proposed datasets fill gaps in the field of explainable legal machine reading comprehension.

Published in AI Open

ISSN: 2666-6510 (Online)
Publisher: KeAi Communications Co. Ltd.
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.keaipublishing.com/en/journals/ai-open/
