AI Open (Jan 2022)

CAILIE 1.0: A dataset for Challenge of AI in Law - Information Extraction V1.0

  • Yu Cao
  • Yuanyuan Sun
  • Ce Xu
  • Chunnan Li
  • Jinming Du
  • Hongfei Lin

Journal volume & issue
Vol. 3
pp. 208 – 212

Abstract


Legal information extraction requires identifying and classifying legal elements in specific legal documents. Because information extraction is generally regarded as the first step in natural language understanding, the quality of legal information extraction results strongly affects the performance of downstream legal artificial intelligence (AI) tasks. However, Chinese judicial information extraction datasets are very scarce because of the specialized nature of legal documents. In response, we constructed a dataset for Challenge of AI in Law - Information Extraction V1.0 (CAILIE 1.0). Two features of CAILIE are worth highlighting: 1) the entity definitions focus on more fine-grained information in theft case documents, providing more interpretability for downstream legal AI; and 2) we define entity labels with judicial attributes on top of natural attribute labels to meet the needs of Chinese judicial practice. We implement several classic models on this dataset. The experimental results show that legal information extraction remains challenging and that further research is needed to solve this task.
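Entity extraction of this kind is often scored at the entity level. As a minimal sketch, assuming each annotated entity is a (start, end, label) character span and that a prediction counts only if it matches a gold span exactly, micro-averaged precision, recall, and F1 can be computed as below. The `Entity` type, the `entity_prf` function, and the label names are hypothetical illustrations, not CAILIE's actual schema or the authors' evaluation code.

```python
from typing import List, NamedTuple, Set, Tuple


class Entity(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # hypothetical label name, not CAILIE's real label set


def entity_prf(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall, and F1 over documents."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans with identical boundaries and label
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One document: the second prediction is off by one character at its end.
gold_doc = {Entity(0, 2, "Defendant"), Entity(10, 14, "StolenItem")}
pred_doc = {Entity(0, 2, "Defendant"), Entity(10, 13, "StolenItem")}
print(entity_prf([gold_doc], [pred_doc]))  # (0.5, 0.5, 0.5)
```

Under exact matching, the boundary error counts as both a false positive and a false negative; relaxed (overlap-based) matching schemes would score it differently, which is one reason fine-grained entity definitions make extraction harder to get right.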

Keywords