Applied Sciences (Jan 2022)

Building a Production-Ready Multi-Label Classifier for Legal Documents with Digital-Twin-Distiller

  • Gergely Márk Csányi,
  • Renátó Vági,
  • Dániel Nagy,
  • István Üveges,
  • János Pál Vadász,
  • Andrea Megyeri,
  • Tamás Orosz

DOI
https://doi.org/10.3390/app12031470
Journal volume & issue
Vol. 12, no. 3
p. 1470

Abstract

One of the most time-consuming parts of an attorney's job is finding similar legal cases. Categorizing legal documents by subject matter can significantly increase the discoverability of digitized court decisions. This is a multi-label classification problem, in which each relatively long text can belong to more than one legal category. This paper presents a solution in which the multi-label classification problem is decomposed into more than a hundred binary classification problems. Several approaches were tested, including different machine-learning and text-augmentation techniques, to produce a practically applicable model. The proposed models and methodologies were encapsulated and deployed as a digital twin in a production environment. The performance of the resulting machine-learning-based application matches, and could even exceed, that of human experts on this monotonous and labor-intensive task. It could increase the e-discoverability of the documents by about 50%.
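The decomposition described in the abstract, one independent binary classifier per legal category, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the category names, the four toy documents, and the token-counting `TokenWeightClassifier`, which is only a stand-in and not one of the models evaluated in the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


class TokenWeightClassifier:
    """Toy binary classifier, a hypothetical stand-in for a real model.

    Each token gets +1 for every positive training document it occurs in
    and -1 for every negative one; a document is predicted positive when
    the summed weight of its tokens is greater than zero.
    """

    def fit(self, docs, targets):
        self.weights = Counter()
        for doc, target in zip(docs, targets):
            for token in tokenize(doc):
                self.weights[token] += 1 if target else -1
        return self

    def predict(self, doc):
        # Unseen tokens contribute 0 because Counter returns 0 for missing keys.
        return sum(self.weights[token] for token in tokenize(doc)) > 0


def fit_binary_relevance(docs, label_sets):
    """Train one independent binary classifier per label."""
    labels = sorted(set().union(*label_sets))
    return {
        label: TokenWeightClassifier().fit(docs, [label in s for s in label_sets])
        for label in labels
    }


def predict_labels(models, doc):
    """A document receives every label whose binary classifier says yes."""
    return {label for label, model in models.items() if model.predict(doc)}


# Hypothetical toy corpus: each document may carry several categories.
docs = [
    "landlord lease eviction",
    "employer dismissal wages",
    "landlord employer lease wages",
    "inheritance will estate",
]
label_sets = [{"housing"}, {"labor"}, {"housing", "labor"}, {"succession"}]

models = fit_binary_relevance(docs, label_sets)
print(predict_labels(models, "lease dispute over unpaid wages"))
```

Because each category is handled by its own classifier, a category can be added, retrained, or tuned without touching the others, which is one practical reason to prefer this decomposition when the number of categories exceeds a hundred.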

Keywords