Legal Document Similarity Matching Based on Ensemble Learning

Aman Fan; Shaoxi Wang; Yanchuan Wang

doi:10.1109/ACCESS.2024.3371262

IEEE Access (Jan 2024)

Affiliations

Aman Fan: School of Higher Vocational Education, Shaanxi Institute of International Trade & Commerce, Xi’an, China
Shaoxi Wang: School of Microelectronics, Northwestern Polytechnical University, Xi’an, China
Yanchuan Wang: School of Marxism, Northwestern Polytechnical University, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2024.3371262
Journal volume & issue: Vol. 12
pp. 33910 – 33922

Abstract

The application of artificial intelligence in the legal domain has received significant attention from legal professionals and AI researchers in recent years. Intelligent judge systems have made remarkable progress thanks to advances in natural language processing, particularly deep learning. Similar-case matching has considerable potential here: matching and analyzing similar cases helps legal professionals reach more reasonable judgments and promotes fairness, consistency, and accuracy in the application of law. Existing methods do not make full use of both representation-based and interaction-based text matching during feature extraction. This paper presents an approach that uses ensemble learning over multiple models to improve the prediction of legal case similarity. The method comprises two sub-networks: a similarity representation sub-network and a binary classification judgment sub-network. The similarity representation sub-network is trained with contrastive learning, which encodes semantic similarity in the sample features by increasing the distance between dissimilar samples and reducing the distance between similar ones. The binary classification judgment sub-network takes sample pairs as joint input, so that features of the two texts interact during extraction. Because the two sub-networks are trained with different information processing and different optimization losses, ensemble learning can capitalize on the strengths of both models and significantly improve the accuracy of predicting the similarity of legal cases. On the public dataset CAIL2019-SCM, the method reaches a test-set accuracy of 74.53%, outperforming other existing methods.
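The two-branch idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss form, the distance measure, or the rule for combining the two sub-networks, so everything concrete below is an assumption made for illustration: cosine distance, a pairwise margin-based contrastive loss, and a weighted average as the ensemble rule. The function names `cosine`, `contrastive_loss`, and `ensemble_probability` and the parameters `margin` and `weight` are hypothetical, and the embeddings and classifier probability are treated as given inputs rather than model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_loss(u, v, similar, margin=0.5):
    """Pairwise contrastive loss on cosine distance (assumed form).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    d = 1.0 - cosine(u, v)
    if similar:
        return d * d
    return max(0.0, margin - d) ** 2


def ensemble_probability(u, v, classifier_prob, weight=0.5):
    """Blend the two branches into one similarity score (assumed rule).

    `u`, `v` stand in for representation-branch embeddings, and
    `classifier_prob` for the interaction-branch probability that the
    pair is similar.
    """
    rep_prob = (cosine(u, v) + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    return weight * rep_prob + (1.0 - weight) * classifier_prob
```

For identical embeddings and a classifier probability of 0.8, `ensemble_probability` with the default weight averages 1.0 and 0.8. A weighted average is only one possible combination rule; voting or a learned combiner would fit the same description.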

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
