Preserving Privacy in Arabic Judgments: AI-Powered Anonymization for Enhanced Legal Data Privacy

Taoufiq El Moussaoui; Loqman Chakir; Jaouad Boumhidi

doi:10.1109/ACCESS.2023.3324288

IEEE Access (Jan 2023)

Affiliations

Taoufiq El Moussaoui: LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Loqman Chakir: LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Jaouad Boumhidi: LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco

DOI: https://doi.org/10.1109/ACCESS.2023.3324288
Journal volume & issue: Vol. 11
pp. 117851 – 117864

Abstract

Jurisprudence involves studying, interpreting, and applying the law to understand its impact on society. Judges review cases every year to verify that the law is applied accurately, which raises privacy concerns when they access files from other courts. Although the legal domain has attracted interest from the research community, work on masking personal data remains at an early stage, particularly for Arabic, a language with limited resources. To address this gap, we develop a two-component system for generating anonymized Arabic judgments. The first component, a personal data extractor, uses Named Entity Recognition (NER) to identify key personal entities such as names, addresses, dates of birth, case numbers, and national identity codes. We train this model on a purpose-built Arabic legal corpus. The second component is a Python module that masks the personal entities extracted by the first. Together, the two components produce anonymized judgments. Our model achieves an F1-score of 96.14% when detecting entities in the purpose-built Arabic legal corpus. In addition, experiments on the ANERCorp corpus with 70%-30% and 90%-10% training-testing splits yield F1-scores of 93.78% and 95.77%, respectively. These results indicate that the proposed system is a promising approach to generating anonymized Arabic judgments. The Arabic legal corpus also provides a valuable resource for researchers aiming to improve domain-specific NER models for Arabic text.
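
The abstract does not show the masking module itself, so the following is only a minimal sketch of how such a second component might work, assuming the NER extractor returns character-offset spans with a label. The `Entity` type, the `mask_entities` function, the label names, and the bracketed placeholder format are all hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """A detected span; this structure is an assumption, not the paper's format."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # hypothetical tag name, e.g. "PERSON"


def mask_entities(text: str, entities: List[Entity]) -> str:
    """Replace each detected span with a bracketed label placeholder."""
    masked = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        masked = masked[:ent.start] + f"[{ent.label}]" + masked[ent.end:]
    return masked


if __name__ == "__main__":
    # Toy input in English for readability; spans are assumed not to overlap.
    sample = "Case 123/2020: Ahmed Ali, born 01/02/1980."
    spans = [
        Entity(5, 13, "CASE_NUMBER"),
        Entity(15, 24, "PERSON"),
        Entity(31, 41, "BIRTH_DATE"),
    ]
    print(mask_entities(sample, spans))
```

Replacing spans from the end of the text backwards avoids recomputing offsets after each substitution. A real pipeline would also need to handle overlapping spans and token-to-character alignment, which this sketch leaves out.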

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
