Journal of Information Systems and Informatics (Sep 2024)
Leveraging NLP to Analyze Regulatory Document Interconnections: A Systematic Review
Abstract
A sustainable digital village requires an effective policy management mechanism to deliver relevant regulatory information to the community. Management information systems for regulations play a crucial role in achieving this. However, communities still face challenges in understanding and navigating the relationships between various regulations. To address this issue, this study conducts a systematic review of the components found in regulatory documents and the methods used to analyze them. The review identifies eight key components in regulatory documents: topic, structure, category, initiator, level, considerations, related regulations, and content. Natural Language Processing (NLP) techniques can be employed for data preprocessing, including tokenization, lowercasing, stop-word removal, stemming, filtering, part-of-speech tagging, lemmatization, and chunking. For feature extraction, methods such as TF-IDF, bag-of-words, word counts, N-grams, and word embeddings can be applied. To measure the interconnection between regulations, techniques such as cosine similarity and K-Means clustering can be utilized. Experimental results demonstrate that the combination of methods used significantly influences the accuracy of identifying regulatory interconnections. The choice of methods, whether simple or complex, depends on the context, and confirmation through manual analysis is often required to ensure accuracy.
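To make the pipeline concrete, the sketch below shows one common way such steps can be combined with scikit-learn: TF-IDF feature extraction with built-in lowercasing and stop-word removal, cosine similarity as the interconnection measure, and K-Means clustering. It is an illustrative sketch, not the pipeline of any reviewed study; the sample regulation excerpts, the English stop-word list, and the cluster count are assumptions made here for demonstration.

# Minimal sketch of a regulation-interconnection pipeline:
# TF-IDF features, pairwise cosine similarity, and K-Means clustering.
# Sample documents and parameter values are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

# Hypothetical excerpts standing in for regulatory documents.
regulations = [
    "Village regulation on the management of village-owned enterprises.",
    "Regulation on financial reporting for village-owned enterprises.",
    "Regulation on land use and spatial planning in rural areas.",
    "Guidelines for spatial planning and land permits at the district level.",
]

# Feature extraction: TF-IDF, with lowercasing and stop-word removal
# handled directly by the vectorizer.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
tfidf = vectorizer.fit_transform(regulations)

# Pairwise cosine similarity as a measure of interconnection between regulations.
similarity = cosine_similarity(tfidf)
print("Cosine similarity matrix:")
print(similarity.round(2))

# K-Means clustering to group related regulations (k is chosen arbitrarily here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)
print("Cluster labels:", labels)

In practice, the similarity matrix or cluster labels would still be checked against a manual reading of the documents, in line with the review's observation that manual confirmation is often required to ensure accuracy.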
Keywords