Iranian Journal of Information Processing & Management (Feb 2025)
Performance Evaluation and Accuracy Improvement in Individual Record Linking Problems Using Decision Tree Algorithm in Machine Learning
Abstract
Record linkage is a vital process for consolidating data from different sources, particularly for Persian records, where diverse data structures and formats present challenges. To tackle these complexities, an expert system with decision tree algorithms is crucial for ensuring precise record linkage and data aggregation. By incorporating decision trees into an expert system framework, adaptation operations are created from predefined rules, which simplifies the aggregation of disparate data sources. This method is more effective and easier to use than traditional approaches such as IF-THEN rules, and its intuitive nature makes it more accessible to non-technical users. Integrating probabilistic record linkage results into the decision tree model within the expert system automates the linkage process while allowing users to customize string metrics and thresholds for optimal outcomes. The model's accuracy of over 95% on test data highlights its effectiveness in predicting and adjusting to data variations and supports its reliability across various record linkage scenarios. The use of machine learning decision trees alongside probabilistic record linkage in an expert system represents a significant advancement in the field, providing a robust solution for data aggregation in intricate environments and large-scale projects involving Persian records. Combining decision tree algorithms and probabilistic record linkage within an expert system offers a powerful tool for handling complex data integration tasks. This approach not only streamlines the consolidation of diverse data sources but also enhances the accuracy and efficiency of record linkage operations. By leveraging machine learning techniques and automated decision-making processes, organizations can achieve significant improvements in data quality and consistency, paving the way for more reliable and insightful analytical results when implementing statistical registers.
In conclusion, integrating decision trees and probabilistic record linkage in an expert system represents a cutting-edge solution for addressing data aggregation challenges in Persian records and beyond.
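The general idea described above, feeding field-level string-similarity scores for a pair of records into a decision tree that learns match thresholds, can be illustrated with a minimal sketch. It is not the authors' implementation: the field names, the transliterated toy records, the use of Python's `difflib.SequenceMatcher` as the string metric, and the small Gini-based tree are all assumptions made for illustration, and the sketch does not reproduce the reported accuracy.

```python
from difflib import SequenceMatcher

FIELDS = ("name", "father", "birth")  # hypothetical field names


def similarity(a, b):
    """String metric in [0, 1]; a stand-in that a user could swap out."""
    return SequenceMatcher(None, a, b).ratio()


def features(rec_a, rec_b, metric=similarity):
    """One similarity score per field for a candidate record pair."""
    return [metric(rec_a[f], rec_b[f]) for f in FIELDS]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y):
    """Return (impurity, feature index, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small binary tree; leaves hold the majority label."""
    split = None if depth == max_depth or len(set(y)) == 1 else best_split(X, y)
    if split is None:
        return {"label": int(2 * sum(y) >= len(y))}
    _, j, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right],
                            depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk the tree: 1 means 'same individual', 0 means 'different'."""
    while "label" not in tree:
        go_left = x[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["label"]


# Invented, transliterated toy pairs with hand-assigned labels (1 = match).
PAIRS = [
    ({"name": "Mohammad Rezaei", "father": "Ali", "birth": "1360-05-12"},
     {"name": "Mohamad Rezaee", "father": "Ali", "birth": "1360-05-12"}, 1),
    ({"name": "Sara Ahmadi", "father": "Hossein", "birth": "1372-11-03"},
     {"name": "Sara Ahmady", "father": "Hosein", "birth": "1372-11-03"}, 1),
    ({"name": "Reza Karimi", "father": "Mahdi", "birth": "1355-01-20"},
     {"name": "Zahra Mousavi", "father": "Javad", "birth": "1380-07-15"}, 0),
    ({"name": "Maryam Hosseini", "father": "Hassan", "birth": "1365-09-30"},
     {"name": "Maryam Hosseini", "father": "Akbar", "birth": "1349-02-08"}, 0),
]

X = [features(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
tree = build_tree(X, y)

for (a, b, label), x in zip(PAIRS, X):
    print(a["name"], "|", b["name"], "| labeled:", label, "| predicted:", predict(tree, x))
```

Passing a different function as `metric` is one way the customizable string metrics mentioned in the abstract could be exposed. The thresholds are learned from labeled pairs rather than hand-written as IF-THEN rules, which is the contrast the abstract draws.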
Keywords