Complex & Intelligent Systems (Oct 2021)
Integration and classification approach based on probabilistic semantic association for big data
Abstract
Abstract The process of integration through classification provides a unified representation of diverse data sources in Big data. The main challenges of big data analysis are due to the various granularities, irreconcilable data models, and multipart interdependencies between data content. Previously designed models were facing problems in integrating and analyzing big data due to highly complex and dynamic multi-source and heterogeneous information variation and also in processing and classifying the association among the attributes in a schema. In this paper, we propose an integration and classification approach through designing a Probabilistic Semantic Association (PSA) method to generate the feature pattern for the sources of big data. The PSA approach is trained to understand the data association and dependency pattern between the data class and incoming data to map the data objects accurately. It initially builds a data integration mechanism by transforming data into structured and learn to utilize the trained knowledge to classify the probabilistic association among the data and knowledge patterns. Later it builds a data analysis mechanism to analyze the mapped data through PSA to evaluate the integration efficiency. An experimental evaluation is performed over a real-time crime dataset generated from multiple locations having various events classes. The analysis of results confined that the utilization of knowledge patterns of accurate classification to enhance the integration of multiple source data is appropriate. The measure of precision, recall, fall-out rate, and F-measure approve the efficiency of the proposed PSA method. Even in comparison with the state-of-art classification method and with SC-LDA algorithm shows an improvisation in the prediction accuracy and enhance the data integration.
Keywords