Journal of Big Data (Nov 2019)

An adaptive and real-time based architecture for financial data integration

  • Noussair Fikri,
  • Mohamed Rida,
  • Noureddine Abghour,
  • Khalid Moussaid,
  • Amina El Omri

DOI
https://doi.org/10.1186/s40537-019-0260-x
Journal volume & issue
Vol. 6, no. 1
pp. 1 – 25

Abstract

In this paper we propose an adaptive, real-time approach to resolving latency problems and semantic heterogeneity in real-time financial data integration. Because of constraints we faced in projects that require real-time integration and analysis of massive financial data, we adopted a new approach that combines a hybrid financial ontology, resilient distributed datasets and real-time discretized streams. We build a real-time data integration pipeline that avoids the problems of classic Extract-Transform-Load (ETL) tools, namely data processing latency, functional miscomprehension and metadata heterogeneity. The approach is a contribution toward improving reporting quality and availability within short time frames, which is why we use Apache Spark. We studied ETL concepts, data warehousing fundamentals, big data processing techniques and container-oriented clustering architecture in order to replace the classic data integration and analysis process with our new concept, resilient distributed DataStream for online analytical process (RDD4OLAP) cubes, which are consumed using Spark SQL or Spark Core.

Keywords