A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network

Muhammad Ashfaq Khan; Md. Rezaul Karim; Yangwoo Kim

doi:10.3390/sym10100485

Symmetry (Oct 2018)

Affiliations

Muhammad Ashfaq Khan: Department of Information and Communication Engineering, Dongguk University, 30 Pildong-ro1-gil, Jung-gu, Seoul 100-715, Korea
Md. Rezaul Karim: Fraunhofer FIT, Birlinghoven, 53754 Sankt Augustin, Germany
Yangwoo Kim: Department of Information and Communication Engineering, Dongguk University, 30 Pildong-ro1-gil, Jung-gu, Seoul 100-715, Korea

DOI: https://doi.org/10.3390/sym10100485
Journal volume & issue: Vol. 10, no. 10
p. 485

Abstract

Every day we experience unprecedented data growth from numerous sources, which contributes to big data in terms of volume, velocity, and variability. These datasets impose great challenges on analytics frameworks and computational resources, making it difficult to extract meaningful information in a timely manner. Developing an efficient big data analytics framework is therefore an important research topic. To address these challenges by exploiting non-linear relationships in very large and high-dimensional datasets, machine learning (ML) and deep learning (DL) algorithms are being used in analytics frameworks. Apache Spark has been in use as one of the fastest big data processing engines, and it helps to solve iterative ML tasks using its distributed ML library, Spark MLlib. For real-world research problems, DL architectures such as Long Short-Term Memory (LSTM) are an effective approach to overcoming practical issues such as reduced accuracy, long-term sequence dependency, and vanishing and exploding gradients in conventional deep architectures. In this paper, we propose an efficient analytics framework, technically a progressive machine learning technique that merges Spark-based linear models, Multilayer Perceptron (MLP), and LSTM in a two-stage cascade structure in order to enhance predictive accuracy. Our proposed architecture enables us to organize big data analytics in a scalable and efficient way. To show the effectiveness of our framework, we applied the cascading structure to two different real-life datasets to solve a multiclass and a binary classification problem, respectively. Experimental results show that our analytical framework outperforms state-of-the-art approaches with a high level of classification accuracy.
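
The abstract does not spell out how the two stages are wired together, so the following is only a minimal sketch of one common way to build a two-stage cascade. It assumes, purely for illustration, that the first stage's predicted probability is appended to the original features before the second stage is trained. Both stages here are plain logistic regressions written with the Python standard library. They stand in for the Spark MLlib linear models and the MLP/LSTM stage and are not the authors' implementation. All function names (train_logistic, fit_cascade, cascade_predict, make_blobs, demo) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def augment(stage1, X):
    # Assumed cascade wiring: append the stage-1 probability as an extra feature.
    return [list(xi) + [p] for xi, p in zip(X, predict_proba(stage1, X))]


def fit_cascade(X, y):
    stage1 = train_logistic(X, y)                   # stage 1: linear model on raw features
    stage2 = train_logistic(augment(stage1, X), y)  # stage 2: stand-in for the MLP/LSTM stage
    return stage1, stage2


def cascade_predict(cascade, X):
    stage1, stage2 = cascade
    return [1 if p >= 0.5 else 0 for p in predict_proba(stage2, augment(stage1, X))]


def make_blobs(rng, n_per_class):
    # Synthetic binary data: two Gaussian clusters (illustrative only).
    X, y = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


def demo(seed=0):
    rng = random.Random(seed)
    X_train, y_train = make_blobs(rng, 100)
    X_test, y_test = make_blobs(rng, 50)
    cascade = fit_cascade(X_train, y_train)
    preds = cascade_predict(cascade, X_test)
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)


if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

In a Spark setting, the first stage would more plausibly be an MLlib estimator and the second a neural model trained on the augmented features. The sketch shows only the data flow between the stages, not the scale or the models reported in the paper.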

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
