IEEE Access (Jan 2019)

An Imbalanced Big Data Mining Framework for Improving Optimization Algorithms Performance

  • Eslam Mohsen Hassib,
  • Ali Ibrahim El-Desouky,
  • El-Sayed M. El-Kenawy,
  • Sally M. El-Ghamrawy

DOI
https://doi.org/10.1109/ACCESS.2019.2955983
Journal volume & issue
Vol. 7
pp. 170774 – 170795

Abstract


Big data is an important factor in almost all modern technologies, such as social media, smart cities, and the Internet of Things. Most standard classifiers tend to become trapped in local optima when dealing with such massive datasets, so new techniques for handling them are required. This paper presents a novel imbalanced big data mining framework that improves the performance of optimization algorithms by avoiding the local optima problem. The framework consists of three main stages. First, the preprocessing stage uses the LSH-SMOTE algorithm to solve the class imbalance problem and then uses the LSH algorithm to hash the dataset instances into buckets. Second, the bucket search stage uses the grey wolf optimizer (GWO) to train a bidirectional recurrent neural network (BRNN) and to search for the global optimum within each bucket. Last, the final tournament winner stage uses GWO+BRNN to select the global optimum of the whole dataset from among the global optima of all buckets. Our proposed framework, LSHGWOBRNN, has been tested on nine datasets, one of them a big dataset, in terms of AUC and MSE against seven well-known machine-learning algorithms (Naive Bayes, Random Tree, Decision Table, AdaBoostM1, WOA+MLP, GWO+MLP, and WOA+BRNN). We then tested our algorithm on four well-known datasets against GWO+MLP, ACO+MLP, GA+MLP, PSO+MLP, PBIL+MLP, and ES+MLP in terms of classification accuracy and MSE. The experimental results show that LSHGWOBRNN provides strong local optima avoidance, higher accuracy, and lower complexity and overhead.
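The three-stage control flow described above (hash instances into LSH buckets, search each bucket for its local winner, then hold a final tournament among the winners) can be sketched as follows. This is a minimal illustration only: random-hyperplane hashing stands in for the paper's LSH scheme, and a toy fitness function replaces the GWO+BRNN training inside each bucket, so all function names and parameters here are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, planes):
    """Stage 1 (hashing part): random-hyperplane LSH bucket keys."""
    bits = (X @ planes.T) > 0
    return [tuple(row) for row in bits]

def bucket_search(bucket, fitness):
    """Stage 2 stand-in: find each bucket's best (lowest-fitness) instance.
    In the paper this role is played by GWO training a BRNN per bucket."""
    scores = [fitness(x) for x in bucket]
    return bucket[int(np.argmin(scores))]

def tournament(winners, fitness):
    """Stage 3: pick the overall optimum among all bucket winners."""
    return min(winners, key=fitness)

# Demo data and a toy fitness (squared distance to the origin).
X = rng.normal(size=(200, 5))
fitness = lambda x: float(np.sum(x ** 2))

planes = rng.normal(size=(3, 5))  # 3 hyperplanes -> up to 8 buckets
buckets = {}
for x, key in zip(X, lsh_hash(X, planes)):
    buckets.setdefault(key, []).append(x)

winners = [bucket_search(b, fitness) for b in buckets.values()]
best = tournament(winners, fitness)
```

Because every instance lands in exactly one bucket, the tournament winner here coincides with the best instance in the whole dataset; the framework's benefit is that each bucket can be searched independently, which is what enables the per-bucket local optima avoidance the abstract claims.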

Keywords