Distributed Synthetic Minority Oversampling Technique

Sakshi Hooda; Suman Mann

doi:10.2991/ijcis.d.190719.001

International Journal of Computational Intelligence Systems

Abstract

Real-world prediction problems often involve predicting rare occurrences. Because of the resulting data imbalance, standard classification algorithms are biased against these rare events. Typical approaches to this imbalance involve oversampling the “rare events” or undersampling the majority events. The Synthetic Minority Oversampling Technique (SMOTE) is one technique that addresses this class imbalance effectively. However, existing implementations of SMOTE fail when the data grows too large to be stored on a single machine. In this paper, we present our solution to this “big data challenge.” We provide a distributed version of SMOTE that uses scalable k-means++ and M-Trees. With this implementation of SMOTE, we were able to oversample the “rare events” and achieve results that are better than those of the existing Python version of SMOTE.
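
For readers unfamiliar with the oversampling step the abstract refers to, the following is a minimal single-machine sketch of the classic SMOTE idea: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the distributed method of the paper; the function name smote_sketch, its parameters, and the brute-force neighbour search are assumptions made for illustration, whereas the paper relies on scalable k-means++ and M-Trees.

```python
import numpy as np


def smote_sketch(minority, n_new, k=5, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper, not the paper's code).

    minority: array of shape (n, d) holding the rare-class samples, n >= 2.
    n_new:    number of synthetic samples to generate.
    k:        number of nearest minority neighbours considered per sample.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # a sample cannot have more than n - 1 neighbours

    # Brute-force pairwise squared distances; assumed to fit in memory here.
    diffs = minority[:, None, :] - minority[None, :, :]
    dist = (diffs ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbours

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base sample and neighbour.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])
```

The pairwise distance matrix in this sketch grows quadratically with the number of minority samples, which is one way to see why a single-machine approach stops scaling once the data no longer fits in memory.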

Published in International Journal of Computational Intelligence Systems

ISSN: 1875-6891 (Print); 1875-6883 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44196