IEEE Access (Jan 2020)

EMSGD: An Improved Learning Algorithm of Neural Networks With Imbalanced Data

  • Qian Ya-Guan
  • Ma Jun
  • Zhang Xi-Min
  • Pan Jun
  • Zhou Wu-Jie
  • Wu Shu-Hui
  • Yun Ben-Sheng
  • Lei Jing-Sheng

DOI
https://doi.org/10.1109/ACCESS.2020.2985097
Journal volume & issue
Vol. 8
pp. 64086–64098

Abstract


In this paper, the influence of data imbalance on neural networks is discussed, and an improved learning algorithm is proposed to address this problem. Experimental results show that with imbalanced data, the training error of a neural network converges slowly and its generalization ability is poor. Our theoretical analysis shows that during training, the gradient descent direction of the weights is dominated by the major classes, which accounts for the slow convergence of the training error. Based on these results, we propose Equilibration Mini-batch Stochastic Gradient Descent (EMSGD) to ensure that the data in each mini-batch are balanced. The advantage of this technique is that it reuses the existing random sampling step of mini-batch SGD (MSGD) without increasing the computational complexity. In addition, because over-sampling of the minor classes is performed within each mini-batch rather than over the whole training set, duplicated instances are greatly reduced, which prevents the model from overfitting. Experimental results show that, with imbalanced training data, EMSGD makes the training error of the neural network converge rapidly.
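The abstract does not give the exact sampling procedure, so the following is only a minimal sketch of the core idea in Python: equalize the per-class counts inside each mini-batch before an ordinary SGD step. The function name `balanced_minibatch_indices` and the logistic-regression gradient used as a stand-in for a network's gradient are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def balanced_minibatch_indices(labels, batch_size, rng):
    """Sample a mini-batch with (approximately) equal counts per class.

    Minority classes are over-sampled with replacement, but only within
    this one mini-batch, so far fewer duplicates arise than from
    over-sampling the whole training set up front.
    """
    classes = np.unique(labels)
    per_class = max(1, batch_size // len(classes))
    batch = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # Sample with replacement only when the class has fewer
        # instances than its share of the batch.
        batch.append(rng.choice(idx, size=per_class,
                                replace=len(idx) < per_class))
    return np.concatenate(batch)

# Usage: plug the balanced indices into an ordinary SGD update.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
labels = (rng.random(1000) < 0.05).astype(int)  # ~5% minority class
w = np.zeros(5)
for step in range(100):
    idx = balanced_minibatch_indices(labels, batch_size=32, rng=rng)
    xb, yb = X[idx], labels[idx]
    # Logistic-regression gradient as a stand-in for backpropagation.
    grad = xb.T @ (1.0 / (1.0 + np.exp(-xb @ w)) - yb) / len(idx)
    w -= 0.1 * grad
```

Because the balancing reuses the random sampling that mini-batch SGD performs anyway, the per-step cost is essentially unchanged, which matches the complexity claim in the abstract.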

Keywords