IEEE Access (Jan 2024)

Chrono Initialized LSTM Networks With Layer Normalization

  • Antonio Tolic
  • Biljana Mileva Boshkoska
  • Sandro Skansi

DOI
https://doi.org/10.1109/ACCESS.2024.3445329
Journal volume & issue
Vol. 12
pp. 115219 – 115236

Abstract

Recurrent Neural Networks (RNNs), including the distinguished Long Short-Term Memory Networks (LSTMs), have been shown to be effective in a wide range of sequential data problems. However, modeling very long-term dependencies remains challenging and continues to be an active area of research. We propose a new methodology for gradient propagation in LSTM networks that addresses the limitations of traditional approaches. This methodology employs Chrono Initialization (CI) and Layer Normalization (LN) for LSTM networks. CI keeps gradients from becoming too small or too large, preventing vanishing or exploding gradients, while LN stabilizes hidden state dynamics by normalizing the data distribution, mitigating gradient-related issues and enabling faster learning. The proposed approach consistently outperformed baseline models in our comparative analyses, achieving average accuracy improvements of up to 5% across classification tasks, with gains exceeding 30% in certain scenarios. Across regression tasks, it consistently achieved significant reductions in Mean Squared Error (MSE), ranging from 35% lower on average to several orders of magnitude lower, especially where the baseline models underperformed. Moreover, the method significantly improved performance in sequence generation tasks, consistently yielding much lower negative log-likelihoods than the reference counterparts, often by several orders of magnitude. Faster convergence toward the optimum was also repeatedly observed, even when the final results did not differ significantly. Overall, this new methodology improves the performance of LSTM networks and has been evaluated across various sequential learning tasks. A formal analysis provides a deeper understanding of the mechanisms behind these improvements.
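For readers unfamiliar with the two techniques named in the abstract, the following is a minimal sketch, in PyTorch, of how chrono initialization (Tallec & Ollivier, 2018) and layer normalization (Ba et al., 2016) can be combined in a single LSTM cell. It is not the authors' implementation; the class name LayerNormLSTMCell, the t_max parameter, and the choice to normalize the input and recurrent projections separately are illustrative assumptions.

    # Illustrative sketch, not the paper's code: chrono initialization plus
    # layer normalization in a single LSTM cell (PyTorch).
    import torch
    import torch.nn as nn

    class LayerNormLSTMCell(nn.Module):
        """LSTM cell with layer normalization on the input and recurrent
        projections and on the cell state, and chrono initialization of the
        input/forget gate biases: b_f ~ log(U(1, t_max - 1)), b_i = -b_f."""
        def __init__(self, input_size, hidden_size, t_max=100):
            super().__init__()
            self.hidden_size = hidden_size
            self.weight_ih = nn.Parameter(torch.empty(4 * hidden_size, input_size))
            self.weight_hh = nn.Parameter(torch.empty(4 * hidden_size, hidden_size))
            self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
            # Normalize the two projections separately so the chrono bias,
            # added afterwards, is not re-centered away by the normalization.
            self.ln_ih = nn.LayerNorm(4 * hidden_size)
            self.ln_hh = nn.LayerNorm(4 * hidden_size)
            self.ln_cell = nn.LayerNorm(hidden_size)
            nn.init.xavier_uniform_(self.weight_ih)
            nn.init.orthogonal_(self.weight_hh)
            # Chrono initialization: forget-gate bias ~ log(U(1, t_max - 1)),
            # input-gate bias set to its negative, so the cell's characteristic
            # memory spans are spread roughly uniformly over 1..t_max steps.
            with torch.no_grad():
                b_f = torch.log(torch.empty(hidden_size).uniform_(1.0, t_max - 1.0))
                self.bias[hidden_size:2 * hidden_size] = b_f   # forget gate
                self.bias[:hidden_size] = -b_f                 # input gate

        def forward(self, x, state):
            h, c = state
            # Gate pre-activations in the order (input, forget, candidate, output).
            gates = (self.ln_ih(x @ self.weight_ih.t())
                     + self.ln_hh(h @ self.weight_hh.t())
                     + self.bias)
            i, f, g, o = gates.chunk(4, dim=-1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            c = f * c + i * torch.tanh(g)
            h = o * torch.tanh(self.ln_cell(c))
            return h, (h, c)

    # Usage: one step over a batch of 8 inputs; t_max is chosen to roughly
    # match the longest dependency expected in the task.
    cell = LayerNormLSTMCell(input_size=32, hidden_size=64, t_max=200)
    x = torch.randn(8, 32)
    h0 = c0 = torch.zeros(8, 64)
    out, (h1, c1) = cell(x, (h0, c0))

The key design choice illustrated here is that the chrono-initialized biases are added after layer normalization of the projections, so the large initial forget-gate bias that encourages long memory is preserved rather than normalized away.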

Keywords