Multimedia Data Modelling Using Multidimensional Recurrent Neural Networks

Zhen He; Shaobing Gao; Liang Xiao; Daxue Liu; Hangen He

doi:10.3390/sym10090370

Symmetry (Sep 2018)

Affiliations

Zhen He: College of Intelligence Science, National University of Defense Technology, Changsha 410073, China
Shaobing Gao: Department of Computer Science, Sichuan University, Chengdu 610065, China
Liang Xiao: Unmanned Systems Research Center, National Innovation Institute of Defense Technology, Beijing 100071, China
Daxue Liu: College of Intelligence Science, National University of Defense Technology, Changsha 410073, China
Hangen He: College of Intelligence Science, National University of Defense Technology, Changsha 410073, China

DOI: https://doi.org/10.3390/sym10090370
Journal volume & issue: Vol. 10, no. 9, p. 370

Abstract

Modelling multimedia data such as text, images, or videos usually involves analysing, predicting, or reconstructing them. The recurrent neural network (RNN) is a powerful machine learning approach that models such data recursively. Its variant, the long short-term memory (LSTM), extends the RNN with the ability to retain information over longer time spans. Although the capacity of an LSTM can be increased by widening it or adding layers, doing so usually requires additional parameters and runtime, which can make learning harder. We therefore propose a Tensor LSTM in which the hidden states are tensorised as multidimensional arrays (tensors) and updated through a cross-layer convolution. Because parameters are spatially shared within the tensor, the model can be widened efficiently, without extra parameters, by increasing the tensorised size; because the deep computations of each time step are absorbed by the temporal computations of the time series, the model can be implicitly deepened with little extra runtime by delaying the output. Experiments show that the model is well suited to various multimedia data modelling tasks, including text generation, text calculation, image classification, and video prediction.
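
The two mechanisms named in the abstract, parameter sharing across the tensorised hidden state and implicit depth through delayed output, can be illustrated with a small sketch. This is not the authors' formulation: it assumes a gate-free tanh update instead of a full LSTM, a hidden state of P rows by M channels, an odd kernel size K, and a constant-filled kernel. The names `make_kernel`, `count_params`, `conv_step`, and `steps_until_top_row_responds` are hypothetical and introduced only for illustration.

```python
import math


def make_kernel(K, M, value=0.1):
    """Hypothetical shared kernel of shape K x M x M (constant-filled for clarity)."""
    return [[[value] * M for _ in range(M)] for _ in range(K)]


def count_params(kernel):
    """Number of weights in the kernel; it does not depend on the tensor size P."""
    return sum(len(row) for tap in kernel for row in tap)


def conv_step(hidden, x_proj, kernel):
    """One time step of a simplified (tanh, gate-free) tensorised update.

    hidden: P rows of M channels.
    x_proj: the current input, assumed already projected to M channels.
    """
    P, M, K = len(hidden), len(x_proj), len(kernel)
    assert K % 2 == 1, "this sketch assumes an odd kernel size"
    pad = K // 2
    zero = [0.0] * M
    # Prepend the projected input as row 0, then zero-pad both ends.
    padded = [zero] * pad + [x_proj] + hidden + [zero] * pad
    new_hidden = []
    for p in range(1, P + 1):  # one output row per hidden row
        out = [0.0] * M
        for k in range(K):  # the same K taps are reused at every row
            src = padded[p + k]
            for i in range(M):
                for j in range(M):
                    out[j] += src[i] * kernel[k][i][j]
        new_hidden.append([math.tanh(v) for v in out])
    return new_hidden


def steps_until_top_row_responds(P, M=2, K=3):
    """Feed one impulse, then zeros; return the step at which the last row reacts."""
    kernel = make_kernel(K, M)
    hidden = [[0.0] * M for _ in range(P)]
    x = [1.0] * M
    for step in range(1, 4 * P + 1):
        hidden = conv_step(hidden, x, kernel)
        x = [0.0] * M  # the impulse is presented only at the first step
        if any(v != 0.0 for v in hidden[-1]):
            return step
    return None
```

In this simplified picture, the kernel size is fixed by K and M alone, so enlarging P widens the state without adding weights. With K = 3, an input needs P time steps to reach the last row, which has then passed through P convolutions; reading the output from that row after a delay is one way to view the implicit deepening the abstract describes.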

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
