Applied Sciences (Apr 2019)
Improving RNN Performance by Modelling Informative Missingness with Combined Indicators
Abstract
Daily questionnaires delivered through mobile applications make it relatively easy to collect large amounts of data. However, such data are almost always incomplete, either because individual questions go unanswered or because users skip the survey entirely on some days. The missing values must be addressed before the data can be used for inferential or predictive purposes. Several strategies for dealing with missing data are available, but most are prohibitively computationally intensive for larger models, such as a recurrent neural network (RNN). Perhaps more importantly, few methods allow for data that are missing not at random (MNAR). We therefore propose a simple strategy for dealing with missing data in longitudinal surveys from mobile applications: a long short-term memory (LSTM) network whose input includes a count of the missing values in each survey entry and a lagged response variable. We then propose additional simplifications for padding the days on which a user has skipped the survey entirely. Finally, we compare our strategy with previously suggested methods on a large daily survey with data that are MNAR and conclude that our method performs best in terms of both prediction accuracy and computational cost.
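As an illustration of the input construction described above, the following minimal Python sketch builds one feature row per day from the filled-in answers, a count of missing answers, and the previous day's response. It is not the authors' implementation. The function name `build_lstm_inputs`, the use of `None` to mark a missing value, and the choice of 0.0 as the fill value for missing answers and for an unavailable lagged response are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def build_lstm_inputs(
    answers: Sequence[Sequence[Optional[float]]],
    response: Sequence[Optional[float]],
    fill_value: float = 0.0,  # assumed placeholder for missing values
) -> List[List[float]]:
    """Return one row per day: filled answers + missing count + lagged response."""
    rows: List[List[float]] = []
    previous: Optional[float] = None  # response from the preceding day, if any
    for day_answers, day_response in zip(answers, response):
        # Number of unanswered questions in this survey entry.
        missing_count = sum(a is None for a in day_answers)
        # Replace missing answers with the placeholder value.
        filled = [fill_value if a is None else float(a) for a in day_answers]
        # Lagged response: yesterday's value, or the placeholder if unavailable.
        lagged = fill_value if previous is None else float(previous)
        rows.append(filled + [float(missing_count), lagged])
        previous = day_response
    return rows


# Hypothetical three-day example; the second day is skipped entirely.
features = build_lstm_inputs(
    answers=[[1, None, 3], [None, None, None], [2, 2, None]],
    response=[0.5, None, 1.0],
)
for row in features:
    print(row)
```

In this sketch a fully skipped day appears as a row whose missing count equals the number of questions, so the network can still see that the day was skipped. The rows would then be stacked into a sequence and passed to an LSTM in whichever framework is used.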
Keywords