Efficient use of binned data for imputing univariate time series data

Jay Darji; Nupur Biswas; Vijay Padul; Jaya Gill; Santosh Kesari; Shashaanka Ashili

doi:10.3389/fdata.2024.1422650

Frontiers in Big Data (Aug 2024)

Affiliations

Jay Darji: Rhenix Lifesciences, Hyderabad, Telangana, India
Nupur Biswas: Rhenix Lifesciences, Hyderabad, Telangana, India; CureScience, San Diego, CA, United States
Vijay Padul: Rhenix Lifesciences, Hyderabad, Telangana, India
Jaya Gill: CureScience, San Diego, CA, United States
Santosh Kesari: Department of Translational Neurosciences, Pacific Neuroscience Institute and Saint John's Cancer Institute at Providence Saint John's Health Center, Santa Monica, CA, United States
Shashaanka Ashili: CureScience, San Diego, CA, United States

DOI: https://doi.org/10.3389/fdata.2024.1422650
Journal volume & issue: Vol. 7

Abstract

Time series data are recorded in many sectors and accumulate in large volumes. However, their continuity is often interrupted, leaving periods of missing data. Several algorithms are used to impute missing data, and their performance varies widely. Apart from the choice of algorithm, effective imputation depends on the nature of the missing and available data. We conducted extensive studies using different types of time series data, specifically heart rate data and power consumption data. We generated missing data over different time spans and imputed them using different algorithms with binned data of different sizes. Performance was evaluated using the root mean square error (RMSE). We observed a reduction in RMSE when using binned data rather than the entire dataset, particularly for the expectation–maximization (EM) algorithm. RMSE was reduced when using binned data for 1-, 5-, and 15-min spans of missing data, with a greater reduction observed for the 15-min span. We also observed an effect of data fluctuation. We conclude that the usefulness of binned data depends on the span of missing data, the sampling frequency of the data, and the fluctuation within the data. Depending on the inherent characteristics, quality, and quantity of the missing and available data, binned data can be used to impute a wide variety of data, including biological heart rate data from an Internet of Things (IoT) smartwatch and non-biological data such as household power consumption data.
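The evaluation loop described in the abstract (hide a span of known values, impute it from either the whole series or a bin around the gap, then score with RMSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the series is synthetic rather than the study's heart rate or power data, the function names `rmse` and `mean_impute` are hypothetical, the bin is taken as a window centred on the gap, and simple mean imputation stands in for the EM algorithm and the other methods the authors compared.

```python
import math
import random


def rmse(true_values, imputed_values):
    """Root mean square error between held-out and imputed values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def mean_impute(series, gap_start, gap_len, bin_size=None):
    """Fill a gap with the mean of observed points (a stand-in for EM).

    bin_size=None uses every observed point in the series. Otherwise only
    points inside a window of about bin_size samples centred on the gap
    are used (an assumed definition of a "bin", not the paper's).
    """
    gap_end = gap_start + gap_len
    if bin_size is None:
        lo, hi = 0, len(series)
    else:
        pad = (bin_size - gap_len) // 2
        if pad < 1:
            raise ValueError("bin_size must leave observed points around the gap")
        lo, hi = max(gap_start - pad, 0), min(gap_end + pad, len(series))
    # Only points outside the gap count as observed.
    observed = [series[i] for i in range(lo, hi) if not gap_start <= i < gap_end]
    fill = sum(observed) / len(observed)
    return [fill] * gap_len


# Synthetic one-day series at one sample per minute: slow drift plus noise.
rng = random.Random(0)
series = [70 + 20 * math.sin(2 * math.pi * i / 1440) + rng.gauss(0, 1)
          for i in range(1440)]

# Hide a 15-sample span whose true values are known, then impute it twice.
gap_start, gap_len = 360, 15
truth = series[gap_start:gap_start + gap_len]

full_rmse = rmse(truth, mean_impute(series, gap_start, gap_len))
binned_rmse = rmse(truth, mean_impute(series, gap_start, gap_len, bin_size=60))
print(f"whole series RMSE: {full_rmse:.2f}")
print(f"60-sample bin RMSE: {binned_rmse:.2f}")
```

On this drifting synthetic series the binned estimate tracks the local level, so its RMSE comes out lower than the whole-series estimate. That is a property of the toy data, not a reproduction of the paper's results. Repeating the run with other gap lengths and bin sizes would mimic the kind of comparison the abstract describes.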

Published in Frontiers in Big Data

ISSN: 2624-909X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.frontiersin.org/journals/big-data
