Mathematics (Sep 2024)

Projection-Uniform Subsampling Methods for Big Data

  • Yuxin Sun,
  • Wenjun Liu,
  • Ye Tian

DOI
https://doi.org/10.3390/math12192985
Journal volume & issue
Vol. 12, no. 19
p. 2985

Abstract

Read online

The idea of experimental design has been widely used in subsampling algorithms to extract a small portion of big data that carries useful information for statistical modeling. Most existing subsampling algorithms of this kind are model-based and designed to achieve the corresponding optimality criteria for the model. However, data generating models are frequently unknown or complicated. Model-free subsampling algorithms are needed for obtaining samples that are robust under model misspecification and complication. This paper introduces two novel algorithms, called the Projection-Uniform Subsampling algorithm and its extension. Both algorithms aim to extract a subset of samples from big data that are space-filling in low-dimensional projections. We show that subdata obtained from our algorithms perform superiorly under the uniform projection criterion and centered L2-discrepancy. Comparisons among our algorithms, model-based and model-free methods are conducted through two simulation studies and two real-world case studies. We demonstrate the robustness of our proposed algorithms in building statistical models in scenarios involving model misspecification and complication.

Keywords