Mathematics (Jul 2020)

Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives

  • Vicente Núñez-Antón,
  • Juan Manuel Pérez-Salamero González,
  • Marta Regúlez-Castillo,
  • Carlos Vidal-Meliá

DOI
https://doi.org/10.3390/math8081225
Journal volume & issue
Vol. 8, no. 8
p. 1225

Abstract

This paper proposes an optimization model for selecting a larger subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is, therefore, NP-hard. However, the solution is found by maximizing the size of the subsample taken from a stratified random sample with proportional allocation, subject to the constraint that the p-value of Pearson’s chi-square goodness-of-fit test is large enough to indicate a good fit to the population of interest. The paper also applies the model to the Continuous Sample of Working Lives (CSWL), a set of anonymized microdata containing information on individuals from Spanish Social Security records. The results show that, for each of the waves analyzed, it is possible to obtain a larger subsample from the CSWL that represents the pensioner population far better.
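The idea of enlarging a proportionally allocated subsample while keeping the goodness-of-fit p-value above a threshold can be illustrated with a small Python sketch. This is not the paper's convex MINLP model: it is a simple greedy heuristic swapped in for illustration. The function names (chi2_sf, gof_pvalue, greedy_subsample), the threshold min_pvalue, and the stratum counts and population shares are all hypothetical assumptions, not values from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    y = x / 2.0
    # Regularized upper incomplete gamma Q(a, y) with a = df / 2, built from
    # Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y) using the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def gof_pvalue(counts, shares):
    """Pearson chi-square goodness-of-fit p-value of stratum counts vs. population shares."""
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, shares))
    return chi2_sf(stat, len(counts) - 1)


def greedy_subsample(available, shares, min_pvalue):
    """Greedy heuristic (an assumption, not the paper's solver): start from the
    largest proportional allocation, then add units one at a time while the
    goodness-of-fit p-value stays at or above min_pvalue."""
    # Largest size n0 for which the allocation n0 * p_h fits in every stratum.
    n0 = min(int(a / p) for a, p in zip(available, shares))
    chosen = [min(a, int(n0 * p)) for a, p in zip(available, shares)]
    while True:
        best = None
        for h in range(len(chosen)):
            if chosen[h] < available[h]:
                trial = chosen[:]
                trial[h] += 1
                pv = gof_pvalue(trial, shares)
                # Keep the feasible single-unit addition with the highest p-value.
                if pv >= min_pvalue and (best is None or pv > best[0]):
                    best = (pv, h)
        if best is None:
            return chosen
        chosen[best[1]] += 1


# Hypothetical data: units available per stratum in the original sample,
# and the share of each stratum in the population of interest.
available = [500, 250, 400]
shares = [0.5, 0.25, 0.25]
subsample = greedy_subsample(available, shares, min_pvalue=0.5)
print(subsample, sum(subsample), gof_pvalue(subsample, shares))
```

A greedy search like this only finds a locally largest subsample and gives no optimality guarantee, which is one reason an exact optimization formulation such as the one the paper proposes may be preferable.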

Keywords