Silva Fennica (Jan 2013)
An assessment of three variance estimators for the k-nearest neighbour technique
Abstract
A jackknife (JK), a bootstrap (BOOT), and an empirical difference estimator (EDE) of totals and variance were assessed in simulated sampling from three artificial but realistic complex multivariate populations (N = 8000 elements) organized in clusters of four elements. Intra-cluster correlations of the target variables (Y) varied from 0.03 to 0.26. Time-saving implementations of JK and BOOT are detailed. In simple random sampling (SRS), bias in totals was ≤ 0.4% for the two largest sample sizes (n = 200, 300), but slightly larger for n = 50, and 100. In cluster sampling (CLU) bias was typically 0.1% higher and more variable. The lowest overall bias was in EDE. In both SRS and CLU, JK estimates of standard error were slightly (3%) too high, while the bootstrap estimates in both SRS and CLU were too low (8%). Estimates of error suggested a trend in EDE toward an overestimation with increasing sample size. Calculated 95% confidence intervals achieved a coverage that in most cases was fairly close (± 2%) to the nominal level. For estimation of a population total the EDE estimator appears to be slightly better than the JK estimator.
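To make the jackknife idea concrete, the following is a minimal sketch of a delete-one jackknife standard error for a k-nearest neighbour estimate of a population total under simple random sampling. It is not the time-saving implementation the paper details, and it does not cover cluster sampling, the bootstrap, or the EDE. The function names (`knn_total`, `jackknife_se`), the unweighted mean of the k neighbours, the Euclidean distance, the synthetic data, and the omission of a finite population correction are all assumptions made for illustration.

```python
import numpy as np

def knn_total(x_pop, x_s, y_s, k):
    """Estimate a population total as the sum of k-NN predictions.

    x_pop: (N, p) auxiliary variables known for every population element
    x_s:   (n, p) auxiliary variables of the sampled elements
    y_s:   (n,)   target variable observed on the sample
    """
    # Squared Euclidean distance from each population element to each sample unit.
    d2 = ((x_pop[:, None, :] - x_s[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the k nearest sample units (order within the k is arbitrary).
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Unweighted mean of the neighbours' y values, summed over the population.
    return y_s[nn].mean(axis=1).sum()

def jackknife_se(x_pop, x_s, y_s, k):
    """Delete-one jackknife standard error of the k-NN total (no fpc applied)."""
    n = len(y_s)
    # Row i of `keep` is a mask that drops sample unit i and keeps the rest.
    keep = ~np.eye(n, dtype=bool)
    totals = np.array([knn_total(x_pop, x_s[m], y_s[m], k) for m in keep])
    var = (n - 1) / n * ((totals - totals.mean()) ** 2).sum()
    return np.sqrt(var)

# Hypothetical synthetic population, far simpler than the paper's populations.
rng = np.random.default_rng(0)
x_pop = rng.normal(size=(2000, 3))
y_pop = x_pop @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=2000) + 10.0
idx = rng.choice(2000, size=50, replace=False)  # SRS without replacement

total_hat = knn_total(x_pop, x_pop[idx], y_pop[idx], k=5)
se_hat = jackknife_se(x_pop, x_pop[idx], y_pop[idx], k=5)
print(f"estimated total: {total_hat:.1f}, jackknife SE: {se_hat:.1f}")
```

This brute-force version recomputes all distances for each of the n deleted units, so its cost grows quickly with n and N. That cost is presumably the kind of overhead a time-saving implementation would aim to avoid.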