Stats (Jul 2024)
Estimator Comparison for the Prediction of Election Results
Abstract
Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using popular-vote data from the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (standard error, SE, and mean squared error, MSE) of three cluster sampling estimators: the ratio estimator, the Horvitz–Thompson estimator and the linear regression estimator. While the ratio and Horvitz–Thompson estimators are both widely used in cluster sampling, we propose a linear regression estimator defined for unequal cluster sizes, which in many scenarios performs better than the other two. The objective of this paper is twofold. First, we identify which estimator is best suited to predicting the outcome of the popular vote in the United States of America. We do so by applying single-stage cluster sampling to our data under two partitions: in the first, the 50 states plus the District of Columbia serve as primary sampling units; in the second, 3112 counties are used instead. Second, based on the results of this procedure, we estimate the number of clusters a sample needs in order to attain a set standard error, while also considering the diminishing returns from adding further clusters to the sample. The linear regression estimator performs best in the majority of the examined cases. The same type of comparison can be applied to elections in any other country for which prior voting results are available.
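As a rough illustration of the comparison described above, the following Python sketch computes the three estimators of a population total from a simple random sample of clusters and compares their empirical MSE. It rests on several assumptions that do not come from the paper: clusters are drawn by simple random sampling without replacement (so the Horvitz–Thompson estimator reduces to N times the mean sampled cluster total), the regression estimator is the standard textbook form with cluster size as the auxiliary variable (the estimator proposed in the paper may differ), and the population is synthetic. All function names, and the input s2 (the variance of cluster totals), are hypothetical.

```python
import math
import random


def ht_total(y, N):
    """Horvitz-Thompson total under SRS of clusters: N * mean cluster total."""
    return N * sum(y) / len(y)


def ratio_total(y, m, M0):
    """Ratio estimate: sampled votes per sampled unit, scaled to M0 units."""
    return M0 * sum(y) / sum(m)


def regression_total(y, m, N, M0):
    """Textbook regression estimate with cluster size m as auxiliary variable."""
    n = len(y)
    y_bar, m_bar = sum(y) / n, sum(m) / n
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (yi - y_bar) for mi, yi in zip(m, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the HT form
    return N * (y_bar + slope * (M0 / N - m_bar))


def clusters_for_se(N, s2, target_se):
    """Smallest n with SE(ht_total) <= target_se, assuming SRS and
    Var = N^2 * (1 - n/N) * s2 / n, where s2 is the variance of cluster totals."""
    n = N * N * s2 / (target_se ** 2 + N * s2)
    return min(N, math.ceil(n))


def empirical_mse(sizes, totals, n, reps=2000, seed=0):
    """Monte Carlo MSE of each estimator over repeated samples of n clusters."""
    rng = random.Random(seed)
    N, M0, truth = len(sizes), sum(sizes), sum(totals)
    sq_err = {"ht": 0.0, "ratio": 0.0, "regression": 0.0}
    for _ in range(reps):
        idx = rng.sample(range(N), n)
        m = [sizes[i] for i in idx]
        y = [totals[i] for i in idx]
        sq_err["ht"] += (ht_total(y, N) - truth) ** 2
        sq_err["ratio"] += (ratio_total(y, m, M0) - truth) ** 2
        sq_err["regression"] += (regression_total(y, m, N, M0) - truth) ** 2
    return {name: err / reps for name, err in sq_err.items()}


# Synthetic population of 51 clusters (made-up numbers, not election data).
pop_rng = random.Random(1)
sizes = [pop_rng.randint(1_000, 50_000) for _ in range(51)]
totals = [round(s * pop_rng.uniform(0.35, 0.65)) for s in sizes]

mse = empirical_mse(sizes, totals, n=10)
for name, value in mse.items():
    print(f"{name:>10}: RMSE = {math.sqrt(value):,.0f}")
```

Because the sample-size formula depends on n only through the term 1/n, each additional cluster reduces the standard error by less than the previous one, which is the diminishing-returns effect mentioned above.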
Keywords