Mathematics (May 2024)
Testing Informativeness of Covariate-Induced Group Sizes in Clustered Data
Abstract
Clustered data are a special type of correlated data in which units within a cluster are correlated while units in different clusters are independent. The number of units in a cluster can be associated with that cluster’s outcome, a phenomenon called informative cluster size (ICS) that is known to affect inference for clustered data. However, when comparing outcomes across multiple groups of units in clustered data, investigating ICS alone may not be enough: the number of units belonging to a particular group in a cluster can be associated with the outcome from that group in that cluster, which gives rise to informative intra-cluster group size (IICGS). IICGS can exist even in the absence of ICS, and ignoring it can result in biased inference for group-based outcome comparisons in clustered data. In this article, we mathematically formulate the concept of IICGS, distinguish it from ICS, and propose a nonparametric bootstrap-based hypothesis test for any claim of IICGS in a clustered data setting. Through simulations and real data applications, we demonstrate that the proposed test can accurately identify IICGS, with substantial power, in clustered data.
Keywords