Frontiers in Genetics (May 2024)
Longitudinal method comparison: modeling polygenic risk for post-traumatic stress disorder over time in individuals of African and European ancestry
Abstract
Cross-sectional data allow the investigation of how genetics influence health at a single time point, but to understand how the genome impacts phenotype development, one must use repeated measures data. Ignoring the dependency inherent in repeated measures can inflate false positive rates, so such data require methods other than general or generalized linear models. Many methods can accommodate longitudinal data, including the commonly used linear mixed model and generalized estimating equation, as well as the less popular fixed-effects model, cluster-robust standard error adjustment, and aggregate regression. We simulated longitudinal data and applied these five methods alongside naïve linear regression, which ignored the dependency and served as a baseline, to compare their power, false positive rate, estimation accuracy, and precision. The results showed that the naïve linear regression and fixed-effects models incurred high false positive rates when analyzing a predictor that is fixed over time, making them unviable for studying time-invariant genetic effects. The linear mixed models maintained low false positive rates and unbiased estimation. The generalized estimating equation was similar to the linear mixed models in terms of power and estimation, but it had increased false positives when the sample size was low, as did cluster-robust standard error adjustment. Aggregate regression produced biased estimates when predictor effects varied over time. To show how the method choice affects downstream results, we performed longitudinal analyses in an adolescent cohort of African and European ancestry. We examined how developing post-traumatic stress symptoms were predicted by polygenic risk, traumatic events, exposure to sexual abuse, and income using four approaches: linear mixed models, generalized estimating equations, cluster-robust standard error adjustment, and aggregate regression.
While the directions of effect were generally consistent, coefficient magnitudes and statistical significance differed across methods. Our in-depth comparison of longitudinal methods showed that linear mixed models and generalized estimating equations were applicable in most scenarios requiring longitudinal modeling, but no two approaches produced identical results even when fit to the same data. Because discrepancies in results can arise from methodological choices, it is crucial that researchers determine their model a priori, refrain from testing multiple approaches to obtain favorable results, and use methods that are as similar as possible when seeking to replicate results.
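The inflation of false positives described above can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it is a minimal, standard-library-only illustration under assumed settings (a random-intercept model, 200 subjects, 4 visits, equal subject-level and residual variance, and a true effect of zero). All function and parameter names are hypothetical. It compares three of the approaches named in the abstract for a time-invariant predictor: naïve linear regression, cluster-robust standard errors (the basic sandwich form without a small-sample correction), and aggregate regression on per-subject means.

```python
import math
import random


def ols_slope(x, y):
    """Fit y ~ intercept + x; return slope, residuals, Sxx, and centered x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xc = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xc)
    slope = sum(v * (yi - y_bar) for v, yi in zip(xc, y)) / sxx
    resid = [(yi - y_bar) - slope * v for v, yi in zip(xc, y)]
    return slope, resid, sxx, xc


def z_statistics(rng, n_subjects, n_visits, beta, sd_subject, sd_noise):
    """Simulate one data set and return z-statistics for three methods."""
    x, y, ids = [], [], []
    for s in range(n_subjects):
        score = rng.gauss(0, 1)            # time-invariant predictor (e.g. a risk score)
        intercept = rng.gauss(0, sd_subject)  # subject effect inducing dependency
        for _ in range(n_visits):
            x.append(score)
            y.append(beta * score + intercept + rng.gauss(0, sd_noise))
            ids.append(s)

    slope, resid, sxx, xc = ols_slope(x, y)

    # Naive regression: treats all rows as independent observations.
    sigma2 = sum(e * e for e in resid) / (len(x) - 2)
    z_naive = slope / math.sqrt(sigma2 / sxx)

    # Cluster-robust (sandwich) standard error, clustering on subject.
    cluster_sums = [0.0] * n_subjects
    for s, v, e in zip(ids, xc, resid):
        cluster_sums[s] += v * e
    se_robust = math.sqrt(sum(c * c for c in cluster_sums)) / sxx
    z_robust = slope / se_robust

    # Aggregate regression: one row per subject, outcome averaged over visits.
    x_subj = x[::n_visits]
    y_subj = [sum(y[i:i + n_visits]) / n_visits for i in range(0, len(y), n_visits)]
    slope_a, resid_a, sxx_a, _ = ols_slope(x_subj, y_subj)
    sigma2_a = sum(e * e for e in resid_a) / (n_subjects - 2)
    z_aggregate = slope_a / math.sqrt(sigma2_a / sxx_a)

    return z_naive, z_robust, z_aggregate


def false_positive_rates(n_sims=500, n_subjects=200, n_visits=4, seed=0):
    """Share of null simulations (beta = 0) with |z| > 1.96, per method."""
    rng = random.Random(seed)
    names = ("naive", "cluster_robust", "aggregate")
    hits = [0, 0, 0]
    for _ in range(n_sims):
        zs = z_statistics(rng, n_subjects, n_visits,
                          beta=0.0, sd_subject=1.0, sd_noise=1.0)
        for k, z in enumerate(zs):
            hits[k] += abs(z) > 1.96
    return {name: h / n_sims for name, h in zip(names, hits)}


if __name__ == "__main__":
    print(false_positive_rates())
```

Under these assumed settings the naïve rate should sit well above the nominal 5% level, while the other two should stay close to it. The sketch omits linear mixed models, generalized estimating equations, and fixed-effects models, which in practice would be fit with a statistics library.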
Keywords