Comparing Different Statistical Models and Multiple Testing Corrections for Association Mapping in Soybean and Maize

Avjinder S. Kaler; Jason D. Gillman; Timothy Beissinger; Larry C. Purcell

doi:10.3389/fpls.2019.01794

Frontiers in Plant Science (Feb 2020)

Affiliations

Avjinder S. Kaler: Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
Jason D. Gillman: Plant Genetic Research Unit, USDA-ARS, Columbia, MO, United States
Timothy Beissinger: Division of Plant Breeding Methodology, Center for Integrated Breeding Research, Georg-August-Universität, Göttingen, Germany
Larry C. Purcell: Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States

Journal volume & issue: Vol. 10

Abstract

Association mapping (AM) is a powerful tool for fine mapping complex trait variation down to the nucleotide level by exploiting historical recombination events. A major problem in AM is controlling false positives that can arise from population structure and family relatedness. False positives are often controlled by incorporating covariates for structure and kinship in mixed linear models (MLM). These MLM-based methods are single-locus models and can introduce false negatives due to overfitting. In this study, eight statistical models, ranging from single-locus to multilocus, were compared for AM of three traits differing in heritability in two crop species: soybean (Glycine max L.) and maize (Zea mays L.). Soybean and maize were chosen, in part, because their rates of linkage disequilibrium (LD) decay differ greatly, which can influence false positive and false negative rates. The fixed and random model circulating probability unification (FarmCPU) model performed better than the other models, based on an analysis of Q-Q plots and on the identification of the known number of quantitative trait loci (QTLs) in a simulated data set. These results indicate that FarmCPU controls both false positives and false negatives. Six qualitative traits in soybean with known published genomic positions were also used to compare these models, and FarmCPU consistently identified a single highly significant SNP closest to the corresponding published gene. Multiple comparison adjustments (Bonferroni, false discovery rate, and positive false discovery rate) were compared for these models using a simulated trait with 60% heritability and 20 QTLs. The adjustments were overly conservative for MLM, CMLM, ECMLM, and MLMM, so these models did not find any significant markers; in contrast, the ANOVA, GLM, and SUPER models found far more significant markers than the 20 simulated QTLs. The FarmCPU model, using the less conservative methods (false discovery rate and positive false discovery rate), identified 10 QTLs, which was closer to the simulated number of QTLs than the number found by any other model.
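
The three multiple comparison adjustments named in the abstract can be illustrated with a minimal sketch. This is not the authors' code, which the abstract does not describe. It assumes Bonferroni thresholding, the Benjamini-Hochberg step-up procedure for the false discovery rate, and Storey-style q-values, which are commonly used in connection with the positive false discovery rate. The function names, the single tuning value `lam = 0.5` for estimating the proportion of true nulls (pi0), and the toy p-values are hypothetical choices for illustration. Storey's pFDR estimator includes an extra factor 1 / (1 - (1 - p)^m) that is left out here for brevity.

```python
from typing import List, Optional, Sequence


def bonferroni(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests significant after Bonferroni correction (p <= alpha / m)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m; 0 if none.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max,
    # even if some of those individually exceeded their own threshold.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def storey_pi0(pvals: Sequence[float], lam: float = 0.5) -> float:
    """Estimate the proportion of true null hypotheses from p-values above lam
    (hypothetical single-lambda estimator, capped at 1)."""
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def q_values(pvals: Sequence[float], pi0: Optional[float] = None) -> List[float]:
    """Storey-style q-values: q_(i) = min over j >= i of pi0 * m * p_(j) / j."""
    m = len(pvals)
    if pi0 is None:
        pi0 = storey_pi0(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[idx] / rank)
        q[idx] = running_min
    return q


if __name__ == "__main__":
    pvals = [0.5, 0.032, 0.001, 0.035, 0.008]  # toy p-values, not from the study
    print(bonferroni(pvals))          # [False, False, True, False, True]
    print(benjamini_hochberg(pvals))  # [False, True, True, True, True]
    print([round(q, 5) for q in q_values(pvals, pi0=1.0)])
    # [0.5, 0.04375, 0.005, 0.04375, 0.02]
```

Bonferroni controls the family-wise error rate and is the strictest of the three. The FDR and q-value approaches instead bound the expected share of false discoveries among the markers called significant, which is why they are described as less conservative. In the toy run, the p-value 0.032 misses its own Benjamini-Hochberg threshold (0.03) but is still rejected because a larger p-value (0.035) passes its threshold (0.04).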

Published in Frontiers in Plant Science

ISSN: 1664-462X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Agriculture: Plant culture
Website: https://www.frontiersin.org/journals/plant-science
