Web Ecology (May 2013)
Prevalence, statistical thresholds, and accuracy assessment for species distribution models
Abstract
In species distribution modeling, the frequency of a species' occurrence is termed prevalence; in unbiased samples, sampling prevalence should approximate natural species prevalence. However, modelers commonly adjust sampling prevalence, producing a modeling prevalence with a different frequency of occurrences. Two issues remain unresolved: (1) the effect of using sampling prevalence rather than an adjusted modeling prevalence, and (2) the modifications needed to account for prevalence in thresholds, which convert continuous probabilities to discrete presence or absence predictions. We examined the effects of prevalence, thresholds, and two types of pseudoabsences on model accuracy. Models built with sampling prevalence were similar to models built with adjusted modeling prevalences. Mean correlation between predicted probabilities at the least (0.33) and greatest (0.83) modeling prevalence was 0.86. Mean predicted probability values increased with increasing prevalence; therefore, unlike constant thresholds, varying the threshold to match prevalence was effective in holding true positive rate, true negative rate, and species prediction areas relatively constant across modeling prevalences. Area under the curve (AUC) values appeared to be as informative as sensitivity and specificity when surveyed pseudoabsences were used as absent cases, but when the entire study area was coded as absent, AUC values reflected the area of predicted presence. Less frequent species had greater AUC values when pseudoabsences represented the study background. Modeling prevalence had a mild impact on species distribution models and accuracy assessment metrics when the threshold varied with prevalence. AUC values based on background absences correlate with species frequency and are therefore open to misinterpretation.
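The threshold comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `sensitivity_specificity` and `modeling_prevalence`, the use of `>=` at the threshold, and the six toy records are all assumptions made for illustration. It contrasts a constant threshold of 0.5 with a threshold set equal to modeling prevalence.

```python
import numpy as np

def sensitivity_specificity(observed, predicted_prob, threshold):
    """Return (true positive rate, true negative rate) at a given threshold.

    Assumes at least one presence and one absence in `observed`.
    """
    observed = np.asarray(observed, dtype=bool)
    # Assumption: a probability equal to the threshold counts as presence.
    predicted = np.asarray(predicted_prob, dtype=float) >= threshold
    true_pos = np.sum(predicted & observed)
    true_neg = np.sum(~predicted & ~observed)
    sensitivity = true_pos / observed.sum()
    specificity = true_neg / (~observed).sum()
    return float(sensitivity), float(specificity)

def modeling_prevalence(observed):
    """Fraction of modeling records that are presences."""
    return float(np.mean(np.asarray(observed, dtype=bool)))

# Toy, hand-made records (not data from the study): 2 presences, 4 absences.
observed = [1, 1, 0, 0, 0, 0]
predicted_prob = [0.60, 0.40, 0.45, 0.20, 0.10, 0.05]

prevalence_threshold = modeling_prevalence(observed)  # 2/6, about 0.33

# Constant threshold versus threshold matched to modeling prevalence.
fixed = sensitivity_specificity(observed, predicted_prob, 0.5)
matched = sensitivity_specificity(observed, predicted_prob, prevalence_threshold)

print("fixed 0.5:", fixed)      # (0.5, 1.0)
print("matched  :", matched)    # (1.0, 0.75)
```

In this toy case the constant threshold misses one of the two presences because predicted probabilities are low at low prevalence, while the prevalence-matched threshold recovers it at the cost of one false positive.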