PLoS ONE (Jan 2018)

Why sampling ratio matters: Logistic regression and studies of habitat use

  • Ladislav Nad'o,
  • Peter Kaňuch

DOI
https://doi.org/10.1371/journal.pone.0200742
Journal volume & issue
Vol. 13, no. 7
p. e0200742

Abstract


Logistic regression (LR) models are among the most frequently used statistical tools in ecology. With LR, one can infer whether a species' habitat use is related to environmental factors and estimate the probability of species occurrence from the values of these factors. However, studies often sample inadequately, using an arbitrarily chosen ratio between occupied and unoccupied (or available) locations, and this has a profound effect on the inference and predictive power of LR models. To demonstrate the effect of various sampling strategies and efforts on the quality of LR models, we used a unique census dataset containing all roosting cavities used by the tree-dwelling bat Nyctalus leisleri and all cavities where the species was absent. We compared models constructed from randomly selected data subsets with varying ratios of occupied to unoccupied cavities (1:1, 1:5, 1:10) with a model built from the full dataset (ratio 1:31). These comparisons revealed that the power of LR models was low when the sampling did not reflect the population ratio of occupied to unoccupied cavities. The use of weights improved the subsampled models. Thus, this study warns against inadequate data sampling and strongly encourages a randomized sampling procedure to estimate the true ratio of occupied:unoccupied locations, which can then be used to plan a manageable sampling effort and to apply weights that improve the LR model. Such an approach may provide robust and reliable models suitable for both inference and prediction.
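The weighting idea above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: it uses simulated data with one hypothetical habitat covariate (not the N. leisleri census), fits a single-predictor logistic regression by Newton-Raphson in plain Python, and assumes inverse sampling-fraction weights. In other words, when all occupied locations are kept and k unoccupied locations are drawn per occupied one, each sampled unoccupied location is assumed to stand for pop_ratio / k unoccupied locations in the population. The function names simulate_census, subsample and fit_logit and the true coefficients -4.0 and 1.0 are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(x, y, w=None, iters=30):
    """Weighted logistic regression with one predictor, fitted by Newton-Raphson.

    Returns (b0, b1) for logit P(y = 1 | x) = b0 + b1 * x.
    """
    if w is None:
        w = [1.0] * len(x)
    # Start from the weighted intercept-only fit so the Newton steps stay stable.
    pbar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    b0, b1 = math.log(pbar / (1.0 - pbar)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = sigmoid(b0 + b1 * xi)
            r = wi * (yi - p)           # weighted score contribution
            v = wi * p * (1.0 - p)      # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate_census(n, b0, b1, rng):
    # Hypothetical "census": one covariate per location, occupancy drawn from a
    # known logistic model. These are simulated values, not the paper's data.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(b0 + b1 * xi) else 0 for xi in x]
    return x, y


def subsample(x, y, k, rng):
    # Keep every occupied location plus k random unoccupied ones per occupied (1:k).
    occ = [i for i, yi in enumerate(y) if yi == 1]
    unocc = [i for i, yi in enumerate(y) if yi == 0]
    idx = occ + rng.sample(unocc, k * len(occ))
    return [x[i] for i in idx], [y[i] for i in idx]


rng = random.Random(42)
x, y = simulate_census(5000, -4.0, 1.0, rng)
n_occ = sum(y)
pop_ratio = (len(y) - n_occ) / n_occ  # unoccupied per occupied (known in a census)

full = fit_logit(x, y)
results = {}
for k in (1, 5, 10):
    xs, ys = subsample(x, y, k, rng)
    unweighted = fit_logit(xs, ys)
    # Occupied locations keep weight 1; each sampled unoccupied location is
    # up-weighted so the sample again reflects the population ratio (assumed scheme).
    w = [1.0 if yi == 1 else pop_ratio / k for yi in ys]
    weighted = fit_logit(xs, ys, w)
    results[k] = (unweighted, weighted)
    print(f"1:{k:<2} unweighted b0={unweighted[0]:6.2f}  weighted b0={weighted[0]:6.2f}")
print(f"full 1:{pop_ratio:.0f} b0={full[0]:6.2f} b1={full[1]:5.2f}")
```

In this simulated setting, with a correctly specified model, subsampling unoccupied locations is expected to shift the unweighted intercept upward by roughly log(pop_ratio / k), which inflates predicted occurrence probabilities, while the weighted fits should land near the full-data intercept. Rerunning with other seeds gives a sense of the sampling variability that a single subsample hides.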