Simulating multi-scale optimization and variable selection in species distribution modeling

Samuel A. Cushman; Zaneta M. Kaszta; Patrick Burns; Christopher R. Hakkenberg; Patrick Jantz; David W. Macdonald; Jedediah F. Brodie; Mairin C.M. Deith; Scott Goetz

Ecological Informatics (Nov 2024)

Affiliations

Samuel A. Cushman: Wildlife Conservation Research Unit, The Recanati-Kaplan Centre, Department of Biology, University of Oxford, UK; Department of Biology, Northern Arizona University, USA; Corresponding author at: Wildlife Conservation Research Unit, The Recanati-Kaplan Centre, Department of Biology, University of Oxford, UK.
Zaneta M. Kaszta: Wildlife Conservation Research Unit, The Recanati-Kaplan Centre, Department of Biology, University of Oxford, UK; GEODE Laboratory, School of Informatics, Computing & Cyber Systems, Northern Arizona University, USA
Patrick Burns: GEODE Laboratory, School of Informatics, Computing & Cyber Systems, Northern Arizona University, USA
Christopher R. Hakkenberg: GEODE Laboratory, School of Informatics, Computing & Cyber Systems, Northern Arizona University, USA
Patrick Jantz: GEODE Laboratory, School of Informatics, Computing & Cyber Systems, Northern Arizona University, USA
David W. Macdonald: Wildlife Conservation Research Unit, The Recanati-Kaplan Centre, Department of Biology, University of Oxford, UK
Jedediah F. Brodie: Division of Biological Sciences and Wildlife Biology Program, University of Montana, Missoula, MT 59812, USA; Institute of Biodiversity and Environmental Conservation, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Malaysia
Mairin C.M. Deith: Institute for the Oceans and Fisheries, University of British Columbia, 2202 Main Mall, Vancouver, BC V6T 1Z4, Canada
Scott Goetz: GEODE Laboratory, School of Informatics, Computing & Cyber Systems, Northern Arizona University, USA

Journal volume & issue: Vol. 83, p. 102832

Abstract

Species distribution modeling (SDM) is a fundamental tool in theoretical and applied ecology. However, relatively little is known about the performance of different approaches for scale optimization, model selection, and algorithmic prediction in the context of nonlinear, multi-scale and interactive relationships between environmental variables and species occurrence. Modelers often struggle to optimize a tradeoff between ecological relevance, model robustness, complexity, and overfitting. In this paper, we investigated several methods designed to optimize spatial scale and variable selection in SDMs, in each case evaluating model fitness, parsimony and predictive performance. We used a simulation approach to produce a large pool of alternative underlying habitat relationships that reflect a broad range of realistic habitat associations. We also compared several different modeling algorithms, including logistic regression with a generalized linear model (GLM), Lasso and Elastic-Net Regularized GLMs (GLMNet), and random forest (RF), as well as alternative variable and scale selection methods. We found that GLM methods employing all-subsets dredge routines for variable selection were consistently the best predictors based on all criteria of our model performance assessment and across all attributes of the simulated underlying relationship, including nonlinearity and interaction. We had expected machine learning approaches, such as random forest, to perform better in these more complex forms of species-environment relationships. GLM using dredge variable selection was also the method that included the fewest spurious covariates and the most correct predictors as a proportion of all predictors. We found that univariate scaling was the most robust method of variable and scale selection, along with Minimal Redundancy Maximal Relevancy (MRMR), which performed equivalently.
The simulation experiment presented here provides a robust assessment of simulated multi-species distribution model performance, complexity and fidelity. By simulating a large range of potential habitat relationships with varying spatial scale, effect sizes, linearity, and interactions, we comprehensively evaluated model performance across gradients of complexity of the underlying relationships and violations of classical statistical assumptions. This study provides a valuable assessment and a broader example of the power and utility of controlled simulation experiments in habitat relationships and other ecological spatial predictive modeling.
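
To make the all-subsets idea concrete, the following minimal Python sketch simulates a known habitat relationship and ranks every covariate subset of a logistic GLM by AIC. It is an illustration under stated assumptions, not the authors' code or simulation design: the abstract refers to dredge routines (available, for example, in R's MuMIn package), whereas this sketch uses a hand-written Newton-Raphson fit in NumPy. The covariate names, effect sizes, and sample size are hypothetical.

```python
import itertools
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit a logistic GLM by Newton-Raphson; return (coefficients, log-likelihood)."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X1 @ beta, -30, 30)       # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X1.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible; it is a numerical guard only.
        hess = (X1 * w[:, None]).T @ X1 + ridge * np.eye(X1.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(X1 @ beta, -30, 30)
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))  # Bernoulli log-likelihood
    return beta, loglik

def all_subsets_aic(X, y, names):
    """Fit every non-empty covariate subset and return (AIC, names) sorted by AIC."""
    results = []
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(range(len(names)), k):
            cols = list(subset)
            beta, loglik = fit_logistic(X[:, cols], y)
            aic = 2 * len(beta) - 2 * loglik
            results.append((aic, [names[i] for i in cols]))
    return sorted(results, key=lambda r: r[0])

# Hypothetical covariates, each assumed to be already summarized at one scale.
names = ["forest_500m", "elev_1km", "roads_2km", "slope_250m", "ndvi_4km"]
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, len(names)))

# Assumed "true" relationship: only the first two covariates affect occurrence.
eta_true = -0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

ranked = all_subsets_aic(X, y, names)
best_aic, best_names = ranked[0]
print("Best subset by AIC:", best_names)
```

Because the true predictors are known in a simulation like this, the selected subset can be scored for both missed predictors and spurious covariates, which is the kind of comparison the abstract describes. All-subsets search grows as 2^p in the number of covariates p, so this brute-force form is only practical for small candidate sets.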

Published in Ecological Informatics

ISSN: 1574-9541 (Print); 1878-0512 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Biology (General): Ecology
Website: https://www.sciencedirect.com/journal/ecological-informatics
