Ecological Indicators (Oct 2021)
Comparison of regression-based and machine learning techniques to explain alpha diversity of fish communities in streams of central and eastern India
Abstract
Over the past several decades, ecologists have sought models that accurately describe species-habitat relationships across ecological communities. Statistical models that explain ecological dynamics must account for the complex interactions between communities and ecological factors. Here, we used multiple linear mixed models (LMM), generalized additive models (GAM), multivariate adaptive regression splines (MARS), and artificial neural networks (ANN) to model species richness and diversity of freshwater fishes in eastern and central India. The models were based on fish abundance and associated ecological data collected over three years across the study regions. We developed global models using all predictors after removing highly correlated variables (Pearson’s r > 0.7). The results identified conductivity, water temperature, and water velocity as the most important predictors of both species richness and diversity. We then built predictive models for diversity and richness from two subsets of the predictors: subset 1 contained the significant factors common to all four modeling methods, and subset 2 was chosen by an automatic feature selection technique. Among the modeling methods used in our study, ANN produced the best-fitting models for explaining nonlinearities between response variables and predictors. Variable selection proved important: subset 1 (common consensual factors) yielded more homogeneous predictions than subset 2 (automated feature selection). In contrast to similar recent studies, in which machine learning (ML) methods typically outperform conventional methods, our results showed that ANN performed on par with the other methods in terms of predictive power. Our findings underline the need for a judicious choice of modeling techniques based on the availability of data and the ecological communities being studied.
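The correlation screening step mentioned in the abstract (dropping predictors with Pearson's r > 0.7 before fitting the global models) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson_r` and `drop_correlated`, the greedy keep-the-first rule, the use of the absolute value of r, and the toy values (including the hypothetical `tds` variable) are all assumptions made for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(predictors, threshold=0.7):
    """Greedy filter: keep a variable only if |r| with every kept variable
    is at or below the threshold. Using |r| is an assumption; the abstract
    states only "r > 0.7"."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson_r(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values invented for illustration; they are NOT data from the study.
toy = {
    "conductivity": [120.0, 150.0, 180.0, 210.0, 260.0, 300.0],
    "tds": [78.0, 99.0, 115.0, 140.0, 170.0, 192.0],  # hypothetical, tracks conductivity
    "water_temp": [24.0, 29.0, 22.0, 27.0, 23.0, 28.0],
    "velocity": [0.5, 0.3, 0.4, 0.6, 0.2, 0.5],
}

selected = drop_correlated(toy)
print(selected)
```

In this toy case `tds` is dropped because it is almost perfectly correlated with `conductivity`. Note that a greedy filter of this kind depends on the order of the variables, so which member of a correlated pair is retained is a modeling choice that the abstract does not specify.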