Ecological Informatics (Sep 2025)

Does machine learning outperform logistic regression in predicting individual tree mortality?

  • Aitor Vázquez-Veloso,
  • Astor Toraño Caicoya,
  • Felipe Bravo,
  • Peter Biber,
  • Enno Uhl,
  • Hans Pretzsch

DOI
https://doi.org/10.1016/j.ecoinf.2025.103140
Journal volume & issue
Vol. 88
p. 103140

Abstract


Tree mortality is a crucial process in forest dynamics and a key component of forest growth models and simulators. Factors such as competition, drought, and pathogens drive tree mortality, but the underlying mechanisms are difficult to model. Ongoing environmental change complicates modeling approaches further, because it influences and alters all of the factors involved in mortality. However, innovative classification algorithms can mine data in depth to find patterns that model, or even explain, these relationships. We use Logistic binomial Regression as the reference algorithm for predicting individual tree mortality, but several machine learning (ML) alternatives already applied to other forest modeling topics can also serve this purpose. Here, we compare the performance of five ML algorithms (Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbour, and Support Vector Machine) against Logistic binomial Regression for classifying individual tree mortality across 40 case studies plus one cross-validation case study. The data come from Norway spruce long-term experimental plots, comprising 75,522 tree records with an average mortality rate of 10.28 %. Across the case studies, overall performance improved as more variables were used, as expected, whereas larger datasets reduced the performance of the algorithms. Performance was also higher on unmanaged plots than on thinned ones. Random Forest outperformed the other algorithms in all cases except the cross-validation case study, where it performed worst. Our results demonstrate the potential of ML for assessing tree mortality. When the model's application is not clearly defined and/or model interpretability is needed, Logistic binomial Regression remains the best tool for evaluating individual tree mortality.
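The comparison described above (several classifiers fitted to the same binary dead/alive labels and scored under cross-validation) can be illustrated with a small, dependency-free Python sketch. It is not the authors' pipeline. The data are synthetic, with hypothetical predictors (diameter at breast height in cm and a competition index) and an assumed mortality rate on the order of 10 %, rather than the Norway spruce plot records. Only two of the six algorithms, Logistic binomial Regression and K-Nearest Neighbour, are implemented, from scratch. The stratified 5-fold scheme, the ROC AUC metric, and all hyperparameters (`lr`, `epochs`, `k`) are illustrative assumptions, since the abstract does not specify them.

```python
import heapq
import math
import random

def make_data(n=1000, seed=0):
    """Hypothetical synthetic trees with features (dbh_cm, competition_index).
    Thinner, more suppressed trees die more often; the intercept is an assumption
    chosen so that mortality is on the order of 10 %."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        dbh = rng.uniform(5, 60)
        comp = rng.uniform(0, 1)
        z = -2.6 - 0.06 * (dbh - 30) + 2.5 * (comp - 0.5)  # latent log-odds of death
        died = 1 if rng.random() < 1 / (1 + math.exp(-z)) else 0
        data.append(((dbh, comp), died))
    return data

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def standardizer(X):
    """Return a function that z-scores features using training-fold statistics only."""
    cols = list(zip(*X))
    mu = [sum(c) / len(c) for c in cols]
    sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
          for c, m in zip(cols, mu)]
    return lambda x: [(v - m) / s for v, m, s in zip(x, mu, sd)]

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Binomial logistic regression fitted by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    def predict(x):
        return sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))
    return predict

def fit_knn(X, y, k=25):
    """K-nearest neighbour: predicted risk = share of the k nearest trees that died."""
    train = list(zip(X, y))
    def predict(x):
        nearest = heapq.nsmallest(k, train, key=lambda t: math.dist(t[0], x))
        return sum(label for _, label in nearest) / k
    return predict

def auc(scores, labels):
    """ROC AUC: probability that a dead tree outscores a living one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(data, fit, k=5, seed=1):
    """Stratified k-fold CV: each fold keeps roughly the same mortality rate."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, (_, y) in enumerate(data) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        scale = standardizer([x for x, _ in train])
        predict = fit([scale(x) for x, _ in train], [y for _, y in train])
        probs = [predict(scale(data[i][0])) for i in fold]
        scores.append(auc(probs, [data[i][1] for i in fold]))
    return sum(scores) / k

if __name__ == "__main__":
    trees = make_data()
    print(f"mortality rate: {sum(y for _, y in trees) / len(trees):.3f}")
    for name, fit in [("logistic regression", fit_logistic),
                      ("k-nearest neighbour", fit_knn)]:
        print(f"{name}: mean AUC = {cross_validate(trees, fit):.3f}")
```

Stratification keeps the minority share of dead trees in every fold, and AUC is used because plain accuracy would reward a classifier that simply predicts "alive" for every tree in such an imbalanced dataset. The remaining algorithms (Decision Trees, Random Forest, Naive Bayes, Support Vector Machine) would plug into `cross_validate` the same way, as any `fit(X, y)` function that returns a risk-scoring `predict(x)`.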

Keywords