Ecological Informatics (Sep 2024)

Generalizability evaluations of heterogeneous ensembles for river health predictions

  • Taeseung Park,
  • Jihoon Shin,
  • Baekyung Park,
  • Jeongsuk Moon,
  • YoonKyung Cha

Journal volume & issue
Vol. 82
p. 102719

Abstract

Predictive models leverage the relationships between environmental factors and river health to predict river health at unmonitored sites. Such models should be generalizable to unseen data. Among machine learning models, heterogeneous ensembles are known to be generalizable owing to their structural diversity. The present study compares the generalizability of heterogeneous ensembles with that of homogeneous ensembles and single models. The models classified river health indices (RHIs) into five grades (very good to very poor) for three taxa (benthic macroinvertebrates, fish, and diatoms), taking various environmental factors (water quality, hydrology, meteorology, land cover, and stream properties) as inputs. The data were monitored at 2915 sites in the four major river watersheds in South Korea during the 2016–2021 period. The results indicated that both heterogeneous and homogeneous ensembles generalized better than single models. Moreover, heterogeneous ensembles tended to show higher generalizability than homogeneous ensembles, although the differences were marginal. Weighted soft voting was the most generalizable of the heterogeneous ensembles, with losses ranging from 0.49 to 0.59 across the three taxa. Weighted soft voting also delivered acceptable classification performance on the test set, with accuracies ranging from 0.42 to 0.52 across the taxa. The relative contributions of the environmental factors to RHI predictions and the directions of their effects agreed with established knowledge, supporting the reliability of the predictions. However, as heterogeneous ensembles have rarely been applied to RHI prediction, the extent to which they improve the generalizability of predictions must be investigated in future studies.
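To illustrate the idea behind weighted soft voting, the following is a minimal sketch in Python with NumPy, not the authors' implementation. The abstract does not state which base models, weights, or loss function were used, so everything here is an assumption: the three sets of class probabilities and the weights are made-up toy values, the helper names `weighted_soft_vote` and `cross_entropy` are hypothetical, and cross-entropy is used only as one common choice of loss for five-class prediction.

```python
import numpy as np


def weighted_soft_vote(probas, weights):
    """Weighted average of per-model class probabilities.

    probas  : array-like, shape (n_models, n_samples, n_classes)
    weights : array-like, shape (n_models,), non-negative
    Returns an array of shape (n_samples, n_classes).
    """
    probas = np.asarray(probas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Contract the model axis: sum_m weights[m] * probas[m]
    return np.tensordot(weights, probas, axes=1)


def cross_entropy(y_true, proba, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    y_true = np.asarray(y_true)
    proba = np.clip(proba, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.log(proba[np.arange(len(y_true)), y_true])))


# Toy data (assumed): 3 models, 2 sites, 5 grades (0 = very good ... 4 = very poor)
model_probas = [
    [[0.60, 0.20, 0.10, 0.05, 0.05], [0.10, 0.10, 0.20, 0.30, 0.30]],
    [[0.30, 0.40, 0.20, 0.05, 0.05], [0.05, 0.05, 0.20, 0.50, 0.20]],
    [[0.50, 0.30, 0.10, 0.05, 0.05], [0.10, 0.10, 0.10, 0.30, 0.40]],
]
model_weights = [2.0, 1.0, 1.0]  # hypothetical weights, e.g. from validation scores
true_grades = [0, 3]             # hypothetical observed grades

ensemble_proba = weighted_soft_vote(model_probas, model_weights)
predicted_grade = ensemble_proba.argmax(axis=1)
ensemble_loss = cross_entropy(true_grades, ensemble_proba)
```

Because the ensemble averages probabilities rather than hard labels, a confident base model can outweigh two uncertain ones, and the weights control how much each model counts. In practice the weights would be chosen on held-out data rather than fixed by hand as they are here.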

Keywords