Methods in Ecology and Evolution (May 2023)

Guidelines for the prediction of species interactions through binary classification

  • Timothée Poisot

DOI
https://doi.org/10.1111/2041-210X.14071
Journal volume & issue
Vol. 14, no. 5
pp. 1333 – 1345

Abstract

The prediction of species interactions is gaining momentum as a way to circumvent limitations in data volume. Yet ecological networks are challenging to predict because they are typically small and sparse. Dealing with extreme class imbalance is a challenge for most binary classifiers, and there are currently no guidelines on how predictive models should be trained for this specific problem. Using simple mathematical arguments and numerical experiments in which a variety of supervised‐learning classifiers are trained on simulated networks, we develop a series of guidelines on the choice of measures to use for model selection and on the ways to assemble the training dataset. Neither classifier accuracy nor the area under the receiver operating characteristic curve (ROC‐AUC) is an informative measure of performance for interaction prediction. The area under the precision‐recall curve (PR‐AUC) is a fairer assessment of performance. In some cases, even standard measures can lead to selecting a more biased classifier because the effect of connectance is strong. The amount of correction to apply to the training dataset depends on network connectance, on the measure to be optimized, and only weakly on the classifier. These results reveal that training machines to predict networks is a challenging task, and that in virtually all cases the composition of the training set needs to be fine‐tuned before performing the actual training. We discuss these consequences in the context of the low volume of data.
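
The claim about accuracy and ROC‐AUC can be illustrated with a small arithmetic sketch. This is not the code used in the article; the function names and the example network (20 by 30 species with 60 interactions, so a connectance of 0.1) are assumptions chosen only for illustration. It computes the scores of a classifier that never predicts an interaction, and the expected scores of a classifier that guesses at random.

```python
# Illustrative sketch only: function names and the example network
# dimensions are hypothetical, not taken from the article.

def constant_negative_scores(n_rows, n_cols, n_links):
    """Scores of a classifier that predicts 'no interaction' for every pair."""
    total = n_rows * n_cols
    tp, fp = 0, 0                      # it never predicts a positive
    fn, tn = n_links, total - n_links  # all links missed, all absences correct
    return {
        "connectance": n_links / total,
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def coin_flip_expected_scores(n_rows, n_cols, n_links, p_positive=0.5):
    """Expected scores of a classifier that predicts 'interaction'
    with probability p_positive, independently of the true label."""
    total = n_rows * n_cols
    n_absent = total - n_links
    tp = p_positive * n_links          # expected true positives
    fp = p_positive * n_absent         # expected false positives
    fn = (1 - p_positive) * n_links    # expected false negatives
    tn = (1 - p_positive) * n_absent   # expected true negatives
    return {
        "precision": tp / (tp + fp),
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical sparse network: 20 x 30 species, 60 interactions.
always_negative = constant_negative_scores(20, 30, 60)
coin_flip = coin_flip_expected_scores(20, 30, 60)

print(always_negative)
print(coin_flip)
```

Under these assumptions, the classifier that finds no interactions at all has an accuracy of one minus the connectance, so accuracy looks high precisely because the network is sparse. The random classifier sits on the ROC diagonal (equal true and false positive rates) whatever the connectance, whereas its expected precision equals the connectance, which is why the precision‐recall baseline reflects class imbalance and the ROC baseline does not.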

Keywords