Computational Ecology and Software (Sep 2020)

Taxonomic identification of hoverfly specimens using neural network and gradient boosting machine techniques

  • Dunja Popovic,
  • Vuk Popovic,
  • Nevena Velickovic, et al.

Journal volume & issue
Vol. 10, no. 3
pp. 105 – 116

Abstract

The correct identification of individual specimens in a particular area is of great importance in establishing appropriate biodiversity protection programs. Species of the genus Merodon Meigen, 1803 (Diptera, Syrphidae) are important pollinators that are particularly associated with the pollination of bulbous plants, both wild and cultivated. To contribute to the taxonomic problem of separating two cryptic, sibling hoverfly species of the M. avidus species complex, we programmed and trained a specific prediction model that was able to determine to which of the two assumed species (M. avidus or M. moenium) each database specimen belongs. Using two machine learning (ML) techniques, artificial neural network (ANN) and gradient boosting machine (GBM), we created two separate models, distinguished by the variable used for prediction (Model 1: modelling based on a geographic variable; Model 2: modelling based on a temporal variable). Moreover, each model was trained and tested with different data sets, resulting in different predictive accuracies. While ANN modelling showed a higher percentage of correct determinations when using surrogate information than when using the reduced (basic) data set, GBM modelling gave quite stable results across all three data types. In both ML approaches, comparing the results of Model 1 and Model 2 showed that prediction based on a temporal variable (day, month and year of specimen sampling) reached better predictive performance than prediction based on longitude and latitude, on all data sets. This led us to the conclusion that information about the time of sampling was more useful for creating the desired determination key with artificial intelligence algorithms than information about the longitude and latitude of sampling localities. Therefore, we suggest that the time of activity of adult specimens could have been of greater importance in the differentiation of the M. avidus and M. moenium species from a common ancestor.
The environmental factors and selective forces connected with the season might have had a more important role in M. avidus / M. moenium speciation than the environmental factors and selective pressures connected with the geographic position of their activity. The demonstrated modelling is a positive signal for the potential implementation of these systems as support in the initial determination of Merodon specimens. We suggest its potential use as technical support for old and partially unreliable databases, in the determination of freshly sampled specimens, and in finding the most efficient sampling strategies.

Keywords