Royal Society Open Science (Jun 2023)

Characteristic features of statistical models and machine learning methods derived from pest and disease monitoring datasets

  • Shigeki Kishi,
  • Jianqiang Sun,
  • Akira Kawaguchi,
  • Sunao Ochi,
  • Megumi Yoshida,
  • Takehiko Yamanaka

DOI
https://doi.org/10.1098/rsos.230079
Journal volume & issue
Vol. 10, no. 6

Abstract

Many studies have used traditional statistical methods to analyse monitoring data and predict the future population dynamics of crop pests and diseases, and a growing number now use machine learning methods. However, the characteristic features of these methods have not been fully elucidated or systematically organised. We compared the prediction performance of two statistical and seven machine learning methods using 203 monitoring datasets recorded over several decades on four major crops in Japan, with meteorological and geographical information as the explanatory variables. Two machine learning methods, the decision tree and the random forest, were found to be the most efficient, whereas the regression models, both statistical and machine learning, were relatively inferior. The two best methods performed better on biased and scarce data, whereas the statistical Bayesian model performed better on larger datasets. Researchers should therefore consider data characteristics when selecting the most appropriate method.

Keywords