Agronomy (Nov 2023)
A Comprehensive Step-by-Step Guide to Using Data Science Tools in the Gestion of Epidemiological and Climatological Data in Rice Production Systems
Abstract
The application of data science (DS) techniques has become increasingly essential in various fields, including epidemiology and climatology in agricultural production systems. In this sector, large amounts of data have traditionally been acquired but are not well managed or analyzed as a basis for evidence-based decision-making. Here, we present a comprehensive step-by-step guide that explores the use of DS in managing epidemiological and climatological data within rice production systems under tropical conditions. Our work focuses on a multi-temporal dataset from the monitoring of diseases and climate variables in rice in Colombia over eight years (2012–2019). The study comprises four main phases: (I) data cleaning and organization to ensure the integrity and consistency of the dataset; (II) data management, involving web-scraping techniques to acquire climate information from free databases such as WorldClim and CHELSA, validation against in situ weather stations, and bias removal to enrich the dataset; (III) data visualization techniques to effectively represent the gathered information; and (IV) a basic analysis related to the clustering and climatic characterization of rice-producing areas in Colombia. The climate data are evaluated and validated using error metrics (r, R², MAE, RMSE) and bias evaluation metrics. In addition, in phase IV, climate clustering is conducted based on PCA and the K-means algorithm. Understanding the association between climatic and epidemiological data is pivotal in predicting and mitigating disease outbreaks in rice production areas. Our research underscores the significance of DS in managing epidemiological and climatological data for rice production systems.
By applying a protocol for the responsible use of DS tools, our study provides a solid foundation for further research into disease dynamics and climate interactions in rice-producing regions and other crops, ultimately contributing to more informed decision-making in agriculture.
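As an illustration of the validation step named in the abstract, the following minimal Python sketch computes r, R², MAE, RMSE, and mean bias for a gridded climate series against a station series, then removes the mean bias. It is not the authors' implementation: the function names, the choice of R² as the squared Pearson correlation, the simple additive bias correction, and the temperature values are all assumptions made for the example.

```python
import math


def validation_metrics(observed, estimated):
    """Return r, R2, MAE, RMSE and mean bias for two paired series.

    Assumption: R2 is taken as the square of the Pearson correlation,
    and bias as the mean of (estimated - observed).
    """
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n

    # Pearson correlation from the covariance and the two sums of squares.
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    if ss_obs == 0 or ss_est == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = cov / math.sqrt(ss_obs * ss_est)

    # Error metrics on the paired differences.
    errors = [e - o for o, e in zip(observed, estimated)]
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    bias = sum(errors) / n
    return {"r": r, "R2": r ** 2, "MAE": mae, "RMSE": rmse, "bias": bias}


def remove_mean_bias(estimated, bias):
    """Subtract a constant mean bias from every estimated value."""
    return [e - bias for e in estimated]


# Hypothetical monthly mean temperatures (degrees C), for illustration only.
station = [24.0, 25.0, 26.0, 27.0]   # in situ weather station
gridded = [25.0, 26.0, 27.0, 28.0]   # value extracted from a gridded product

before = validation_metrics(station, gridded)
corrected = remove_mean_bias(gridded, before["bias"])
after = validation_metrics(station, corrected)
print(before)
print(after)
```

In this toy case the gridded series tracks the station perfectly but is offset by one degree, so the correlation is already 1 and removing the mean bias brings MAE and RMSE to zero. Real series would generally need a correction that varies by month or by location.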
Keywords