Smart Agricultural Technology (Dec 2022)
Crops yield prediction based on machine learning models: Case of West African countries
Abstract
Agricultural production, worldwide and in Africa in particular, is of increasing concern to the major international organizations in charge of nutrition. The World Food Program has reported that high population growth worldwide, especially in Africa in recent years, is leading to increased food insecurity. Climate change and its variability, a major phenomenon all over the world in recent decades, is another contributing factor, and its impact on the quality of agricultural production has been observed. Moreover, farmers and agricultural decision-makers need advanced tools to help them make quick decisions that will impact the quality of agricultural yields. The arrival of big data technology has led to powerful new analytical tools such as machine learning, which have proven themselves in many areas such as medicine, finance, and biology. In this work, we propose a prediction system based on machine learning to predict the yield of six crops, namely rice, maize, cassava, seed cotton, yams, and bananas, at the country level in West African countries throughout the year. We combined climatic data, weather data, agricultural yields, and chemical data to help decision-makers and farmers predict the annual crop yields in their country. We used decision tree, multivariate logistic regression, and k-nearest neighbor models to build our system, and obtained promising results with these models. We applied hyper-parameter tuning through cross-validation to obtain a better model that does not overfit.
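As an illustration of hyper-parameter tuning through cross-validation, the following is a minimal sketch and not the system described here. It selects the number of neighbors for a small k-nearest neighbor regressor by 5-fold cross-validation, scoring each candidate with the coefficient of determination. The data are synthetic, and all names (`knn_predict`, `cross_val_r2`, `tune_k`), the candidate grid, and the fold count are assumptions made for the example.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(X, y, k, n_folds=5):
    """Mean R2 over n_folds held-out folds for a given k."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at `fold`, is held out.
        test_idx = list(range(fold, len(X), n_folds))
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        scores.append(r2_score([y[i] for i in test_idx], preds))
    return sum(scores) / n_folds


def tune_k(X, y, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate k with the highest cross-validated R2."""
    results = {k: cross_val_r2(X, y, k) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results


# Synthetic stand-in data (assumption): two features, noisy linear target.
rng = random.Random(0)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(120)]
y = [3.0 * a + 2.0 * b + rng.gauss(0, 0.1) for a, b in X]

best_k, cv_scores = tune_k(X, y)
print("best k:", best_k)
```

Because every candidate is scored only on folds it was not fitted on, a setting that merely memorizes the training data (for example k = 1 on noisy targets) tends to be penalized, which is the sense in which cross-validated tuning guards against overfitting.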
We found that the decision tree model performs best, with a coefficient of determination (R2) of 95.3%, while the k-nearest neighbor and logistic regression models achieve R2 = 93.15% and R2 = 89.78%, respectively. We also studied the correlation between the predicted results and the expected results. We found that the predictions of the decision tree model and the k-nearest neighbor model are correlated with the expected data, which supports the efficacy of these models.
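For reference, the two evaluation quantities mentioned above can be computed as in the sketch below. It assumes the standard definitions: R2 = 1 - SS_res / SS_tot for the coefficient of determination, and the Pearson coefficient for the correlation between predicted and expected values. The abstract does not state which correlation measure was used, so Pearson is an assumption, and the numbers are made up for illustration, not taken from the study.

```python
import math


def r_squared(expected, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_e = sum(expected) / len(expected)
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    ss_tot = sum((e - mean_e) ** 2 for e in expected)
    return 1.0 - ss_res / ss_tot


def pearson_r(expected, predicted):
    """Pearson correlation between expected and predicted values."""
    n = len(expected)
    mean_e = sum(expected) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(expected, predicted))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_e * var_p)


# Illustrative values only (assumption), not data from the paper.
expected = [2.0, 3.5, 5.0, 4.0, 6.5]
predicted = [2.2, 3.3, 4.9, 4.4, 6.1]

print("R2:", round(r_squared(expected, predicted), 4))
print("Pearson r:", round(pearson_r(expected, predicted), 4))
```

The two measures answer different questions: a high correlation only says that predictions rise and fall with the expected values, whereas a high R2 additionally requires the predictions to be close to them in absolute terms.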