IEEE Access (Jan 2024)

An Ensemble Machine Learning Framework for Cotton Crop Yield Prediction Using Weather Parameters: A Case Study of Pakistan

  • Syed Tahseen Haider,
  • Wenping Ge,
  • Jianqiang Li,
  • Saif Ur Rehman,
  • Azhar Imran,
  • Mohamed Abdel Fattah Sharaf,
  • Syed Muhammad Haider

DOI
https://doi.org/10.1109/ACCESS.2024.3454511
Journal volume & issue
Vol. 12
pp. 124045 – 124061

Abstract

In Pakistan, agriculture is one of the most common and least lucrative professions. It contributes between 18% and 25% of Pakistan’s gross domestic product (GDP). Most of Pakistan’s crops, including cotton, are completely weather-dependent, so farmers are constantly trying new techniques and technologies to boost crop yields. Technology-based approaches to crop yield analysis, such as machine learning (ML) and data mining, are driving growth in the agricultural sector by changing the revenue picture through cultivation of the best crop. Applying ML algorithms to agricultural climate data makes it possible to increase crop yields. The proposed research was carried out in two parts. First, field observations were made to determine the effects of daily variations in meteorological parameters, such as rainfall, temperature, and wind, on plant growth and development at each phenological stage of cotton crop production. Throughout the Kharif seasons of 2005-2020, the phenological stages of the cotton crop grown in the fields of the Ayyub Agriculture Research Institute in Faisalabad (Central Punjab) were monitored using meteorological and phenological observations, as well as soil data. Second, a cotton prediction framework, Random Forest Extreme Gradient (RFXG), is proposed to predict cotton production from the observed data. RFXG concentrates on the quantification of machine learning algorithms and their practical application. RFXG works in two phases. The first phase covers data collection, preprocessing, attribute selection, and data splitting. The second phase covers prediction and evaluation.
The comparative results show that the predictions of the proposed RFXG, using the optimization algorithm, are significantly improved, by 0.05 RMSE (root mean square error), relative to the traditional Extreme Gradient Boost (XGB) model, which has an RMSE of 0.07. The proposed technique is also compared with several baseline approaches to cotton prediction, and it achieves better results than those baselines. The proposed RFXG model, an ensemble-based method, can bag, stack, and boost, which makes its predictions fast and efficient compared with existing approaches. Bagging averages the results of numerous decision trees fit to different subsets of the same dataset to increase accuracy. The proposed study should help close the gap between the current yield and the potential yield of this cultivar, which is grown in Pakistan and other cotton-growing locations.
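
The bagging and RMSE ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' RFXG implementation: the depth-1 trees, the function names (`rmse`, `fit_stump`, `fit_bagged`), and the synthetic data are all assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature (hypothetical helper).

    Tries every observed value as a threshold and keeps the split with the
    lowest sum of squared errors around the two leaf means.
    """
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # no valid split (all x identical): predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap resample, then average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Synthetic, made-up data (NOT from the paper): one weather-like feature
# and a noisy yield-like target.
rng = random.Random(42)
xs = [rng.uniform(20.0, 45.0) for _ in range(200)]
ys = [(1.0 if x < 35.0 else 0.4) + rng.gauss(0.0, 0.1) for x in xs]
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

single = fit_stump(train_x, train_y)
bagged = fit_bagged(train_x, train_y)
print("single stump RMSE:", rmse(test_y, [single(x) for x in test_x]))
print("bagged stumps RMSE:", rmse(test_y, [bagged(x) for x in test_x]))
```

Each bootstrap resample yields a slightly different tree, and averaging their outputs is what the abstract refers to as bagging. A lower RMSE on held-out data indicates a better fit, which is the basis on which the abstract compares RFXG with XGB.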

Keywords