Journal of Nigerian Society of Physical Sciences (Jun 2024)

Identifying heterogeneity for increasing the prediction accuracy of machine learning models

  • Paavithashnee Ravi Kumar,
  • Majid Khan Majahar Ali,
  • Olayemi Joshua Ibidoja

DOI
https://doi.org/10.46481/jnsps.2024.2058
Journal volume & issue
Vol. 6, no. 3

Abstract

In recent years, the significance of machine learning in agriculture has surged, particularly in post-harvest monitoring for sustainable aquaculture. Challenges such as heterogeneity, irrelevant variables, and multicollinearity hinder the implementation of smart monitoring systems. This study investigates heterogeneity among the drying parameters that determine moisture content removal during seaweed drying, a problem that has received limited attention, particularly in agriculture. A heterogeneity model within machine learning algorithms is proposed to improve the accuracy of predicting seaweed moisture content removal, and predictions are compared at three stages: before heterogeneity parameters are removed, after they are removed, and after single-eliminated heterogeneity parameters are reintroduced. The dataset consists of 1914 observations with 29 independent variables, of which this study retains five: Temperature (T1, T4, T7), Humidity (H5), and Solar Radiation (PY). These variables are expanded with interaction terms up to second order, resulting in 55 variables. Variance inflation factors and boxplots are used to identify heterogeneity parameters. Two predictive machine learning models, random forest and elastic net, are then used to identify the 15 and 20 most important parameters for seaweed moisture content removal. Model performance is assessed with four evaluation metrics: MSE, SSE, MAPE, and R-squared. The results show that the random forest model outperforms the elastic net model, with higher accuracy and lower error, before removing heterogeneity parameters, after removing them, and after reintroducing single-eliminated heterogeneity parameters. Notably, the random forest model is more accurate before heterogeneity parameters are excluded than after.
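
As a rough illustration of the screening and evaluation steps named in the abstract, the following minimal NumPy sketch computes pairwise interaction terms, variance inflation factors, and the four reported metrics. It is not the authors' code: the function names are hypothetical, the interaction helper builds only pairwise products (it does not attempt to reproduce the paper's 55-variable construction, which the abstract does not spell out), and the random forest and elastic net models themselves are omitted.

```python
import numpy as np


def add_pairwise_interactions(X):
    """Append the products x_i * x_j (i < j) to the columns of X.

    Hypothetical helper: one common way to form second-order
    interaction terms, not necessarily the paper's construction.
    """
    p = X.shape[1]
    products = [X[:, i] * X[:, j] for i in range(p) for j in range(i + 1, p)]
    return np.column_stack([X] + products)


def vif(X):
    """Variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all other columns plus an intercept. This equals
    SS_tot / SS_res of that auxiliary regression.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # A perfectly explained column has an unbounded VIF.
        out[j] = ss_tot / ss_res if ss_res > 0 else np.inf
    return out


def regression_metrics(y_true, y_pred):
    """Return SSE, MSE, MAPE (in percent), and R-squared.

    MAPE assumes y_true contains no zeros.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    sse = float(err @ err)
    mse = sse / len(y_true)
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return {"SSE": sse, "MSE": mse, "MAPE": mape, "R2": 1.0 - sse / ss_tot}
```

In a workflow like the one described, `vif` would be applied to the expanded design matrix to flag suspect columns, and `regression_metrics` would be evaluated on held-out predictions from each fitted model at each stage of the comparison.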

Keywords