HOW MACHINE LEARNING ALGORITHMS ARE USED IN METEOROLOGICAL DATA CLASSIFICATION: A COMPARATIVE APPROACH BETWEEN DT, LMT, M5-MT, GRADIENT BOOSTING AND GWLM-NARX MODELS

Sheikh Amir FAYAZ; Majid ZAMAN; Muheet Ahmed BUTT; Sameer KAUL

doi:10.35784/acs-2022-26

Applied Computer Science (Dec 2022)

Affiliations

Sheikh Amir FAYAZ: Department of Computer Sciences, University of Kashmir, J&K, India
Majid ZAMAN: Directorate of IT & SS, University of Kashmir, J&K, India
Muheet Ahmed BUTT: Department of Computer Sciences, University of Kashmir, J&K, India
Sameer KAUL: Department of Computer Sciences, University of Kashmir, J&K, India

Journal volume & issue: Vol. 18, no. 4
pp. 16 – 27

Abstract

Rainfall prediction has been one of the most challenging tasks faced by researchers over the years. Many machine learning and AI-based algorithms have been applied to different datasets to improve prediction, but no single solution predicts rainfall perfectly, and accurate prediction remains an open question. We offer a machine learning-based comparative evaluation of rainfall models for Kashmir province. Both local geographic features and the time horizon influence weather forecasting. The predictive models investigated include decision trees (DT), Logistic Model Trees (LMT), M5 model trees (M5-MT), Gradient Boosting, and GWLM-NARX, among other techniques. The models used weather predictors measured at three major meteorological stations in the Kashmir area of the UT of J&K, India. We compared the models on accuracy, kappa, interpretability, and other statistics, as well as on the significance of the predictors used. On the original dataset, the DT model delivers an accuracy of 80.12 percent, while the LMT and Gradient Boosting models reach 87.23 percent and 87.51 percent, respectively. Furthermore, when continuous data was used in the M5-MT and GWLM-NARX models, the NARX model performed better, with a mean squared error (MSE) and regression value (R) of 3.12 percent and 0.9899 in training, 0.144 percent and 0.9936 in validation, and 0.311 percent and 0.9988 in testing.
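
The abstract compares the classifiers by accuracy and kappa, and the continuous-output models by MSE and R. As a minimal sketch of how these four statistics are commonly computed, the plain-Python functions below use small made-up inputs. The function names, labels, and values are illustrative assumptions, not the paper's data or tooling, and R is taken here to be the Pearson correlation between observed and predicted values.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # undefined when only one class ever occurs
    return (p_observed - p_chance) / (1.0 - p_chance)


def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Toy classification labels (hypothetical, for illustration only).
labels_true = ["rain", "rain", "dry", "dry", "rain", "dry", "dry", "rain"]
labels_pred = ["rain", "dry", "dry", "dry", "rain", "dry", "rain", "rain"]
print(accuracy(labels_true, labels_pred))     # 0.75
print(cohen_kappa(labels_true, labels_pred))  # 0.5

# Toy continuous rainfall values (hypothetical, for illustration only).
rain_observed = [0.0, 2.0, 4.0, 6.0]
rain_predicted = [0.5, 1.5, 4.5, 5.5]
print(mean_squared_error(rain_observed, rain_predicted))  # 0.25
print(pearson_r(rain_observed, rain_predicted))
```

Kappa is reported alongside accuracy because accuracy alone can look high on imbalanced classes. In the toy example, 75 percent accuracy corresponds to a kappa of only 0.5 once chance agreement is subtracted.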

Published in Applied Computer Science

ISSN: 1895-3735 (Print); 2353-6977 (Online)
Publisher: Polish Association for Knowledge Promotion
Country of publisher: Poland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.acs.pollub.pl/
