Scientific Reports (May 2024)

Improved random forest classification model combined with C5.0 algorithm for vegetation feature analysis in non-agricultural environments

  • Tianyu Wang

DOI
https://doi.org/10.1038/s41598-024-60066-x
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 13

Abstract


Traditional random forest algorithms have high computational complexity and perform poorly when classifying high-dimensional, noisy satellite data of non-agricultural vegetation. To address this, this paper proposes an enhanced random forest algorithm based on the C5.0 algorithm. The study area is the Liaohe Plain, where two distinct non-agricultural landscape patterns, in Shenbei New District and Changtu County, were selected as research objects. High-resolution GF-2 satellite imagery serves as the experimental dataset. To improve the original random forest classification model, the paper introduces an ensemble feature method based on the bagging concept. This method increases the likelihood of selecting features that help classify positive-class samples while avoiding excessive removal of useful features from negative samples, which preserves both feature importance and model diversity. The C5.0 algorithm is then used for feature selection, and the enhanced vegetation index (EVI) is used to estimate vegetation coverage. The results indicate that a multi-scale parameter selection tool, combined with limited field-measured data, facilitates the identification and classification of plant species in forest landscapes. The C5.0 algorithm selects classification features effectively and minimizes information redundancy. The resulting object-oriented random forest classification model achieves an accuracy of 94.02% on the aerial imagery for forest classification dataset, and the EVI-based vegetation coverage estimates are highly accurate.
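The abstract does not give the EVI formula or the coverage model. As a minimal sketch, assuming the standard MODIS EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1) and a linear pixel-dichotomy model for coverage (an assumption; the paper's exact estimator is not stated here), per-pixel EVI and fractional vegetation coverage could be computed from blue, red and near-infrared surface reflectances as follows. The names `evi`, `fractional_cover`, `index_soil` and `index_veg` are hypothetical.

```python
"""Sketch: EVI and EVI-based fractional vegetation coverage (FVC).

Assumptions (not confirmed by the paper): standard MODIS EVI coefficients,
surface-reflectance inputs in [0, 1], and a linear pixel-dichotomy model.
"""

# Standard MODIS EVI coefficients (assumed; the paper's values are not stated).
G, C1, C2, L = 2.5, 6.0, 7.5, 1.0


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced vegetation index from surface reflectances."""
    denom = nir + C1 * red - C2 * blue + L
    if denom == 0:
        raise ValueError("EVI denominator is zero")
    return G * (nir - red) / denom


def fractional_cover(index: float, index_soil: float, index_veg: float) -> float:
    """Hypothetical pixel-dichotomy coverage estimate, clipped to [0, 1].

    index_soil and index_veg are assumed calibration endpoints: the index
    value of bare soil and of full vegetation cover, respectively.
    """
    if index_veg <= index_soil:
        raise ValueError("index_veg must exceed index_soil")
    fvc = (index - index_soil) / (index_veg - index_soil)
    return min(1.0, max(0.0, fvc))
```

For example, `evi(0.4, 0.08, 0.04)` gives about 0.506. In practice the soil and full-vegetation endpoints would have to be calibrated per scene, for instance from low and high percentiles of the EVI histogram or from field-measured plots; clipping keeps pixels outside that range at 0 or 1 coverage.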
In experiments on the same test set, the proposed algorithm attains an average accuracy of 90.20% in identifying non-agricultural artificial habitat vegetation features, outperforming common models such as bidirectional encoder representations from transformers (BERT), FastText, and convolutional neural networks (CNN), whose average accuracies range from 84.41% to 88.33%. These findings provide scientific evidence for protecting agricultural ecosystems and restoring their biodiversity.
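The abstract names two feature-handling steps, a bagging-based ensemble feature method and C5.0 feature selection, without giving their details. The sketch below is one possible reading under stated assumptions, not the authors' method: it draws bootstrap samples (the bagging idea), scores each feature on every sample with the information gain ratio (the split criterion C5.0 inherits from C4.5), averages the scores across bags, and keeps the top `k` features. It omits C5.0's boosting, winnowing and pruning, and does not model the positive/negative-class weighting described above. All function names and parameters (`n_bags`, `k`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch: bagged gain-ratio feature ranking. This is NOT the
# C5.0 program or the paper's method; gain ratio is only C5.0's split criterion.


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best gain ratio over binary threshold splits of one numeric feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # thresholds only fall between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        p = i / n  # fraction of samples going left, strictly in (0, 1)
        gain = base - p * entropy(left) - (1 - p) * entropy(right)
        split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        best = max(best, gain / split_info)
    return best


def bagged_feature_scores(
    X: List[List[float]], y: List[int], n_bags: int = 25, seed: int = 0
) -> List[float]:
    """Average each feature's gain ratio over bootstrap samples (bagging)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    totals = [0.0] * d
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        for j in range(d):
            totals[j] += gain_ratio([X[i][j] for i in idx], yb)
    return [t / n_bags for t in totals]


def select_features(
    X: List[List[float]], y: List[int], k: int, n_bags: int = 25, seed: int = 0
) -> List[int]:
    """Indices of the k features with the highest bagged gain ratio."""
    scores = bagged_feature_scores(X, y, n_bags, seed)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Averaging over bootstrap samples makes the ranking less sensitive to a few noisy samples than a single pass over the data. In an object-oriented GF-2 workflow, the rows of `X` would hypothetically be segmented image objects and the columns their spectral, index and texture features; the selected indices would then feed the random forest.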

Keywords