Frontiers in Environmental Science (Sep 2022)

RSFD: A rough set-based feature discretization method for meteorological data

  • Lirong Zeng,
  • Qiong Chen,
  • Mengxing Huang

DOI
https://doi.org/10.3389/fenvs.2022.1013811
Journal volume & issue
Vol. 10

Abstract

Meteorological data mining aims to discover hidden patterns in large volumes of available meteorological data. As one of the most relevant big data preprocessing techniques, feature discretization transforms continuous features into discrete ones to improve the efficiency of meteorological data mining algorithms. To address the strong interaction among multiple attributes, noise interference, and the difficulty of obtaining prior knowledge in meteorological data, we propose a rough set-based feature discretization method for meteorological data (RSFD). First, we calculate the information gain of each candidate breakpoint of a meteorological attribute to split the attribute into intervals. Then, we use the chi-square test to merge these discrete intervals. Finally, we take the variation of the indiscernibility relation in rough set theory as the evaluation criterion for the discretization scheme. We scan each attribute in turn, splitting first and then merging, to obtain the optimal discrete feature set. We compare RSFD with state-of-the-art discretization methods on meteorological data. Experiments show that our method achieves higher classification accuracy on meteorological data and produces fewer discrete intervals while preserving data consistency.
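
The splitting step and the consistency criterion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the midpoint rule for candidate breakpoints, and the toy temperature readings are all assumptions made for illustration, and the chi-square merging step is omitted. The sketch scores each candidate breakpoint of one attribute by information gain, then counts the objects that become indiscernible on the discretized attribute yet carry different decision labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def candidate_breakpoints(values):
    """Assumed rule: midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def information_gain(values, labels, cut):
    """Entropy reduction obtained by splitting the attribute at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_breakpoint(values, labels):
    """Candidate breakpoint with the highest information gain, or None."""
    cuts = candidate_breakpoints(values)
    if not cuts:
        return None
    return max(cuts, key=lambda c: information_gain(values, labels, c))


def inconsistent_count(rows, labels):
    """Objects that share identical discretized values (same indiscernibility
    class) but whose class contains more than one decision label."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(tuple(row), []).append(y)
    return sum(len(ys) for ys in groups.values() if len(set(ys)) > 1)


# Hypothetical toy data: temperature readings and a rain / no-rain decision.
temperature = [12.0, 14.5, 15.0, 21.0, 23.5, 25.0]
decision = ["no_rain", "no_rain", "no_rain", "rain", "rain", "rain"]

cut = best_breakpoint(temperature, decision)
discretized = [(0 if t <= cut else 1,) for t in temperature]
print(cut, inconsistent_count(discretized, decision))
```

A discretization scheme that leaves the inconsistent count unchanged relative to the original data preserves the decision-relevant indiscernibility structure, which is the sense in which the abstract speaks of ensuring data consistency.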

Keywords