Remote Sensing (Aug 2024)
Enhancing Extreme Precipitation Forecasts through Machine Learning Quality Control of Precipitable Water Data from Satellite FengYun-2E: A Comparative Study of Minimum Covariance Determinant and Isolation Forest Methods
Abstract
Variational data assimilation assumes that observational errors follow a Gaussian distribution, yet real observations often deviate from this assumption. Traditional quality control methods are limited when the data are nonlinear or non-Gaussian. To address this issue, we apply two machine learning (ML)-based quality control (QC) methods, Minimum Covariance Determinant (MCD) and Isolation Forest, to precipitable water (PW) data derived from the FengYun-2E (FY2E) satellite. We assimilated the QC-processed PW data using the Gridpoint Statistical Interpolation (GSI) system and evaluated the impact on heavy precipitation forecasts with the Weather Research and Forecasting (WRF) v4.2 model. Both methods notably improved data quality, yielding more Gaussian-like distributions and marked improvements in the model’s simulation of precipitation intensity, spatial distribution, and large-scale circulation structures. During key precipitation phases, the Fraction Skill Score (FSS) for moderate to heavy rainfall generally rose above 0.4. Quantitative analysis showed that both methods substantially reduced the Root Mean Square Error (RMSE) and bias of the precipitation forecasts, with the MCD method achieving RMSE reductions of up to 58% in early forecast hours. The MCD method improved forecasts of heavy and extremely heavy rainfall, whereas the Isolation Forest method performed better for moderate to heavy rainfall intensities. These results provide a basis for choosing a QC method according to the precipitation intensity being forecast and offer a practical way to improve the accuracy of extreme weather event predictions.
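To make the two QC approaches concrete, the following is a minimal sketch of how MCD and Isolation Forest can each produce a keep/reject mask for a set of observations, using scikit-learn. It is not the configuration used in the study: the synthetic data, the choice of features (PW value and observation-minus-background departure), the chi-square cutoff at the 0.975 quantile, and the 5% contamination rate are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.ensemble import IsolationForest


def make_synthetic_pw(n_good=500, n_bad=25, seed=0):
    """Synthetic stand-in for PW retrievals (assumption, not FY2E data).

    Columns: [PW (mm), observation-minus-background departure (mm)].
    """
    rng = np.random.default_rng(seed)
    good = np.column_stack([
        rng.normal(40.0, 8.0, n_good),   # plausible PW values
        rng.normal(0.0, 2.0, n_good),    # near-Gaussian departures
    ])
    bad = np.column_stack([
        rng.uniform(10.0, 70.0, n_bad),  # arbitrary PW values
        rng.uniform(15.0, 40.0, n_bad),  # gross positive departures
    ])
    return np.vstack([good, bad])        # contaminated rows come last


def mcd_keep_mask(x, quantile=0.975, seed=0):
    """Keep points whose robust squared Mahalanobis distance is below a chi-square cutoff."""
    mcd = MinCovDet(random_state=seed).fit(x)
    d2 = mcd.mahalanobis(x)              # squared distances w.r.t. the robust fit
    cutoff = chi2.ppf(quantile, df=x.shape[1])
    return d2 <= cutoff


def iforest_keep_mask(x, contamination=0.05, seed=0):
    """Keep points that Isolation Forest labels as inliers (+1)."""
    forest = IsolationForest(contamination=contamination, random_state=seed)
    return forest.fit_predict(x) == 1


if __name__ == "__main__":
    obs = make_synthetic_pw()
    masks = {"MCD": mcd_keep_mask(obs), "Isolation Forest": iforest_keep_mask(obs)}
    for name, keep in masks.items():
        print(f"{name}: kept {keep.sum()} of {keep.size} observations")
```

In this sketch, MCD rejects points that lie far from a robustly estimated Gaussian core, which pulls the retained sample toward a Gaussian shape. Isolation Forest makes no distributional assumption and flags the points that random splits isolate most easily. In a real workflow, only the observations retained by the chosen mask would be passed on to the assimilation step.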
Keywords