Journal of Applied Meteorological Science (应用气象学报), Sep 2021
Development of Basic Dataset of Severe Convective Weather for Artificial Intelligence Training
Abstract
Deep learning shows great potential for nowcasting severe convective weather. Building a deep learning model requires extensive training, which in turn depends on a large, high-quality dataset. Based on multi-source observations from the China Meteorological Administration (CMA), disaster reports and internet media information, a dataset of severe convective weather for artificial intelligence training (SCWDS) is established. SCWDS is organized by severe convective weather event. It includes 184,865 cases, each composed of several samples within the spatiotemporal window of the event. In total, SCWDS contains 9,256,405 samples of thunderstorm, gale, short-term heavy rain, hail and tornado in China from 2012 to 2019. Each sample includes a severe weather event annotation and the corresponding spatiotemporal window of the following data: surface observations of temperature, precipitation, pressure, humidity and wind (average and maximum wind speed); radiosonde observations of temperature, dew point temperature, geopotential height and wind from 1000 to 1 hPa; lightning intensity observations; radar volume scan data; the visible, long-wave infrared, water vapor and mid-infrared channels of FY-2E, FY-2G and FY-2D nominal disk data; and environmental factors from ERA5 reanalysis data. Quality control and data cleaning are carried out, and all cases with time discontinuities, incorrect logical relationships or non-convective causes are eliminated. The results show that thunderstorms, short-term heavy rain and hail mainly occur from April to September, especially in summer from June to August. However, thunderstorms and gales occur most frequently from April to May. Tornadoes occur frequently in April and from June to August. Thunderstorms, gales and hail show the same diurnal variation, with the high-frequency period concentrated between afternoon and evening.
The diurnal cycle of short-term heavy rain frequency is bimodal, with peaks at 0300-0400 BT (Beijing Time) and 1500-1600 BT. The occurrence of severe convective weather shows large spatial variability. Thunderstorms are mainly distributed over South China, Jiangnan, the Tibetan Plateau and the Yunnan-Guizhou Plateau, where the frequency generally exceeds 40 times. Gales are mainly distributed over the northern part of North China, Xinjiang and the coastal areas south of the Yangtze River, with a frequency of more than 10 times. Short-term heavy rain is mainly concentrated in Southwest China, South China, Jiangnan and the Huanghuai region, with a frequency of more than 100 times. Hail is mainly distributed over the Tibetan Plateau, the Yunnan-Guizhou Plateau and the northern part of North China, where the frequency generally exceeds 6 times. Tornadoes are mainly distributed in Jiangsu, Guangdong and the Qiongzhou Strait.
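As a rough illustration of the case-and-sample organization and the cleaning step described in the abstract, the following Python sketch may help. It is not the SCWDS format or the authors' procedure: the class names, field names, the fixed sampling interval used to detect a time discontinuity, and the helper functions are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# The five event types named in the abstract (labels here are hypothetical).
EVENT_TYPES = {"thunderstorm", "gale", "short_term_heavy_rain", "hail", "tornado"}


@dataclass
class Sample:
    """One annotated sample inside an event's spatiotemporal window (assumed fields)."""
    event_type: str
    time: datetime
    lat: float
    lon: float


@dataclass
class Case:
    """One severe convective weather event, made up of several samples."""
    case_id: str
    samples: list = field(default_factory=list)


def is_time_continuous(case, step):
    """Assumed check: consecutive sample times are exactly one `step` apart."""
    times = sorted(s.time for s in case.samples)
    return all(later - earlier == step for earlier, later in zip(times, times[1:]))


def clean(cases, step):
    """Keep non-empty cases with known event types and no time gaps."""
    return [
        c for c in cases
        if c.samples
        and all(s.event_type in EVENT_TYPES for s in c.samples)
        and is_time_continuous(c, step)
    ]


def monthly_frequency(cases):
    """Count cases by (event type, month), keyed on each case's first sample."""
    return Counter((c.samples[0].event_type, c.samples[0].time.month) for c in cases)
```

Counting by case rather than by sample is a design choice for this sketch; the abstract does not state which unit underlies the reported frequencies.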
Keywords