应用气象学报 (Jan 2021)
An Objective Hailstorm Labeling Algorithm Based on Ground Observation
Abstract
Data labeling is the foundation for building deep learning data sets, especially for intelligent forecasting of severe weather such as hail, for which observations are scarce. A disaster report describes the details of a meteorological disaster and is collected by meteorological information officers. Because these informants cover villages and communities densely, disaster reports are considered to have good consistency and high spatial resolution. However, their vague description of when a disaster occurred limits their application. To solve this problem, 13 hail cases in Chongqing during 2008-2019 whose hail reports give accurate occurrence times are selected and divided into a reference set and a verification set, and an objective hailstorm labeling algorithm based on ground hail observations is developed using fuzzy logic. To match the hail occurrence location reasonably to a convective storm, the discriminant factors selected are the distance between the storm centroid and the initial guess location of hail occurrence, and the maximum values of reflectivity, height of the 45 dBZ reflectivity, vertically integrated liquid water content, and echo top. Storms are identified by the storm cell identification and tracking (SCIT) algorithm. In the reference set, 7 hail cases are labeled correctly, and in only 1 case does storm identification fail. In 5 cases, the time bias between the labeling time and the ground disaster report is less than 6 minutes. On the verification set (5 cases in 2019), the labeling accuracy is 100% and the matching degree ranges from 0.887 to 1.000. Furthermore, the algorithm is applied to 22 hail cases lacking accurate occurrence times, and the results are compared with manual labels produced by forecasters. The subjective and objective methods tend to identify the same storm cell, so the choice between them has little impact on data set construction.
Forecasters tend to label the same storm cell 6-12 minutes earlier than the algorithm. Further analysis shows that hail size has no significant effect on the labeling result. The algorithm is not sensitive to the occurrence time of the hail disaster, and it gives reliable labeling results for both long-lived storms and local hail disasters. However, when the identification algorithm fails to identify storms, or when the initial guess location deviates substantially from the true location, the labeling results are significantly degraded.
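The fuzzy logic matching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the five discriminant factors but gives no membership functions, thresholds, or weights, so the linear ramp memberships, every threshold in `MEMBERSHIPS`, the equal weights, and the names `StormCell`, `matching_degree`, and `label_hailstorm` are all assumptions made for illustration. The storm cells are assumed to come from an upstream SCIT-style identification step.

```python
from dataclasses import dataclass


@dataclass
class StormCell:
    """One identified storm cell (hypothetical container, not from the paper)."""
    cell_id: str
    distance_km: float           # storm centroid to initial guess hail location
    max_reflectivity_dbz: float  # maximum reflectivity
    height_45dbz_km: float       # maximum height of the 45 dBZ reflectivity
    vil_kg_m2: float             # vertically integrated liquid water content
    echo_top_km: float           # echo top


def ramp_up(x, lo, hi):
    """Linear membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Linear membership: 1 at or below lo, 0 at or above hi."""
    return 1.0 - ramp_up(x, lo, hi)


# ASSUMED thresholds, chosen only to make the sketch runnable.
MEMBERSHIPS = {
    "distance_km": (ramp_down, 5.0, 30.0),
    "max_reflectivity_dbz": (ramp_up, 45.0, 60.0),
    "height_45dbz_km": (ramp_up, 4.0, 8.0),
    "vil_kg_m2": (ramp_up, 15.0, 45.0),
    "echo_top_km": (ramp_up, 8.0, 14.0),
}

# ASSUMED equal weights for all factors.
WEIGHTS = {name: 1.0 for name in MEMBERSHIPS}


def matching_degree(cell):
    """Weighted mean of the factor memberships, in the range [0, 1]."""
    total_weight = sum(WEIGHTS.values())
    score = 0.0
    for name, (membership, lo, hi) in MEMBERSHIPS.items():
        score += WEIGHTS[name] * membership(getattr(cell, name), lo, hi)
    return score / total_weight


def label_hailstorm(cells):
    """Return (cell_id, matching degree) of the best match, or None if no cells."""
    if not cells:
        return None
    best = max(cells, key=matching_degree)
    return best.cell_id, matching_degree(best)
```

Calling `label_hailstorm` on the cells identified at one radar volume scan returns the cell with the highest matching degree. Returning `None` for an empty list mirrors the failure mode noted in the abstract, where no label can be produced if storm identification fails.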
Keywords