International Journal of Digital Earth (Dec 2024)
Multi-class multi-label classification of social media texts for typhoon damage assessment: a two-stage model fully integrating the outputs of the hidden layers of BERT
Abstract
With the development of social media, it has become increasingly important to quickly and accurately identify social media texts related to disasters (e.g. typhoons) to aid rescue and recovery efforts. Currently, multi-class classification and the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) are widely used for text classification. However, most studies on typhoon damage classification are multi-class single-label, which contradicts the reality that a single social media text may correspond to multiple types of damage. Moreover, the outputs of the hidden layers of BERT are not fully utilized. This paper proposes a two-stage multi-class multi-label classification method for typhoon damage assessment that fully integrates the outputs of the hidden layers of BERT. In the first stage, sentence vectors are used to identify typhoon damage-related texts. In the second stage, word matrices are used for multi-class multi-label classification, which further classifies the texts into five damage categories (i.e. transportation, public, electricity, forestry, and waterlogging). The two stages are trained end-to-end to identify typhoon damage from social media texts. Experiments on [Formula: see text] texts posted during typhoon landfall in Chinese coastal regions demonstrate that the proposed method can effectively improve the accuracy of text classification and comprehensively assess typhoon damage.
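The two-stage decision flow described in the abstract can be illustrated with a minimal inference-only sketch. It is not the authors' implementation: the hidden-layer outputs are passed in as plain nested lists rather than produced by BERT, and the integration rule (averaging across layers), the pooling choices (mean pooling for the sentence vector, max pooling over the word matrix), the linear heads, the 0.5 threshold, and all function and parameter names are assumptions made for illustration. End-to-end training is not shown.

```python
import math

CATEGORIES = ["transportation", "public", "electricity", "forestry", "waterlogging"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def integrate_layers(layers):
    """Average per-token vectors across hidden layers (assumed integration rule)."""
    n_layers = len(layers)
    n_tokens = len(layers[0])
    dim = len(layers[0][0])
    return [
        [sum(layer[t][d] for layer in layers) / n_layers for d in range(dim)]
        for t in range(n_tokens)
    ]

def sentence_vector(word_matrix):
    """Mean-pool token vectors into one sentence vector (assumed pooling)."""
    n_tokens = len(word_matrix)
    dim = len(word_matrix[0])
    return [sum(row[d] for row in word_matrix) / n_tokens for d in range(dim)]

def stage_one(sent_vec, weights, bias):
    """Stage 1: probability that the text is damage-related (binary)."""
    return sigmoid(sum(w * x for w, x in zip(weights, sent_vec)) + bias)

def stage_two(word_matrix, weight_rows, biases):
    """Stage 2: one independent sigmoid per damage category (multi-label)."""
    dim = len(word_matrix[0])
    # Max over tokens for each dimension (assumed reduction of the word matrix).
    pooled = [max(row[d] for row in word_matrix) for d in range(dim)]
    return [
        sigmoid(sum(w * x for w, x in zip(w_row, pooled)) + b)
        for w_row, b in zip(weight_rows, biases)
    ]

def classify(layers, params, threshold=0.5):
    """Return every damage category whose probability reaches the threshold."""
    word_matrix = integrate_layers(layers)
    relevance = stage_one(sentence_vector(word_matrix), params["w1"], params["b1"])
    if relevance < threshold:
        return []  # filtered out in stage 1: not damage-related
    probs = stage_two(word_matrix, params["w2"], params["b2"])
    return [c for c, p in zip(CATEGORIES, probs) if p >= threshold]

# Toy input: 2 hidden layers, 2 tokens, 2 dimensions (hypothetical values).
toy_layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 1.0]],
]
toy_params = {
    "w1": [1.0, 1.0],
    "b1": 0.0,
    "w2": [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, -3.0]],
    "b2": [0.0, 0.0, 0.0, 0.0, 0.0],
}
print(classify(toy_layers, toy_params))
```

Because each category has its own sigmoid rather than sharing a softmax, a single text can receive several labels at once, which is the multi-label behaviour the abstract argues for; a text rejected in the first stage receives none.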
Keywords