Annotation method of risk data in a certain field based on pattern matching

Geng Weibo; Zhao Yingxiao; Xu Ping; Cai Jiaoyang; Fang Fang

doi:10.1051/e3sconf/202452201046

E3S Web of Conferences (Jan 2024)

Affiliations

Geng Weibo: Military Science Information Research Center, Academy of Military Sciences
Zhao Yingxiao: Military Science Information Research Center, Academy of Military Sciences
Xu Ping: Military Science Information Research Center, Academy of Military Sciences
Cai Jiaoyang: Military Science Information Research Center, Academy of Military Sciences
Fang Fang: Military Science Information Research Center, Academy of Military Sciences

DOI: https://doi.org/10.1051/e3sconf/202452201046
Journal volume & issue: Vol. 522, p. 01046

Abstract

As information technology develops and industrial technology grows more complex, a certain field urgently needs big data and artificial intelligence to improve its management and decision-making. To classify the field’s risk text data with intelligent algorithms and to analyse the risk distribution and major problems, this paper studies methods for annotating training data in this field. The proposed annotation method is based on pattern matching and addresses the particular difficulties of risk data annotation in this field: highly specialised content, small data volume, and strict accuracy and timeliness requirements. New matching patterns are generated through text segmentation, keyword extraction, preliminary pattern generation, pattern relation tree construction, pattern optimization, pattern generalization, and pattern verification, and the final classification and annotation are performed after pattern matching. Performance tests on accuracy, recall, and annotation time show that the proposed method performs better overall than both traditional item-by-item manual annotation and semi-automatic annotation based on machine learning. The method has strong application value for risk data annotation in this field and may also serve as a reference for data annotation in other fields that demand high density, high accuracy, and high timeliness.
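The general idea of annotation by pattern matching can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the seed texts, the labels, and the function names `segment`, `extract_keywords`, `build_patterns`, and `annotate` are all hypothetical, and the pattern relation tree, optimization, generalization, and verification steps named in the abstract are omitted. Here a pattern is simply the keyword set of a labelled seed text, and a new text receives the label of the most specific pattern whose keywords it contains.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a domain-specific one.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "on", "at"}


def segment(text):
    """Split text into lowercase word tokens (stand-in for text segmentation)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(text, top_k=3):
    """Return up to top_k of the most frequent non-stop-word tokens."""
    tokens = [t for t in segment(text) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def build_patterns(seeds, top_k=3):
    """Map the keyword set of each labelled seed text to its label."""
    patterns = {}
    for text, label in seeds:
        keywords = frozenset(extract_keywords(text, top_k))
        if keywords:  # skip seeds that yield no keywords
            patterns[keywords] = label
    return patterns


def annotate(text, patterns):
    """Label text with the largest pattern fully contained in it, else None."""
    tokens = set(segment(text))
    matches = [(len(p), label) for p, label in patterns.items() if p <= tokens]
    return max(matches)[1] if matches else None


# Hypothetical seed data, invented for illustration only.
seeds = [
    ("Fuel leak in the engine", "equipment"),
    ("Operator skipped the inspection", "procedure"),
]
patterns = build_patterns(seeds)
print(annotate("A fuel leak was found in the left engine", patterns))
print(annotate("Weather delayed the delivery", patterns))
```

Texts that match no pattern return `None`, which in a workflow of this kind would typically be routed to manual annotation or used to generate further patterns.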

Published in E3S Web of Conferences

ISSN: 2267-1242 (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Geography. Anthropology. Recreation: Environmental sciences
Website: http://www.e3s-conferences.org/
