Agriculture (Mar 2024)
Research on Entity and Relationship Extraction with Small Training Samples for Cotton Pests and Diseases
Abstract
Entity and relationship extraction is a crucial task in natural language processing (NLP). However, existing models for this task often rely on large amounts of labeled data, which is costly in time and labor and hinders the development of downstream tasks. To improve the model's ability to learn from small samples, this paper proposes an entity and relationship extraction method based on the Universal Information Extraction (UIE) model. The core of the approach is a specialized prompt template and schema for cotton pests and diseases, which serves as one of the main inputs to the UIE model; fine-tuning guided by this schema enables the model to distinguish the entity and relationship types in the corpus. As a result, the UIE-base model achieves an accuracy of 86.5% with only 40 labeled training samples, which substantially reduces the dependence of existing models on large amounts of manually labeled training data for knowledge extraction. To verify the generalization ability of the proposed model, it is compared experimentally with four classical models, including BERT-BiLSTM-CRF. The results show that its F1 value is 1.4% higher than that of the BERT-BiLSTM-CRF model on the self-built cotton dataset and 2.5% higher on the public dataset. Further experiments verify that the UIE-base model has the best small-sample learning performance when the number of samples is 40. This paper thus provides an effective method for small-sample knowledge extraction.
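As a rough illustration of how a schema can drive prompt-based extraction, the sketch below expands a small schema into (prompt, text) inputs in two stages: first one prompt per entity type, then one prompt per relation of an entity that has already been extracted. The entity labels, relation labels, prompt wording, and function names are all hypothetical and chosen only for illustration; they are not the template or schema designed in this paper, and no model is called.

```python
# Hypothetical schema: each entity type maps to the relation types
# that may start from an entity of that type. These labels are
# illustrative assumptions, not the paper's actual schema.
SCHEMA = {
    "Pest": ["host", "control agent"],
    "Disease": ["symptom", "control agent"],
    "Pesticide": [],
}


def entity_prompts(schema):
    """Stage 1: one prompt per entity type defined in the schema."""
    return list(schema)


def relation_prompts(schema, entity_type, entity_text):
    """Stage 2: one prompt per relation of an already extracted entity."""
    return [f"{relation} of {entity_text}" for relation in schema[entity_type]]


def build_inputs(prompts, text):
    """Pair every prompt with the sentence to be analysed."""
    return [{"prompt": prompt, "text": text} for prompt in prompts]


sentence = "Cotton aphid feeds on cotton leaves and can be controlled with imidacloprid."

# Stage 1 inputs ask the model for entities of each type.
stage1 = build_inputs(entity_prompts(SCHEMA), sentence)

# Stage 2 inputs assume stage 1 returned "Cotton aphid" as a Pest.
stage2 = build_inputs(relation_prompts(SCHEMA, "Pest", "Cotton aphid"), sentence)
```

In this arrangement the schema is the only domain-specific part, so adapting the inputs to another crop would mean editing the dictionary rather than the code.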
Keywords