Jisuanji kexue yu tansuo (Nov 2024)
Construction Method of Textbook Knowledge Graph Based on Multimodal and Knowledge Distillation
Abstract
To efficiently construct a multimodal subject knowledge graph for the education domain, this paper proposes an entity relation extraction algorithm for textbook text based on large-model knowledge distillation and multi-model collaborative reasoning. In the training phase, a closed-source model at the 100-billion-parameter scale annotates the text data, which achieves implicit knowledge distillation. An open-source model at the billion-parameter scale is then instruction fine-tuned on the domain data to improve its instruction-following ability on the entity relation extraction task. In the inference phase, the closed-source model serves as the guiding model and the open-source billion-parameter-scale model serves as the execution model. Experimental results show that knowledge distillation, multi-model collaboration, and domain-data instruction fine-tuning are effective and significantly improve the performance of instruction-prompt-based entity relation extraction on textbook text. In addition, a multimodal named entity recognition algorithm for textbook diagrams, enhanced with explicit and implicit knowledge, is proposed. First, techniques such as image OCR (optical character recognition) and vision-language models are used to extract the textual information and a global content description from each textbook diagram. Then, explicit knowledge base retrieval and implicit LLM prompt enhancement are used to obtain auxiliary knowledge that may be associated with the image-title pair. The knowledge obtained from the explicit knowledge base and from the implicit LLM is then fused to form the final auxiliary knowledge. Finally, this auxiliary knowledge is combined with the diagram title to perform multimodal named entity recognition on the titles of textbook diagrams.
Experimental results show that the algorithm achieves advanced performance and improves interpretability.
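The diagram-side pipeline summarized above (OCR and global description, explicit retrieval, implicit LLM prompting, fusion, combination with the title) can be sketched roughly as follows. This is only an illustrative outline, not the implementation used in this work: the names `DiagramSample`, `build_auxiliary_knowledge`, and `build_ner_input` are hypothetical, the four components are passed in as plain callables rather than tied to any particular OCR engine, vision-language model, knowledge base, or LLM, and the fusion step is assumed here to be a simple order-preserving deduplication, which may differ from the fusion actually used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DiagramSample:
    """Hypothetical container for one textbook diagram."""
    title: str       # diagram title, the text on which NER is performed
    image_path: str  # location of the diagram image


def build_auxiliary_knowledge(
    sample: DiagramSample,
    ocr: Callable[[str], str],                 # image path -> text found in the image
    describe: Callable[[str], str],            # image path -> global content description
    retrieve_kb: Callable[[str], List[str]],   # query -> explicit knowledge snippets
    ask_llm: Callable[[str], str],             # prompt -> implicit knowledge from an LLM
) -> str:
    # Step 1: extract textual information and a global description of the diagram.
    ocr_text = ocr(sample.image_path)
    description = describe(sample.image_path)

    # Step 2: query both knowledge sources with the title plus the image-derived text.
    query = " ".join(p for p in (sample.title, ocr_text, description) if p)
    explicit = retrieve_kb(query)
    implicit = ask_llm(f"List background knowledge relevant to: {query}")

    # Step 3 (assumed fusion): concatenate, dropping empty and duplicate snippets.
    seen = set()
    fused = []
    for snippet in [*explicit, implicit]:
        snippet = snippet.strip()
        if snippet and snippet not in seen:
            seen.add(snippet)
            fused.append(snippet)
    return " ".join(fused)


def build_ner_input(sample: DiagramSample, auxiliary: str, sep: str = " [SEP] ") -> str:
    # Step 4: combine the title with the auxiliary knowledge; the result would be
    # passed to a named entity recognition model, which is outside this sketch.
    return f"{sample.title}{sep}{auxiliary}"
```

With stub callables in place of real components, `build_auxiliary_knowledge` returns a single fused string, and `build_ner_input` prepends the diagram title so that a downstream recognizer sees the title together with its auxiliary context. Passing the components as callables keeps the sketch independent of any specific model choice.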
Keywords