Jisuanji kexue yu tansuo (Oct 2024)
Knowledge Augmentation on Traditional Chinese Medicine Language Model
Abstract
Recently, large language models (LLMs) have achieved significant results in various fields. However, owing to the lack of specialized knowledge and the gap between modern medicine and traditional Chinese medicine (TCM), deploying LLMs in the TCM domain remains a challenge, and existing methods fail to preserve the structure of TCM prescriptions. To address these problems, a knowledge augmentation pattern is proposed. The method consists of model training, knowledge graph construction, and knowledge augmentation. In the training phase, a TCM language model is trained on a TCM corpus with a two-stage method that combines pre-training and fine-tuning. In the knowledge graph construction phase, a prescription knowledge graph is built from nearly 100,000 preprocessed classical TCM prescriptions, including prescriptions collected from ancient books. In the knowledge augmentation phase, outputs are generated through computation over the knowledge graph, following the schema of the retrieved results, so that the structure of prescriptions is preserved. A set of evaluation measures specific to prescription optimization, comprising both objective and subjective indicators, is proposed to assess model performance on this task. Experiments show that the model improves greatly on both subjective and objective evaluations compared with the baselines: BLEU-1 increases by up to 0.09 and ROUGE-1 by up to 0.21. An ablation study shows that knowledge augmentation is vital to model performance: the BLEU-1 score of the augmentation-free model is about 37% lower than that of the augmented model.
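To make the knowledge augmentation phase concrete, the following is a minimal sketch of the described pattern, not the authors' implementation: a prescription knowledge graph is searched by symptom, and the answer is serialized from graph entities according to the schema, so the herb/dosage/role structure of a prescription is preserved instead of being left to free-form generation. All names here (PrescriptionKG, Herb, augment_output, the symptom index) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Herb:
    # One herb entry in a prescription; the role field follows the
    # classical sovereign/minister/assistant/courier schema.
    name: str
    dosage_g: float
    role: str  # e.g. "sovereign", "minister", "assistant", "courier"

@dataclass
class Prescription:
    name: str
    herbs: list[Herb] = field(default_factory=list)

class PrescriptionKG:
    """Hypothetical prescription knowledge graph indexed by symptom."""

    def __init__(self) -> None:
        self._by_symptom: dict[str, list[Prescription]] = {}

    def add(self, symptom: str, prescription: Prescription) -> None:
        self._by_symptom.setdefault(symptom, []).append(prescription)

    def search(self, symptoms: list[str]) -> list[Prescription]:
        # Collect candidate prescriptions matching any queried symptom.
        hits: list[Prescription] = []
        for s in symptoms:
            hits.extend(self._by_symptom.get(s, []))
        return hits

def augment_output(model_draft: str, kg: PrescriptionKG,
                   symptoms: list[str]) -> str:
    """Replace the model's free-form draft with a prescription computed
    from the knowledge graph, keeping the herb/dosage/role structure."""
    candidates = kg.search(symptoms)
    if not candidates:
        return model_draft  # fall back to the raw LLM output
    best = candidates[0]  # a real system would rank the candidates
    lines = [f"Prescription: {best.name}"]
    for h in best.herbs:
        lines.append(f"  {h.name}  {h.dosage_g} g  ({h.role})")
    return "\n".join(lines)
```

Falling back to the raw model draft on a search miss is only one simple policy; the point of the sketch is that the final answer is rendered from graph entities, so its structure cannot drift.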
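The reported metrics also follow directly from their standard definitions. The sketch below computes sentence-level BLEU-1 (clipped unigram precision with a brevity penalty) and ROUGE-1 recall; the single-reference setting and the character-level tokenization in the example are assumptions, not details taken from the paper.

```python
from collections import Counter
import math

def bleu_1(reference: list[str], hypothesis: list[str]) -> float:
    """Sentence-level unigram BLEU: clipped unigram precision
    multiplied by the brevity penalty."""
    if not hypothesis:
        return 0.0
    ref_counts, hyp_counts = Counter(reference), Counter(hypothesis)
    clipped = sum(min(c, ref_counts[t]) for t, c in hyp_counts.items())
    precision = clipped / len(hypothesis)
    # The brevity penalty discourages overly short outputs.
    bp = 1.0 if len(hypothesis) > len(reference) else \
        math.exp(1 - len(reference) / len(hypothesis))
    return bp * precision

def rouge_1(reference: list[str], hypothesis: list[str]) -> float:
    """Unigram ROUGE: recall of reference tokens covered by the hypothesis."""
    if not reference:
        return 0.0
    ref_counts, hyp_counts = Counter(reference), Counter(hypothesis)
    overlap = sum(min(c, hyp_counts[t]) for t, c in ref_counts.items())
    return overlap / len(reference)

# Example: for Chinese prescription text, character-level tokens
# are a common choice.
ref = list("桂枝三两芍药三两")
hyp = list("桂枝三两芍药二两")
print(f"BLEU-1 = {bleu_1(ref, hyp):.2f}, ROUGE-1 = {rouge_1(ref, hyp):.2f}")
```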
Keywords