Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Aug 2024)

Advanced methods for knowledge injection in large language models

  • Nikita I. Kulin,
  • Sergey B. Muravyov

DOI
https://doi.org/10.17586/2226-1494-2024-24-4-588-593
Journal volume & issue
Vol. 24, no. 4
pp. 588 – 593

Abstract


Transformer-based language models have revolutionized Natural Language Processing, driven by advances in language modeling techniques. Modern transformer architectures use attention mechanisms to model dependencies in text effectively. Studies have shown that these models encode syntactic structure and factual knowledge, which explains their performance on tasks involving syntactic and semantic elements. However, transformer-based models are prone to hallucination, in which incorporated knowledge is not used effectively. To address this, methods are emerging that mitigate hallucination by integrating external knowledge sources such as knowledge graphs (e.g., Freebase, WordNet, ConceptNet, ATOMIC). Knowledge graphs represent real-world knowledge through entities and relationships, offering a natural injection point for enhancing model performance on inference tasks. Injection approaches fall into three groups: input injections modify data preprocessing, architectural injections add layers for knowledge integration, and output injections adjust the loss function to correct how knowledge is incorporated during training. Despite ongoing research, a universal solution to hallucination remains elusive, and a standardized benchmark for comparing injection methods is lacking. This study investigates knowledge graphs as a means of mitigating hallucination and their possible integration into Large Language Models. Comparative experiments across General Language Understanding Evaluation (GLUE) benchmark tasks showed that ERNIE 3.0 and XLNet outperform other injection methods, with average scores of 91.1 % and 90.1 %, respectively.
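The input-injection family described in the abstract can be illustrated with a minimal sketch: knowledge-graph triples relevant to entities mentioned in the input are appended to the text before tokenization. The toy triple store, the naive substring entity linking, and the `[KNOWLEDGE]` delimiter below are all illustrative assumptions, not the method of ERNIE 3.0 or any specific model from the paper.

```python
# Illustrative sketch of "input injection": augmenting input text with
# knowledge-graph facts during preprocessing, before the text reaches
# the transformer's tokenizer. All names here are hypothetical.

from typing import Dict, List, Tuple

# A toy knowledge graph: (relation, tail) triples indexed by head entity.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Paris": [("capital_of", "France")],
    "WordNet": [("instance_of", "lexical database")],
}

def inject_knowledge(text: str, kg: Dict[str, List[Tuple[str, str]]]) -> str:
    """Append KG facts for every entity mentioned in the text.

    Entity linking here is naive substring matching; real systems use a
    dedicated entity linker before retrieving triples.
    """
    facts = []
    for entity, triples in kg.items():
        if entity in text:
            for relation, tail in triples:
                facts.append(f"{entity} {relation.replace('_', ' ')} {tail}")
    if not facts:
        return text  # no linked entities: pass the text through unchanged
    return text + " [KNOWLEDGE] " + "; ".join(facts)

print(inject_knowledge("Paris hosted the summit.", KG))
# The augmented string is then tokenized and fed to the model as usual.
```

Architectural injections would instead fuse the retrieved triples inside the network (e.g., via extra attention layers), and output injections would add a knowledge-based term to the training loss; only the preprocessing variant lends itself to a short standalone sketch.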

Keywords