Journal of Engineering Science (Chişinău) (Jul 2024)

RETRIEVAL-AUGMENTED GENERATION USING DOMAIN-SPECIFIC TEXT: A PILOT STUDY

  • IAPĂSCURTĂ, Victor,
  • KRONIN, Sergey,
  • FIODOROV, Ion

DOI
https://doi.org/10.52326/jes.utm.2024.31(2).05
Journal volume & issue
Vol. XXXI, no. 2
pp. 48 – 59

Abstract

The field of natural language processing (NLP) has advanced remarkably with the advent of large language models (LLMs) such as GPT, Gemini, and Claude. These models are trained on vast amounts of text data, which allows them to generate human-like responses across a variety of tasks. Despite these capabilities, LLMs are limited in their ability to incorporate and reason over external knowledge that is absent from their training data, a limitation that is particularly evident for domain-specific knowledge. This has given rise to retrieval-augmented generation (RAG), an approach that combines the generative power of LLMs with the ability to retrieve and integrate relevant information from external knowledge sources. This research uses RAG as a module in an application designed to answer questions about a specific domain, namely social philosophy/philosophy of management, with a published book from that domain serving as the external source. The paper analyzes the application's output, draws conclusions, and outlines future directions for improving the accuracy of the output.
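
The retrieve-then-generate pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample passages, and the prompt wording are all hypothetical, and a simple bag-of-words cosine similarity stands in for whatever retriever the application actually uses. The call to an LLM is deliberately left out; the sketch stops at the point where the retrieved text has been placed into the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that places retrieved passages before the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical passages standing in for chunks of the source book.
PASSAGES = [
    "Management is treated as a social practice shaped by shared values.",
    "The chapter describes the history of the printing press.",
    "Social philosophy examines how institutions shape collective decisions.",
]

if __name__ == "__main__":
    print(build_prompt("What does social philosophy examine?", PASSAGES))
```

In a full application, the string returned by `build_prompt` would be sent to an LLM, so that the answer is grounded in the retrieved passages rather than in the model's training data alone.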

Keywords