Journal of Big Data (Aug 2024)

Tc-llama 2: fine-tuning LLM for technology and commercialization applications

  • Jeyoon Yeom,
  • Hakyung Lee,
  • Hoyoon Byun,
  • Yewon Kim,
  • Jeongeun Byun,
  • Yunjeong Choi,
  • Sungjin Kim,
  • Kyungwoo Song

DOI
https://doi.org/10.1186/s40537-024-00963-0
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 31

Abstract

This paper introduces TC-Llama 2, a novel application of large language models (LLMs) in the technology-commercialization field. Traditional methods in this field, reliant on statistical learning and expert knowledge, often face challenges in processing the complex and diverse nature of technology-commercialization data. TC-Llama 2 addresses these limitations by utilizing the advanced generalization capabilities of LLMs, specifically adapting them to this intricate domain. Our model, based on the open-source LLM framework, Llama 2, is customized through instruction tuning using bilingual Korean-English datasets. Our approach involves transforming technology-commercialization data into formats compatible with LLMs, enabling the model to learn detailed technological knowledge and product hierarchies effectively. We introduce a unique model evaluation strategy, leveraging new matching and generation tasks to verify the alignment of the technology-commercialization relationship in TC-Llama 2. Our results, derived from refining task-specific instructions for inference, provide valuable insights into customizing language models for specific sectors, potentially leading to new applications in technology categorization, utilization, and predictive product development.

Keywords