Applied Sciences (Jun 2024)

From Large Language Models to Large Multimodal Models: A Literature Review

  • Dawei Huang,
  • Chuan Yan,
  • Qing Li,
  • Xiaojiang Peng

DOI
https://doi.org/10.3390/app14125068
Journal volume & issue
Vol. 14, no. 12
p. 5068

Abstract

As research on Large Language Models (LLMs) has deepened, significant progress has been made in recent years in developing Large Multimodal Models (LMMs), which are gradually moving toward Artificial General Intelligence. This paper summarizes the recent progress from LLMs to LMMs in a comprehensive and unified way. First, we review LLMs and outline their main conceptual frameworks and key techniques. Then, we focus on the architectural components, training strategies, fine-tuning guidance, and prompt engineering of LMMs, and present a taxonomy of the latest vision–language LMMs. Finally, we summarize both LLMs and LMMs from a unified perspective, analyze the development status of large-scale models from a global perspective, and suggest potential research directions for large-scale models.

Keywords