From Large Language Models to Large Multimodal Models: A Literature Review

Dawei Huang; Chuan Yan; Qing Li; Xiaojiang Peng

doi:10.3390/app14125068

Applied Sciences (Jun 2024)

Affiliations

Dawei Huang: College of Applied Science, Shenzhen University, Shenzhen 518052, China
Chuan Yan: Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
Qing Li: College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China
Xiaojiang Peng: College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China

DOI: https://doi.org/10.3390/app14125068
Journal volume & issue: Vol. 14, no. 12, p. 5068

Abstract

As research on Large Language Models (LLMs) has deepened, significant progress has been made in recent years on Large Multimodal Models (LMMs), which are gradually moving toward Artificial General Intelligence. This paper summarizes the recent progress from LLMs to LMMs in a comprehensive and unified way. First, we start with LLMs and outline their conceptual frameworks and key techniques. Then, we focus on the architectural components, training strategies, fine-tuning guidance, and prompt engineering of LMMs, and present a taxonomy of the latest vision–language LMMs. Finally, we summarize both LLMs and LMMs from a unified perspective, analyze the development status of large-scale models from a global perspective, and suggest potential research directions for large-scale models.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
