Enhancing Robot Task Planning and Execution through Multi-Layer Large Language Models

Zhirong Luan; Yujun Lai; Rundong Huang; Shuanghao Bai; Yuedi Zhang; Haoran Zhang; Qian Wang

doi:10.3390/s24051687

Sensors (Mar 2024)

Affiliations

Zhirong Luan: School of Electrical Engineering, Xi’an University of Technology, Xi’an 710000, China
Yujun Lai: School of Electrical Engineering, Xi’an University of Technology, Xi’an 710000, China
Rundong Huang: School of Electrical Engineering, Xi’an University of Technology, Xi’an 710000, China
Shuanghao Bai: College of Artificial Intelligence, Xi’an Jiaotong University, Xi’an 710000, China
Yuedi Zhang: College of Artificial Intelligence, Xi’an Jiaotong University, Xi’an 710000, China
Haoran Zhang: College of Artificial Intelligence, Xi’an Jiaotong University, Xi’an 710000, China
Qian Wang: School of Electrical Engineering, Xi’an University of Technology, Xi’an 710000, China

DOI: https://doi.org/10.3390/s24051687
Journal volume & issue: Vol. 24, no. 5, p. 1687

Abstract

Large language models are useful for robot task planning and task decomposition. However, using them directly to instruct robots in task execution presents challenges: they are limited in handling more intricate tasks, have difficulty interacting effectively with the environment, and generate machine control instructions that are often not practically executable. To address these challenges, this research proposes a multi-layer large language model to improve a robot’s ability to handle complex tasks. The proposed model integrates multiple large language models to decompose tasks layer by layer, with the goal of improving the accuracy of task planning. Within the task decomposition process, a visual language model is introduced as a sensor for environment perception. Its perception results are fed into the large language model, combining the task objectives with environmental information so that the generated robot motion plan is tailored to the current environment. Furthermore, to make the task planning outputs of the large language model more executable, a semantic alignment method is introduced. This method aligns task planning descriptions with the functional requirements of robot motion, improving the compatibility and coherence of the generated instructions. To validate the proposed approach, an experimental platform is built on an intelligent unmanned vehicle and used to empirically verify that the multi-layer large language model can address the challenges of both robot task planning and execution.
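The pipeline outlined in the abstract (layer-by-layer decomposition, perception from a visual language model, then semantic alignment to robot functions) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`ROBOT_FUNCTIONS`, `align`, `plan`, and the stub layers) is hypothetical, the language and vision models are replaced by fixed stub functions, the word-overlap matching is a simple stand-in for whatever alignment method the paper uses, and feeding the scene description only to the last layer is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical motion primitives with keyword descriptions (not from the paper).
ROBOT_FUNCTIONS: Dict[str, str] = {
    "move_forward": "move forward drive ahead go straight",
    "turn_left": "turn left rotate left",
    "turn_right": "turn right rotate right",
    "stop": "stop halt wait",
}

Layer = Callable[[str], List[str]]  # stands in for one LLM call


def align(step: str) -> str:
    """Map a free-text plan step to the robot function sharing the most words."""
    words = set(step.lower().split())

    def overlap(name: str) -> int:
        return len(words & set(ROBOT_FUNCTIONS[name].split()))

    best = max(ROBOT_FUNCTIONS, key=overlap)
    if overlap(best) == 0:
        raise ValueError(f"no robot function matches step: {step!r}")
    return best


def plan(task: str, layers: List[Layer], perceive: Callable[[], str]) -> List[str]:
    """Decompose a task layer by layer, then align each step to a function."""
    steps = [task]
    # Upper layers refine the task without environment information.
    for layer in layers[:-1]:
        steps = [sub for step in steps for sub in layer(step)]
    # Assumption: the scene description is injected only at the last layer.
    scene = perceive()
    steps = [sub for step in steps for sub in layers[-1](f"{step} | scene: {scene}")]
    return [align(step) for step in steps]


# Stubs standing in for real models, so the sketch runs offline.
def coarse_layer(task: str) -> List[str]:
    return ["approach the door", "pass the door"]


def fine_layer(prompt: str) -> List[str]:
    # Detour when the perceived scene mentions an obstacle.
    if "obstacle" in prompt:
        return ["rotate left", "go straight"]
    return ["go straight"]


def fake_vlm() -> str:
    return "obstacle ahead"


commands = plan("go through the door", [coarse_layer, fine_layer], fake_vlm)
print(commands)
```

In this sketch, alignment fails loudly with a `ValueError` when a step matches no known function, so an unexecutable instruction is not passed to the robot silently. A real system would replace the stubs with model calls and likely use a richer similarity measure.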

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
