Specialized Large Language Model for Standardization of Locomotive Maintenance Data

CHEN Ao; LI Chen; YAN Jiayun; PENG Liantie; TIAN Ye; LIU Leixinyuan

doi:10.13889/j.issn.2096-5427.2024.03.200

Kongzhi Yu Xinxi Jishu (Jun 2024)

Affiliations

CHEN Ao: Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China
LI Chen: Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China
YAN Jiayun: Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China
PENG Liantie: Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China
TIAN Ye: Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China
LIU Leixinyuan: Zhuzhou CRRC Time Electric Co., Ltd., Zhuzhou, Hunan 412001, China

Journal volume & issue: no. 3
pp. 72 – 79

Abstract

Standardization is a key step in analyzing locomotive overhaul data for reliability-centered maintenance (RCM). However, traditional manual methods face small sample sizes, non-standardized data formats, analytical complexity, and high labour costs, all of which hinder data standardization. Large language models (LLMs), which perform strongly in natural language understanding and in handling complex tasks, have made great academic and industrial progress in recent years. This study first investigated how LLMs perform at information extraction from locomotive overhaul data and reached three findings: (1) the universal information extraction (UIE) LLM is suitable for information extraction in the locomotive overhaul domain; (2) enlarging the locomotive dataset improves UIE performance on overhaul data; and (3) balancing the types of fault labels does not notably improve this performance. Subsequent work addressed the difficulty of data annotation. Scripts were written to annotate the data automatically, and ChatGLM was used to standardize locomotive overhaul data, yielding BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L scores of 86.87%, 89.60%, 87.54%, and 94.26%, respectively, which meets the requirements of engineering applications. Finally, an auxiliary pre-processing tool for data standardization was developed; it encapsulates the LLM to streamline the standardization process.
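
For readers unfamiliar with the overlap metrics quoted above, the following is a minimal sketch of how ROUGE-N and ROUGE-L can be computed for one generated record against its reference. It is not the authors' evaluation code: the F1 variant, the tokenization (pre-split token lists), and the function names are all assumptions made for illustration, and the sample records are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 score from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N (F1 variant, an assumption): clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L (F1 variant, an assumption): based on the LCS length."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Hypothetical standardized records, split on whitespace for illustration.
    reference = "traction motor bearing overheated".split()
    candidate = "traction motor bearing damaged".split()
    print("ROUGE-1:", rouge_n(candidate, reference, 1))
    print("ROUGE-2:", rouge_n(candidate, reference, 2))
    print("ROUGE-L:", rouge_l(candidate, reference))
```

Scores for Chinese maintenance text depend heavily on how the text is segmented into tokens, so values from a sketch like this are only comparable with published figures if the same tokenization and metric variant are used.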

Published in Kongzhi Yu Xinxi Jishu

ISSN: 2096-5427 (Print)
Publisher: Editorial Office of Control and Information Technology
Country of publisher: China
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General)
Website: https://ctet.csrzic.com/en/#/
