Kongzhi Yu Xinxi Jishu (Jun 2024)
Specialized Large Language Model for Standardization of Locomotive Maintenance Data
Abstract
Standardization is a key step in analyzing locomotive overhaul data for reliability-centered maintenance (RCM). However, traditional manual methods face challenges such as small sample sizes, non-standardized data formats, analytical complexity, and high labour costs, all of which hinder data standardization. Large language models (LLMs), which perform strongly in natural language understanding and in handling complex tasks, have made great progress in both academia and industry in recent years. This study first investigated the performance of LLMs in information extraction from locomotive overhaul data and obtained three findings: the universal information extraction (UIE) LLM is suitable for information extraction in the field of locomotive overhaul; enlarging the locomotive dataset helps improve UIE performance on this task; and balancing the types of fault labels does not notably improve this performance. The study then addressed the difficulty of data annotation. Scripts were written to annotate the data automatically, and ChatGLM was used to standardize locomotive overhaul data, yielding BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L scores of 86.87%, 89.60%, 87.54%, and 94.26%, respectively, which meet the requirements of engineering applications. Finally, the LLM was encapsulated in an auxiliary pre-processing tool for data standardization to streamline the standardization process.
Keywords