IET Intelligent Transport Systems (Oct 2024)
Data‐driven train delay prediction incorporating dispatching commands: An XGBoost‐metaheuristic framework
Abstract
Train delays can significantly degrade the punctuality and service quality of high‐speed trains, and they strongly affect dispatchers' decision‐making. This study proposes a data‐driven train delay prediction framework based on XGBoost that accounts for the impact of dispatching commands and the mechanisms of train delay propagation. Four metaheuristic algorithms were used to tune the XGBoost hyperparameters. A dataset of 1.9 million records spanning 38 months of train operations was used for feature extraction and model training. Model accuracy was evaluated with three statistical metrics, and the four tuning frameworks were compared. To support the model's interpretability and its practical guidance for train rescheduling, the relationship among dispatching commands, delay propagation, and delay prediction was validated against both theory and practical results, and a SHAP (SHapley Additive exPlanations) analysis was used to explain the model. The results show that the XGBoost‐metaheuristic models perform differently under different criteria, yet all achieved high accuracy and low prediction errors, indicating the potential of machine learning for train delay prediction in support of decision‐making and rescheduling.
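As a rough illustration of what metaheuristic hyperparameter tuning involves, the sketch below runs a basic particle swarm search over two hyperparameters. It rests on several assumptions: the abstract does not name the four algorithms, so particle swarm is used here only as a familiar example; the two hyperparameters and their bounds are illustrative; and `validation_error` is a hypothetical placeholder for the cross‐validated error of an XGBoost model, not the study's objective.

```python
import random

# Hypothetical search space: (learning_rate, max_depth). The bounds are
# illustrative assumptions, not values taken from the study.
BOUNDS = [(0.01, 0.3), (2.0, 12.0)]


def validation_error(params):
    """Placeholder objective standing in for a cross-validated model error.

    A real tuner would train a model with `params` and return its validation
    error; here a smooth bowl with its minimum at (0.1, 6.0) is used instead.
    """
    learning_rate, max_depth = params
    return 100.0 * (learning_rate - 0.1) ** 2 + 0.05 * (max_depth - 6.0) ** 2


def clip(value, low, high):
    """Keep a coordinate inside its allowed range."""
    return max(low, min(high, value))


def particle_swarm(objective, bounds, n_particles=20, n_iters=60, seed=0):
    """Minimise `objective` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    inertia, cognitive, social = 0.7, 1.5, 1.5  # common textbook settings

    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * len(bounds) for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] = clip(positions[i][d] + velocities[i][d], lo, hi)
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


best_params, best_error = particle_swarm(validation_error, BOUNDS)
```

In an actual tuning run, the placeholder objective would be replaced by a function that trains the model with the candidate hyperparameters and returns a validation metric, which makes each evaluation far more expensive than in this toy setting.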
Keywords