RailTrack-DaViT: A Vision Transformer-Based Approach for Automated Railway Track Defect Detection

Aniwat Phaphuangwittayakul; Napat Harnpornchai; Fangli Ying; Jinming Zhang

doi:10.3390/jimaging10080192

Journal of Imaging (Aug 2024)

Affiliations

Aniwat Phaphuangwittayakul: International College of Digital Innovation, Chiang Mai University, Chiang Mai 50200, Thailand
Napat Harnpornchai: Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand
Fangli Ying: State Key Laboratory of Bioreactor Engineering, Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Jinming Zhang: International College of Digital Innovation, Chiang Mai University, Chiang Mai 50200, Thailand

Journal volume & issue: Vol. 10, no. 8, p. 192

Abstract

Railway track defects pose significant safety risks and can lead to accidents, economic losses, and loss of life. Traditional manual inspection methods are time-consuming, costly, and prone to human error. This paper proposes RailTrack-DaViT, a novel vision transformer-based approach for railway track defect classification. By leveraging the Dual Attention Vision Transformer (DaViT) architecture, RailTrack-DaViT effectively captures both global and local information, enabling accurate defect detection. The model is trained and evaluated on multiple datasets: rail, fastener and fishplate, multi-faults, and ThaiRailTrack. A comprehensive analysis of the model’s performance is provided, including confusion matrices, training visualizations, and classification metrics. RailTrack-DaViT outperforms state-of-the-art CNN-based methods, achieving the highest accuracies: 96.9% on the rail dataset, 98.9% on the fastener and fishplate dataset, and 98.8% on the multi-faults dataset. Moreover, RailTrack-DaViT outperforms baselines on the ThaiRailTrack dataset with 99.2% accuracy, adapts quickly to unseen images, and shows better model stability during fine-tuning. This capability can significantly reduce the time needed to apply the model to novel datasets in practical applications.

Published in Journal of Imaging

ISSN: 2313-433X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Photography; Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/jimaging
