Journal of Traffic and Transportation Engineering (English ed. Online) (Apr 2023)
Automated classification and detection of multiple pavement distress images based on deep learning
Abstract
To achieve automatic, fast, efficient and high-precision pavement distress classification and detection, road surface distress image classification and detection models based on deep learning are trained. First, a pavement distress image dataset is built, comprising 9017 images with distress and 9620 images without distress, captured from four asphalt highways in three provinces of China. Each pavement distress image contains one or more types of distress, including alligator crack, longitudinal crack, block crack, transverse crack, pothole and patch, and the distresses are labeled with rectangular bounding boxes on the images. ResNet and VGG networks are then used as binary classification models to distinguish distressed from non-distressed images, and as multi-label classification models to classify the six types of distress. Training techniques such as data augmentation, batch normalization, dropout, momentum, weight decay, transfer learning and discriminative learning rates are used in training the models. Among the four CNNs considered in this study, namely ResNet 34, ResNet 50, VGG 16 and VGG 19, for binary classification ResNet 50 achieves the highest Accuracy (96.243%) and Precision (95.183%), while ResNet 34 achieves the highest Recall (97.824%) and F2 score (97.052%). For multi-label classification, ResNet 50 performs best, with the highest Accuracy (90.257%, exceeding the 90% required by the Chinese standard JTG H20-2018 for road distress detection), F2 score (82.231%) and Precision (76.509%), while ResNet 34 achieves the highest Recall (87.32%). To locate and quantify the distress areas in the images, a single shot multibox detector (SSD) model is developed, in which ResNet 50 is used as the base network to extract features. When the intersection over union (IoU) threshold is set to 0, 0.25, 0.50 and 0.75, the mean average precision (mAP) of the model is found to be 74.881%, 50.511%, 28.432% and 3.969%, respectively.
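The abstract does not give implementation details, so the following is only a minimal sketch, assuming a PyTorch implementation, of how a multi-label classifier for the six distress types could be assembled from a pretrained ResNet 50 using the training techniques listed above (transfer learning, dropout, momentum, weight decay and discriminative learning rates). Layer choices, hyperparameter values and the label order are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch (assumption): PyTorch multi-label classifier for the six distress
# types, built on a pretrained ResNet 50 backbone (transfer learning) with
# dropout, momentum, weight decay and discriminative learning rates.
# All hyperparameter values below are illustrative, not the paper's.
import torch
import torch.nn as nn
from torchvision import models

DISTRESS_TYPES = ["alligator_crack", "longitudinal_crack", "block_crack",
                  "transverse_crack", "pothole", "patch"]  # six labels

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Sequential(                    # replace the ImageNet head
    nn.Dropout(p=0.5),                          # dropout regularization
    nn.Linear(backbone.fc.in_features, len(DISTRESS_TYPES)),
)

# Discriminative learning rates: small for pretrained layers, larger for the
# newly initialized head. Data augmentation would be applied in the dataset
# transforms and is omitted here.
optimizer = torch.optim.SGD(
    [
        {"params": [p for n, p in backbone.named_parameters()
                    if not n.startswith("fc.")], "lr": 1e-4},
        {"params": backbone.fc.parameters(), "lr": 1e-3},
    ],
    lr=1e-3, momentum=0.9, weight_decay=1e-4,
)

# Multi-label targets are independent 0/1 indicators (an image may contain
# several distress types), so sigmoid + binary cross-entropy is used
# instead of softmax.
criterion = nn.BCEWithLogitsLoss()

def training_step(images, labels):
    """images: (N, 3, H, W) tensor; labels: (N, 6) float tensor of 0/1."""
    optimizer.zero_grad()
    logits = backbone(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The binary distressed/non-distressed classifier follows the same pattern with a single output unit.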
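The F2 score reported above is assumed to be the standard F-beta measure with beta = 2, which weights Recall more heavily than Precision; this is a natural choice for distress screening, where missing a distress is costlier than a false alarm:

```latex
F_2 = \frac{(1 + 2^2)\,\mathrm{Precision}\cdot\mathrm{Recall}}
           {2^2\,\mathrm{Precision} + \mathrm{Recall}}
    = \frac{5\,\mathrm{Precision}\cdot\mathrm{Recall}}
           {4\,\mathrm{Precision} + \mathrm{Recall}}
```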
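For the SSD evaluation, a detection is normally counted as a true positive when its IoU with an unmatched ground-truth box of the same class reaches the chosen threshold (0, 0.25, 0.50 or 0.75 above), and mAP averages the resulting per-class precision over recall. A minimal sketch of the IoU computation, assuming axis-aligned boxes in (x1, y1, x2, y2) corner format, is shown below.

```python
# Sketch (assumption): IoU between two axis-aligned boxes given as
# (x1, y1, x2, y2), the overlap measure behind the mAP thresholds
# 0, 0.25, 0.50 and 0.75 reported in the abstract.
def box_iou(box_a, box_b):
    """Return intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes overlap completely, so their IoU is 1.0.
assert box_iou((0, 0, 10, 10), (0, 0, 10, 10)) == 1.0
```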