Journal of Advanced Transportation (Jan 2018)

Identifying Public Transit Commuters Based on Both the Smartcard Data and Survey Data: A Case Study in Xiamen, China

  • Shichao Sun
  • Dongyuan Yang

DOI
https://doi.org/10.1155/2018/9693272
Journal volume & issue
Vol. 2018

Abstract


Understanding the travel patterns of public transit commuters is important for improving service quality, promoting public transit use, and planning the public transit system. Smartcard data, with their wide coverage and abundance, provide new opportunities to study the behaviors and travel patterns of public transit riders at much lower cost than conventional data sources. However, a major limitation of smartcard data is that they lack the social attributes of cardholders. As a result, smartcard data alone cannot reliably identify public transit commuters or explain the mechanisms behind their travel behaviors. This study employed a machine learning approach, the Naive Bayesian Classifier (NBC), to identify public transit commuters from both smartcard data and survey data, using Xiamen, China, as a case study. Existing methods have struggled to validate the accuracy of their identification results. By contrast, the adopted approach is a machine learning algorithm with built-in accuracy checking. The classifier was trained and tested on survey data from 532 valid questionnaires and identified public transit commuters with 92% accuracy on the test instances. The trained classifier was then applied to the smartcard data at a low computational cost, without requiring assumptions about the travel regularity of public transit commuters. Nearly 290,000 cardholders were classified as public transit commuters, and statistics such as their average first boarding time and their travel frequency during workday peak hours were obtained. Finally, the smartcard data were fused with bus location data to reveal the spatial distributions of these commuters' home and work locations, which can be used to improve public transit planning and operations.
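The abstract does not say which Naive Bayes variant or which input features the authors used. As a hedged illustration only, the sketch below implements a generic categorical Naive Bayes classifier with Laplace (add-one) smoothing in plain Python. The class name `CategoricalNB`, the helper `accuracy`, the three feature buckets (workday peak-hour trip level, first-boarding-time band, repeated home stop), and the toy rows are all hypothetical stand-ins. They are not the authors' survey variables or data.

```python
from collections import Counter, defaultdict
import math


class CategoricalNB:
    """Hypothetical minimal categorical Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing pseudo-count added to every feature value

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v] = number of class-c rows whose feature j equals v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen for each feature j
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def _log_posterior(self, row, c):
        # log P(c) + sum_j log P(x_j | c), up to a constant shared by all classes
        lp = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            k = len(self.values[j])
            num = self.counts[c][j][v] + self.alpha  # unseen values get alpha
            den = self.class_counts[c] + self.alpha * k
            lp += math.log(num / den)
        return lp

    def predict(self, row):
        # choose the class with the highest (log) posterior
        return max(self.classes, key=lambda c: self._log_posterior(row, c))


def accuracy(model, X, y):
    # fraction of rows whose predicted label matches the true label
    hits = sum(model.predict(r) == t for r, t in zip(X, y))
    return hits / len(y)


# Toy, made-up rows (NOT the paper's survey data):
# (peak-hour workday trip level, first-boarding-time band, repeated home stop?)
X_train = [
    ("high", "early", "yes"),
    ("high", "early", "yes"),
    ("high", "mid", "yes"),
    ("low", "late", "no"),
    ("low", "mid", "no"),
    ("mid", "late", "no"),
]
y_train = ["commuter", "commuter", "commuter", "other", "other", "other"]

model = CategoricalNB().fit(X_train, y_train)
print(model.predict(("high", "early", "yes")))  # commuter
print(model.predict(("low", "late", "no")))     # other
print(accuracy(model, X_train, y_train))        # 1.0
```

Training is a single counting pass over the labeled rows, and prediction is a sum of log-probabilities per class. This is one plausible reason a Naive Bayes classifier can score hundreds of thousands of cardholders at a low computational load. In a real workflow, accuracy would be measured on held-out survey records rather than on the training rows used here.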