EPJ Data Science (Jun 2025)
Designing transit routes based on vehicle routing behavior determined through location-based services data
Abstract
The disparity between transit agency travel predictions and the unpredictable nature of real-world travel behavior contributes to inefficiencies within the transit system. To address this challenge, we propose a bottom-up transit planning approach that leverages extensive Location-Based Services (LBS) data and General Transit Feed Specification (GTFS) data for Dallas, Texas. The LBS dataset used in this study comprises approximately 12.43 billion records from 6.5 million users. We combine it with GTFS data to analyze vehicle routing behavior and identify transit supply gaps. Hidden Markov Model (HMM)-based map matching aligns the LBS trajectories with a road network extracted from OpenStreetMap, allowing us to compare user demand on each road segment against the bus service frequency derived from GTFS. To design transit improvements, we first apply k-means clustering based on Euclidean distances to group underserved road segments, and then refine these groups using a shortest-path-based clustering algorithm. This second step explicitly incorporates the actual connectivity of the road network, ensuring that proposed transit routes follow realistic travel paths. Our evaluation indicates that the proposed transit routes, whether route extensions or new bus lines, can serve a substantial share of the underserved areas and have the potential to significantly reduce Vehicle Miles Traveled (VMT).
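The two-stage grouping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the refinement rule, so the sketch assumes one plausible variant in which each cluster is represented by the member node nearest its centroid and every node is then reassigned to the representative that is closest along the road network. The function names (`kmeans`, `dijkstra`, `refine_by_network`), the toy coordinates, and the edge lengths are all hypothetical.

```python
import heapq
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on 2-D points; the first k points seed the centers."""
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels, centers

def dijkstra(graph, source):
    """Shortest-path distances from source over a weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def refine_by_network(graph, coords, nodes, labels, centers):
    """Assumed refinement rule: reassign each node to the cluster whose
    representative (member nearest the centroid) is closest by road distance.
    Assumes every cluster has at least one member."""
    k = len(centers)
    reps = []
    for c in range(k):
        members = [n for n, lab in zip(nodes, labels) if lab == c]
        reps.append(min(members, key=lambda n: math.dist(coords[n], centers[c])))
    dists = [dijkstra(graph, r) for r in reps]
    return [min(range(k), key=lambda c: dists[c].get(n, math.inf)) for n in nodes]

# Toy data (hypothetical): six underserved locations on a line. Node "e" is
# geometrically close to "b" but has no direct road to it; it connects east to "c".
coords = {"a": (0, 0), "b": (1, 0), "e": (2, 0), "c": (4, 0), "d": (5, 0), "f": (6, 0)}
roads = [("a", "b", 1.0), ("c", "d", 1.0), ("d", "f", 1.0),
         ("e", "c", 2.0), ("b", "c", 10.0)]  # b-c is a long detour

graph = {}
for u, v, w in roads:  # undirected road network
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

nodes = list(coords)
euclid_labels, centers = kmeans([coords[n] for n in nodes], k=2)
refined_labels = refine_by_network(graph, coords, nodes, euclid_labels, centers)

print(dict(zip(nodes, euclid_labels)))
print(dict(zip(nodes, refined_labels)))
```

In this toy network, Euclidean k-means places node `e` with the western group, while the network-based pass moves it to the eastern group because its only short road connection leads east. This is the kind of correction that the connectivity-aware second step is meant to provide.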
Keywords