Systems (Aug 2024)

Taxi Travel Distance Clustering Method Based on Exponential Fitting and <i>k</i>-Means Using Data from the US and China

  • Zhenang Song,
  • Jun Cai,
  • Qiyao Yang

DOI
https://doi.org/10.3390/systems12080282
Journal volume & issue
Vol. 12, no. 8
p. 282

Abstract

The taxi travel distance distribution can be used to forecast the origin and destination (OD) distribution of taxis and private cars. Most existing studies of taxi trip distributions describe a “low–high–low” trend that approaches zero at both ends; however, they do not explain why the distance distribution takes this shape. The key indicators and parameters that different researchers derive from big data for the same city and year typically differ, especially the mode and mean values of distance and time. This study uses New York yellow and green taxi data from 2017 to 2022 (417,018,811 data points in total), as well as data from China, to obtain a general law of the taxi travel distance distribution through an analysis of relative distance and relative frequency. The mode of the travel distance was 0.54 times the relative distance, while the data tended towards zero at 2.0 times the relative distance. We verified the reliability of the research method against reference and survey data. The results reveal the formation mechanism behind the characteristics of the taxi travel distance distribution, which follows an exponential distribution. These laws can be applied in urban planning and transportation research. We propose a taxi travel distance clustering method based on the k-means approach, chosen for its effectiveness on large datasets, its interpretability, and its alignment with our research objectives. This method provides visual results for the travel distance and accurate information for urban transportation planning and taxi services. We discuss the practical implications for policymakers, urban planners, and taxi services, showing how the identified travel distance distribution laws can inform urban planning and taxi service optimization. Finally, we identify problems in data collection, cleaning, and processing from the perspective of data statistics and analysis.

Keywords