Taxi Passenger Hot Spot Mining Based on a Refined K-Means++ Algorithm

Yuanni Wang; Jiansi Ren

doi:10.1109/ACCESS.2021.3075682

IEEE Access (Jan 2021)

Affiliations

Yuanni Wang: School of Computer Science, China University of Geosciences, Wuhan, China
Jiansi Ren: School of Computer Science, China University of Geosciences, Wuhan, China

DOI: https://doi.org/10.1109/ACCESS.2021.3075682
Journal volume & issue: Vol. 9
pp. 66587 – 66598

Abstract

Taxi GPS location data make it possible to study the spatial-temporal distribution of taxi travel demand and to estimate the actual supply and demand levels at different hot spots in different time periods. Existing research on clustering passenger hot spots suffers from performance problems such as insufficient clustering accuracy and high time complexity. This paper proposes a two-level subdivision concept and refines the K-means++ algorithm to achieve fine-grained clustering of taxi passenger hot spots. The first-level subdivision establishes a dynamically adjustable region defined by time and geographical range. In the second level, a Gaussian mixture model is fitted to the data distribution, and the optimal number of subdivision areas is chosen by minimizing the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The sum of squared errors (SSE) is then used to determine the optimal cluster number $k$ for each local area. Finally, the K-means++ algorithm clusters each local area. One week of green taxi data from New York City was used to validate the method and to compare it with the traditional K-means and DBSCAN approaches. The proposed method achieved better accuracy with comparable time consumption, which demonstrates its value for hot spot data mining, although clustering still has some important advantages. In addition, the hot spots for the morning peak and the weekend are visualized, which can help guide urban transportation and planning.
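The last two steps of the abstract (choosing $k$ from the SSE, then clustering with K-means++) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names `sq_dist`, `kmeanspp_init`, and `kmeans` are hypothetical, the input is synthetic 2-D points rather than the New York City green taxi data, and the two-level subdivision with the Gaussian mixture model is omitted. The sketch also assumes the input contains at least `k` distinct points.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeanspp_init(points, k, rng):
    """K-means++ seeding: each new center is sampled with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes at least k distinct points (otherwise all weights can be zero)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd iterations started from K-means++ seeds.
    Returns (centers, sse), where sse is the sum of squared errors."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean_x = sum(m[0] for m in members) / len(members)
                mean_y = sum(m[1] for m in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Synthetic stand-in for pickup coordinates: three Gaussian blobs.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (cx + data_rng.gauss(0, 0.3), cy + data_rng.gauss(0, 0.3))
    for cx, cy in blob_centers
    for _ in range(50)
]

# SSE for a range of candidate k values; the "elbow" where the curve
# flattens is a common heuristic for picking k.
sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting `sse_by_k` against `k` and looking for the point where the decrease levels off is one common way to read an optimal cluster number from the SSE; the paper may apply a different selection rule.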

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
