IEEE Access (Jan 2020)
An Efficient Access Model of Massive Spatiotemporal Vehicle Trajectory Data in Smart City
Abstract
The volume of trajectory data generated by vehicle monitoring networks in smart cities is growing rapidly, reaching about 1 billion records per day. Accessing hyper massive spatiotemporal trajectory data (HMSTD) in transport, the Internet of Things, and other fields is difficult with current spatiotemporal data indexing techniques. We therefore propose PDDB-Impala, a method that combines path-divided Hadoop Distributed File System (HDFS) data blocking (PDDB) with Apache Impala, to make access to HMSTD more efficient and thereby improve the efficiency of sharing such data. We also propose PDDB partitioning rules for Parquet data. In the experiments, 35,809 buses equipped with BD positioning sensors generate 1.03 billion data records each day. The distribution of buses in Shenzhen is collected from 7:00 a.m. to 9:00 a.m. and from 11:00 a.m. to 1:00 p.m. PDDB-Impala achieves about 8, 9, 29, and 110 times higher performance than MongoDB or HBase at data scales of 1 billion, 10 billion, 50 billion, and 100 billion records, respectively. These results outperform those of the equipartition Impala, MongoDB, and HBase methods.
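As a rough sanity check on the data scale quoted above, the following sketch derives the per-bus record rate from the two figures in the abstract (35,809 buses and 1.03 billion records per day). It assumes that every bus reports at a uniform rate over a full 24 hours, which the source does not state; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
NUM_BUSES = 35_809
RECORDS_PER_DAY = 1.03e9

# Assumption (not stated in the source): each bus reports uniformly
# over the whole day.
SECONDS_PER_DAY = 24 * 60 * 60

# Average number of records one bus contributes per day.
records_per_bus = RECORDS_PER_DAY / NUM_BUSES

# Implied average time between two consecutive records from one bus.
seconds_between_records = SECONDS_PER_DAY / records_per_bus

print(f"records per bus per day: {records_per_bus:,.0f}")
print(f"implied reporting interval: {seconds_between_records:.2f} s")
```

Under that assumption, each bus contributes on the order of 28,800 records per day, or about one record every 3 seconds. If buses report only while in service, the actual interval would be shorter.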
Keywords