Measurement: Sensors (Jun 2023)
Clustering-based method for big spatial data partitioning
Abstract
The “Internet of Things” (IoT) is one of the main focus areas of research in computer systems and networks. Because IoT devices are installed at fixed geographic locations or on board moving, trackable objects, the data they generate is mainly spatial. This data scales up in volume, velocity, and veracity, so it tends to be treated as “Big Data”. “Big Spatial Data” requires special frameworks that apply state-of-the-art technologies to data storage, query, and analysis. These frameworks typically rely on a parallel programming model that partitions the spatial data into smaller chunks that can be processed in parallel. An effective spatial data partitioning method is therefore essential to implementing such systems.

In this paper, we propose, design, and implement a new method for spatial data partitioning based on K-Means clustering, an unsupervised machine learning algorithm. The design follows a well-defined conceptual, mathematical, and programming model for a general spatial data partitioning method. The main component of the proposed model is an implementation of K-Means clustering suited to spatial data.

The new method is tested to assess both its ability to achieve the partitioning objectives and the efficiency of its performance. The test results are benchmarked against one of the most widely adopted approaches to partitioning spatial data and show that the new method surpasses it on some of the evaluation criteria.
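To make the idea of clustering-based partitioning concrete, the following is a minimal sketch of plain K-Means (Lloyd's algorithm) applied to 2D points, where each cluster is treated as one partition. It is not the implementation described in this paper: the function name `kmeans_partition`, its parameters, the use of Euclidean distance on raw coordinates, and the random initialization are all assumptions made for illustration.

```python
import random
from math import hypot


def kmeans_partition(points, k, iterations=20, seed=0):
    """Assign each (x, y) point to one of k partitions with Lloyd's algorithm.

    Hypothetical helper for illustration only; returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)

    def assign(cents):
        # Label each point with the index of its nearest centroid.
        return [
            min(range(k), key=lambda c: hypot(p[0] - cents[c][0], p[1] - cents[c][1]))
            for p in points
        ]

    for _ in range(iterations):
        labels = assign(centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # Move the centroid to the mean of its members.
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # Keep the old centroid if a partition ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the assignment is stable.
        centroids = new_centroids

    # Recompute labels so they match the returned centroids.
    return assign(centroids), centroids


# Two well-separated groups of points, split into two partitions.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans_partition(sample_points, k=2)
partitions = {c: [p for p, l in zip(sample_points, labels) if l == c] for c in range(2)}
```

In a parallel framework, each entry of `partitions` would correspond to a chunk handed to a separate worker. A production method would also need to address concerns this sketch ignores, such as load balance across partitions and geodetic rather than planar distance.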