Zhejiang Daxue xuebao. Lixue ban (Mar 2024)

Develop spatial learning indexing using improved K-means clustering partition (Chinese title, translated: A uniform spatial learning index based on improved K-means clustering partitions)

  • 傅晨华(FU Chenhua),
  • 张丰(ZHANG Feng),
  • 胡林舒(HU Linshu),
  • 王立君(WANG Lijun)

DOI
https://doi.org/10.3785/j.issn.1008-9497.2024.02.003
Journal volume & issue
Vol. 51, no. 2
pp. 153 – 161

Abstract

As data volumes grow rapidly, the shortcomings of traditional spatial indexes become increasingly apparent: the index itself grows with the amount of data, and query efficiency is low. A learned index (the "learning index" of the title), by contrast, is built on the data distribution. Its size does not grow with the amount of data, and it avoids level-by-level comparison during queries, so it can achieve better performance. Two difficulties remain in applying the learned-index idea to spatial data: (1) choosing an appropriate dimension-reduction method to sort the spatial data; (2) simplifying the distribution of the dimension-reduced data sequence so that it is easy to fit. This paper proposes a grid mixed cluster partition learning index (grid-ml). To address these two difficulties, grid-ml uses the z curve for dimension reduction and a double-layer grid structure to handle the jumping problem and optimize the query strategy. It then partitions the data with an improved K-means clustering algorithm to make the data distribution more uniform. Comparative experiments show that grid-ml builds quickly, needs little storage space, and queries efficiently, demonstrating significant advantages over traditional spatial indexes.
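The pipeline the abstract describes (z-curve dimension reduction, cluster-based partitioning, one fitted model per partition) can be illustrated with a minimal sketch. It is not the authors' grid-ml: it uses plain Lloyd K-means rather than the paper's improved variant, assumes a least-squares line as the per-partition model, and omits the double-layer grid entirely. All names (`z_value`, `kmeans_1d`, `fit_line`, `PartitionedLearnedIndex`) are hypothetical, and the input is assumed to be a non-empty set of non-negative integer grid coordinates.

```python
from bisect import bisect_left, bisect_right


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def kmeans_1d(sorted_keys, k, iters=20):
    """Plain Lloyd K-means on sorted 1-D keys (assumption: NOT the paper's
    improved variant). Returns the non-empty groups in ascending key order."""
    n = len(sorted_keys)
    k = max(1, min(k, n))
    # Start the centroids at evenly spaced quantiles of the key sequence.
    centroids = [sorted_keys[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for key in sorted_keys:
            nearest = min(range(k), key=lambda c: abs(key - centroids[c]))
            groups[nearest].append(key)
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted((g for g in groups if g), key=lambda g: g[0])


def fit_line(keys, positions):
    """Least-squares fit of position ~ slope * key + intercept."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = sum(positions) / n
    var = sum((key - mean_k) ** 2 for key in keys)
    if var == 0:                                  # single-key partition
        return 0.0, mean_p
    cov = sum((key - mean_k) * (p - mean_p) for key, p in zip(keys, positions))
    slope = cov / var
    return slope, mean_p - slope * mean_k


class PartitionedLearnedIndex:
    """Sorted z-values plus one linear model per K-means partition."""

    def __init__(self, points, k=4):
        self.keys = sorted({z_value(x, y) for x, y in points})
        self.parts = []  # (first_key, start, end, slope, intercept, max_error)
        start = 0
        for group in kmeans_1d(self.keys, k):
            end = start + len(group)
            slope, intercept = fit_line(group, range(start, end))
            # Worst prediction error in this partition bounds the search window.
            err = max(abs(round(slope * key + intercept) - pos)
                      for pos, key in enumerate(group, start))
            self.parts.append((group[0], start, end, slope, intercept, err))
            start = end
        self.firsts = [part[0] for part in self.parts]

    def lookup(self, x, y):
        """Return the position of point (x, y) in the sorted keys, or -1."""
        key = z_value(x, y)
        i = max(0, bisect_right(self.firsts, key) - 1)   # choose the partition
        _, start, end, slope, intercept, err = self.parts[i]
        guess = round(slope * key + intercept)           # model prediction
        lo = min(max(start, guess - err), end)
        hi = max(lo, min(end, guess + err + 1))
        pos = bisect_left(self.keys, key, lo, hi)        # search window only
        return pos if pos < hi and self.keys[pos] == key else -1
```

A lookup such as `PartitionedLearnedIndex(points, k=4).lookup(x, y)` first picks the partition whose key range contains the z-value, then searches only the window around the model's prediction. The more uniform the distribution inside each partition, the smaller the recorded error bound and the narrower that window, which is the motivation the abstract gives for partitioning before fitting.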

Keywords