Alexandria Engineering Journal (Dec 2022)

A spatial distance-based spatial clustering algorithm for sparse image data

  • Tian-fan Zhang,
  • Zhe Li,
  • Qi Yuan,
  • You-ning Wang

Journal volume & issue
Vol. 61, no. 12
pp. 12609 – 12622

Abstract

Image classification assigns each object in a scene to one of a set of predefined categories. In doing so, it captures the attributes and features of each object and mines the latent features and internal connections of the data, supplying structured data for downstream application decisions. A key challenge is classifying sparse data accurately when the categories are imbalanced, that is, identifying small objects in images; recognizing a person in a satellite image is one example. Such objects are sparse either globally or within each recognizable local segment, so the classifier often overlooks them or removes them as noise. In deep learning, feature sparsity means that the samples contain too much useless information, which degrades the generalization and accuracy of the model. To solve this problem, this paper presents a spatial distance-based spatial clustering algorithm for sparse image data (SDBSCA-SID). First, the imaging range of the image sensor is taken as a two-dimensional (2D) constraint space. Under this constraint, spatial clustering is carried out on the features of each sample, aggregating dense data into primary categories and sparse data and noise into secondary categories. With reference to the 2D constraint space, multiple spatial classification surfaces are then constructed so that the sparse data gather on the two sides of these surfaces as far as possible. If the error is minimized, the sparse data are taken to belong to these classification surfaces. To shorten the convergence time of the clustering algorithm on imbalanced data, the original sample set is cut into slices that are assigned to several calculation units for separate clustering, and the same-class clusters are then merged through reduction. Finally, the obtained class labels are compared with the preset class labels, completing the semantic segmentation of the images. Tests on image samples demonstrate the stability and accuracy of the algorithm.
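
The slice, cluster, and merge-by-reduction steps described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's SDBSCA-SID implementation: it assumes a plain single-linkage distance threshold as a stand-in for the paper's spatial clustering, it omits the classification surfaces entirely, and the names `eps`, `n_slices`, `min_size`, `absorb`, `cluster_slice`, and `sliced_clustering` are hypothetical choices made for illustration.

```python
from functools import reduce
from math import dist  # Euclidean distance, Python 3.8+


def touches(a, b, eps):
    """True if any point of cluster a lies within eps of any point of cluster b."""
    return any(dist(p, q) <= eps for p in a for q in b)


def absorb(clusters, group, eps):
    """Fuse `group` with every cluster it touches; leave the others unchanged."""
    near = [c for c in clusters if touches(c, group, eps)]
    far = [c for c in clusters if not touches(c, group, eps)]
    return far + [group + [p for c in near for p in c]]


def cluster_slice(points, eps):
    """Cluster one slice on its own (assumed stand-in: single-linkage grouping)."""
    return reduce(lambda cs, p: absorb(cs, [p], eps), points, [])


def sliced_clustering(points, eps, n_slices, min_size):
    """Cut the samples into slices, cluster each, then merge by reduction.

    Clusters with at least `min_size` points are reported as primary (dense);
    the rest are reported as secondary (sparse data and noise).
    """
    size = max(1, -(-len(points) // n_slices))  # ceiling division, never zero
    slices = [points[i:i + size] for i in range(0, len(points), size)]
    # Each slice is independent, so this step could run on separate workers.
    partial = [cluster_slice(s, eps) for s in slices]
    # Reduction: fold each slice's clusters into the accumulated result.
    merged = reduce(
        lambda acc, cs: reduce(lambda a, c: absorb(a, c, eps), cs, acc),
        partial,
        [],
    )
    primary = [c for c in merged if len(c) >= min_size]
    secondary = [c for c in merged if len(c) < min_size]
    return primary, secondary
```

Because every slice is clustered independently and the merge only compares whole clusters, the per-slice step is the part that could be distributed across calculation units; the threshold `eps` and the density cutoff `min_size` would need tuning for real image features.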

Keywords