IEEE Access (Jan 2024)
A K-Means-Based Interpolation Algorithm With Lp-Norm and Feature Weighting
Abstract
Most existing data analysis methods depend on complete data. However, incomplete and unbalanced datasets arising from the collection and organization process reduce analysis accuracy. Existing interpolation algorithms often overlook feature importance, resulting in either cumbersome processes or underutilization of the information in the data. This study introduces a clustering-based interpolation method aimed at improving the accuracy and efficiency of missing-data processing. We first clarify the data interpolation problem to be solved and, because the information carried by the data itself matters for interpolation, propose a scheme that combines clustering and interpolation. The method uses the Lp norm as the similarity measure in the K-means clustering algorithm and introduces a controllable feature-weighting formula based on the current data segmentation. Methodologically, clustering and interpolation are synchronized by iteratively updating the variables of a shared cost function. Experimental results on real datasets demonstrate significant improvements of the proposed interpolation algorithm over traditional techniques, particularly in data labeling and classification tasks.
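As a rough illustration of the alternating scheme summarized above, the following minimal Python sketch interleaves a weighted Lp assignment step, a feature-weight update, and re-imputation of missing entries from the assigned cluster center. It is not the paper's algorithm: the function name `lp_kmeans_impute`, the parameters `p`, `beta`, `n_iter`, and `seed`, the inverse-dispersion weight rule, and the use of the cluster mean as the center (exact only for p = 2) are all assumptions made for illustration.

```python
import numpy as np

def lp_kmeans_impute(X, k, p=2.0, beta=2.0, n_iter=20, seed=0):
    """Alternate clustering and imputation; returns (X_filled, labels, weights).

    Hypothetical sketch: assumes beta > 1 and that every column has at
    least one observed value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: fill each missing entry with its column's observed mean.
    Xf = np.where(missing, np.nanmean(X, axis=0), X)
    n, d = Xf.shape

    # Initialize centers from k distinct rows and start with uniform weights.
    centers = Xf[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(d, 1.0 / d)

    for _ in range(n_iter):
        # Assignment step: weighted Lp distance raised to the power p.
        diff = np.abs(Xf[:, None, :] - centers[None, :, :]) ** p
        dist = (diff * w ** beta).sum(axis=2)
        labels = dist.argmin(axis=1)

        # Center step: the mean is the exact minimizer only for p = 2;
        # it is used here as a simplification for other p.
        for j in range(k):
            members = Xf[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

        # Weight step (assumed rule): features with smaller within-cluster
        # dispersion receive larger weight; weights are normalized to sum to 1.
        disp = (np.abs(Xf - centers[labels]) ** p).sum(axis=0) + 1e-12
        inv = disp ** (-1.0 / (beta - 1.0))
        w = inv / inv.sum()

        # Imputation step: overwrite missing entries with the matching
        # coordinates of the assigned center; observed entries are untouched.
        Xf[missing] = centers[labels][missing]

    return Xf, labels, w
```

Calling `lp_kmeans_impute(X, k=2)` on an array containing `np.nan` entries returns a filled copy, the cluster labels, and the learned feature weights. In this sketch `beta` plays the role of a controllable weighting exponent; the paper's actual weighting formula may differ.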
Keywords