An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight

Zhe Wang; Hao Xu; Pan Zhou; Gang Xiao

doi:10.3390/computation11020032

Computation (Feb 2023)

Affiliations

Zhe Wang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China
Hao Xu: College of Engineering, Lishui University, Lishui 323000, China
Pan Zhou: College of Engineering, Lishui University, Lishui 323000, China
Gang Xiao: College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China

DOI: https://doi.org/10.3390/computation11020032
Journal volume & issue: Vol. 11, no. 2
p. 32

Abstract

Multilabel data share important features, including label imbalance, which has a significant influence on the performance of classifiers. Because of this problem, a widely used multilabel classification algorithm, the multilabel k-nearest neighbor (ML-kNN) algorithm, performs poorly on imbalanced multilabel data. To address this problem, this study proposes an improved ML-kNN algorithm based on value and weight. In this improved algorithm, labels are divided into minority and majority labels, and a different strategy is adopted for each group. By considering the latent label information carried by the nearest neighbors, a value calculation method is proposed and used to directly classify majority labels. Additionally, to address the misclassification problem caused by a lack of nearest neighbor information for minority labels, a weight calculation is proposed. The proposed weight calculation converts distance information with and without label sets in the nearest neighbors into weights. The experimental results on multilabel datasets from different benchmarks demonstrate the performance of the algorithm, especially for datasets with high imbalance. Different evaluation metrics show that the results are improved by approximately 2–10%. The verified algorithm could be applied to multilabel classification in various fields involving label imbalance, such as drug molecule identification, building identification, and text categorization.
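
The abstract does not state the paper's value and weight formulas, so the following Python sketch is only an illustration of the two general ideas it mentions: splitting labels into minority and majority groups, and turning neighbor distances into weights. It is not the authors' method. The frequency threshold, the inverse-distance weighting, the reading of "with and without" as neighbors that do or do not carry a label, and all function names and toy data are assumptions introduced here.

```python
import math


def split_labels(Y, threshold=0.5):
    """Split label indices into (minority, majority) by positive frequency.

    Assumption: a label is 'minority' when fewer than `threshold` of the
    samples carry it. The paper's actual criterion is not given in the abstract.
    """
    n_samples = len(Y)
    minority, majority = [], []
    for j in range(len(Y[0])):
        freq = sum(row[j] for row in Y) / n_samples
        (minority if freq < threshold else majority).append(j)
    return minority, majority


def k_nearest(X, query, k):
    """Return the k nearest (distance, index) pairs by Euclidean distance."""
    pairs = sorted((math.dist(x, query), i) for i, x in enumerate(X))
    return pairs[:k]


def weighted_vote(X, Y, query, label, k=3, eps=1e-9):
    """Inverse-distance-weighted share of neighbors that carry `label`.

    Assumption: weight = 1 / (distance + eps); this is a common generic
    choice, not the weight calculation proposed in the paper.
    """
    neighbors = k_nearest(X, query, k)
    with_label = sum(1.0 / (d + eps) for d, i in neighbors if Y[i][label] == 1)
    without_label = sum(1.0 / (d + eps) for d, i in neighbors if Y[i][label] == 0)
    return with_label / (with_label + without_label)


# Hypothetical toy data: 5 samples, 2 features, 2 labels.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
Y = [[1, 0], [1, 0], [1, 0], [1, 1], [0, 1]]

minority, majority = split_labels(Y)
score = weighted_vote(X, Y, query=(4.0, 5.0), label=1, k=3)
```

In this toy example, two of the three nearest neighbors carry label 1 and both are much closer to the query than the third, so the weighted score is higher than the plain neighbor-count share of 2/3. That is the kind of effect distance weighting is meant to have for sparsely represented labels.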

Published in Computation

ISSN: 2079-3197 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computation
