Machine Learning: Science and Technology (Jan 2023)

A filter-based feature selection approach in multilabel classification

  • Rafia Shaikh,
  • Muhammad Rafi,
  • Naeem Ahmed Mahoto,
  • Adel Sulaiman,
  • Asadullah Shaikh

DOI
https://doi.org/10.1088/2632-2153/ad035d
Journal volume & issue
Vol. 4, no. 4
p. 045018

Abstract

Multilabel classification is a fast-growing field of machine learning. Recent work has applied it to social media, healthcare, bio-molecular analysis, and scene and music classification. In multilabel classification problems, multiple labels (more than one class label) are assigned to an unseen record instead of a single class label. Feature selection is a preprocessing phase used to identify the most relevant features, which could improve the accuracy of multilabel classifiers. This study focuses on feature selection for multilabel classification. It uses filter methods based on the Fisher score, the analysis of variance (ANOVA) test, mutual information, Chi-Square, and ensembles of these statistical methods. A wide range of machine learning algorithms is applied in the modelling phase: binary relevance, classifier chain, label powerset, binary relevance KNN, multi-label twin support vector machine, and multi-label KNN. In addition, label space partitioning and majority voting ensemble methods are used, with Random Forest as the base learner. The experiments are carried out on five multilabel benchmark datasets. The classification models are evaluated using accuracy, precision, recall, F1 score, and Hamming loss. The study demonstrated that the filter methods (i.e. mutual information), retaining the top-weighted 80% to 20% of features, provided significant outcomes.
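As an illustration only, a filter step of the kind described in the abstract might be sketched as follows. This is not the authors' implementation: it assumes discrete features, scores each feature by its mutual information with each label, and averages the per-label scores, which is one possible aggregation that the abstract does not specify. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(X, Y):
    """Score each feature by its mean MI over all labels (assumed aggregation)."""
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        per_label = [
            mutual_information(col, [row[k] for row in Y])
            for k in range(n_labels)
        ]
        scores.append(sum(per_label) / n_labels)
    return scores


def select_top_fraction(scores, fraction):
    """Indices of the top-weighted `fraction` of features, in original order."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label assignments that are wrong."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(Y_true, Y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total


# Toy data (hypothetical): features 0 and 1 determine the two labels,
# feature 2 is constant and feature 3 is unrelated to either label.
X = [
    [0, 0, 1, 0], [0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 1, 1],
    [0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 1, 0], [1, 0, 1, 1],
]
Y = [[row[0], row[1]] for row in X]

scores = rank_features(X, Y)
selected = select_top_fraction(scores, 0.5)  # keep the top 50% of features
print(selected)
```

The retained columns would then be passed to a multilabel classifier such as binary relevance. Varying `fraction` between 0.8 and 0.2 mirrors the range of top-weighted feature subsets mentioned in the abstract.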

Keywords