A filter-based feature selection approach in multilabel classification

Rafia Shaikh; Muhammad Rafi; Naeem Ahmed Mahoto; Adel Sulaiman; Asadullah Shaikh

doi:10.1088/2632-2153/ad035d

Machine Learning: Science and Technology (Jan 2023)

Affiliations

Rafia Shaikh: Department of Software Engineering, Mehran University of Engineering & Technology, Indus Hwy, Jamshoro 76062, Sindh, Pakistan; Computer Science Department, School of Computing, National University of Computer and Emerging Sciences, 3 A.K. Brohi Road, Islamabad 44000, Islamabad, Pakistan
Muhammad Rafi: Computer Science Department, School of Computing, National University of Computer and Emerging Sciences, 3 A.K. Brohi Road, Islamabad 44000, Islamabad, Pakistan
Naeem Ahmed Mahoto: Department of Software Engineering, Mehran University of Engineering & Technology, Indus Hwy, Jamshoro 76062, Sindh, Pakistan
Adel Sulaiman: Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Najran, Saudi Arabia; Scientific and Engineering Research Centre, Najran University, Najran 61441, Najran, Saudi Arabia
Asadullah Shaikh: Scientific and Engineering Research Centre, Najran University, Najran 61441, Najran, Saudi Arabia; Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 61441, Najran, Saudi Arabia

DOI: https://doi.org/10.1088/2632-2153/ad035d
Journal volume & issue: Vol. 4, no. 4
p. 045018

Abstract

Multilabel classification is a fast-growing field of machine learning. Recent developments have produced applications in social media, healthcare, bio-molecular analysis, and scene and music classification. In multilabel classification problems, multiple labels (more than one class label) are assigned to an unseen record instead of a single class label. Feature selection is a preprocessing phase used to identify the most relevant features, which can improve the accuracy of multilabel classifiers. This study focuses on feature selection for multilabel classification. It uses filter methods based on the Fisher score, the analysis of variance (ANOVA) test, mutual information, and Chi-Square, as well as ensembles of these statistical methods. A wide range of machine learning algorithms is applied in the modelling phase: binary relevance, classifier chain, label powerset, binary relevance KNN, multi-label twin support vector machine, and multi-label KNN. In addition, ensemble methods based on label space partitioning and majority voting are used, with Random Forest as the base learner. The experiments are carried out on five multilabel benchmark datasets. The classification models are evaluated using accuracy, precision, recall, F1 score, and Hamming loss. The study found that the filter methods (i.e. mutual information) applied with the top-weighted $80\%$ to $20\%$ of features provided significant outcomes.
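
The filter step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete feature values, scores each feature by its mutual information with every label, and averages those scores across labels (the averaging rule, the toy data, and the helper names `mutual_information`, `rank_features`, and `select_top_fraction` are all assumptions made for illustration). The top-weighted fraction of features is then kept, mirroring the $80\%$ to $20\%$ cut-offs mentioned above.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def rank_features(X, Y):
    """Score each feature by its mean MI over all labels (assumed aggregation)."""
    n_features, n_labels = len(X[0]), len(Y[0])
    label_cols = [[row[l] for row in Y] for l in range(n_labels)]
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        scores.append(sum(mutual_information(col, lab) for lab in label_cols)
                      / n_labels)
    return scores


def select_top_fraction(scores, fraction):
    """Indices of the top `fraction` of features by score, in original order."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): feature 0 copies label 0, feature 1 copies label 1,
# feature 2 is constant, and feature 3 is only weakly related to either label.
X = [[0, 0, 1, 0], [0, 1, 1, 1], [1, 0, 1, 0],
     [1, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1]]

scores = rank_features(X, Y)
selected = select_top_fraction(scores, 0.5)  # keep the top 50% of features
X_reduced = [[row[j] for j in selected] for row in X]
```

The reduced matrix `X_reduced` would then be passed to whichever multilabel classifier is being evaluated, for example one binary classifier per label in a binary relevance setup. Continuous features would need discretisation or a different estimator before this scoring applies.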

Published in Machine Learning: Science and Technology

ISSN: 2632-2153 (Online)
Publisher: IOP Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://iopscience.iop.org/journal/2632-2153
