Multi-Label Bioinformatics Data Classification With Ensemble Embedded Feature Selection

Yumeng Guo; Fu-Lai Chung; Guozheng Li; Lei Zhang

doi:10.1109/ACCESS.2019.2931035

IEEE Access (Jan 2019)

Affiliations

Yumeng Guo: Department of Control Science and Engineering, Tongji University, Shanghai, China
Fu-Lai Chung: Department of Computing, The Hong Kong Polytechnic University, Hong Kong
Guozheng Li: Department of Control Science and Engineering, Tongji University, Shanghai, China
Lei Zhang: China Academy of Chinese Medical Sciences, Institute of Basic Research in Clinical Medicine, Beijing, China

Journal volume & issue: Vol. 7, pp. 103863–103875

Abstract

In bioinformatics, a vast number of multi-label datasets, including clinical text, gene, and protein data, need to be categorized. Redundant or irrelevant features in bioinformatics data limit the performance of multi-label classifiers, so selecting effective features from the feature space is necessary. However, most of the multi-label feature selection methods proposed in the past few years adopt a simple and direct strategy that transforms the multi-label feature selection problem into multiple single-label ones and ignores correlations among different labels. In this paper, a novel algorithm named ensemble embedded feature selection (EEFS) is proposed to handle multi-label bioinformatics data learning problems more effectively and efficiently. EEFS not only explicitly finds the correlations among labels, but also makes full use of them through multi-label classifiers and evaluation measures. Furthermore, it reduces the accumulated errors of the data itself by employing an ensemble method. Experimental results on five multi-label bioinformatics datasets show that our algorithm significantly outperforms other state-of-the-art algorithms.
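
The two ideas the abstract combines, keeping label correlations in view and averaging over an ensemble, can be illustrated with a small sketch. This is not the EEFS algorithm, whose details are not given here. As a stated substitution, it scores features with a simple filter criterion (a between-class variance ratio) rather than an embedded, classifier-driven one. Label co-occurrence is kept by treating each distinct label combination as one class, and rankings are averaged over bootstrap resamples. The function names, parameters, and scoring rule are all illustrative assumptions.

```python
import random
from collections import defaultdict


def variance_ratio(column, classes):
    """Between-class variance of one feature divided by its total variance."""
    n = len(column)
    mean = sum(column) / n
    total = sum((x - mean) ** 2 for x in column)
    if total == 0.0:
        return 0.0  # a constant feature carries no information
    groups = defaultdict(list)
    for x, c in zip(column, classes):
        groups[c].append(x)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    return between / total


def ensemble_feature_ranking(X, Y, n_rounds=25, seed=0):
    """Rank features by their average rank over bootstrap resamples.

    X: list of feature rows; Y: list of label rows (0/1 per label).
    Each distinct label combination is treated as one class, so label
    co-occurrence is not discarded (hypothetical stand-in for the
    correlation handling described in the abstract).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    rank_sum = [0] * d
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        classes = [tuple(Y[i]) for i in idx]
        scores = [variance_ratio([X[i][j] for i in idx], classes) for j in range(d)]
        order = sorted(range(d), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    # Lowest accumulated rank first, i.e. most consistently useful feature first.
    return sorted(range(d), key=lambda j: rank_sum[j])
```

Averaging ranks rather than raw scores keeps any single resample from dominating the result, which is one common way to reduce the effect of noisy samples. An embedded variant would replace `variance_ratio` with importances taken from a trained multi-label classifier.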

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
