IEEE Access (Jan 2022)
MMMF: Multimodal Multitask Matrix Factorization for Classification and Feature Selection
Abstract
Integrating multiple biological datasets, aided by the rapid development of biomedical technology, is crucial for understanding biological mechanisms comprehensively. However, predictive modeling on such integrated datasets faces two major challenges: heterogeneity and imbalance in the acquired data. In this study, we present multimodal multitask matrix factorization (MMMF), a method for integrating multiple biological datasets that addresses both issues. MMMF uses matrix factorization (MF) to integrate data from multiple heterogeneous biological datasets, and applies oversampling during training to handle the data imbalance. Moreover, gradient surgery is used for multitask learning (MF and classification): an MF gradient that conflicts with the classification gradient is projected onto the normal plane of the classification gradient, which preserves more of the classification information. We demonstrate on five biological datasets that MMMF outperforms other state-of-the-art biomedical classification models in binary and multi-class classification problems. We also show that MMMF can be used as a feature selection approach for finding biomarkers that help in classification. The source code of MMMF is available at https://github.com/DMCB-GIST/MMMF.
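The gradient surgery step described in the abstract can be illustrated with a minimal sketch. It is not taken from the MMMF source code. It assumes that each task gradient has been flattened into a plain vector and that two gradients "conflict" when their dot product is negative; the function name `project_conflicting` and the `eps` stabilizer are hypothetical choices made for illustration.

```python
def project_conflicting(g_mf, g_cls, eps=1e-12):
    """Project the MF gradient onto the normal plane of the classification
    gradient, but only when the two gradients conflict (negative dot product).

    g_mf, g_cls: flattened gradients as equal-length sequences of floats
                 (an assumption; real models hold per-parameter tensors).
    eps: small constant guarding against a zero classification gradient.
    """
    dot = sum(a * b for a, b in zip(g_mf, g_cls))
    if dot >= 0.0:
        # No conflict: the MF gradient is left unchanged.
        return list(g_mf)
    # Remove the component of g_mf that points against g_cls.
    scale = dot / (sum(b * b for b in g_cls) + eps)
    return [a - scale * b for a, b in zip(g_mf, g_cls)]


# Toy example: the MF gradient pulls against the classification gradient
# along the first coordinate.
g_cls = [1.0, 0.0]
g_mf = [-1.0, 1.0]

g_mf_proj = project_conflicting(g_mf, g_cls)
# Combined update direction once the conflicting component is removed.
combined = [c + m for c, m in zip(g_cls, g_mf_proj)]
```

In this toy case the projected MF gradient is orthogonal to the classification gradient, so adding the two no longer shrinks the classification update along its own direction.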
Keywords