Dynamic Meta-data Network Sparse PCA for Cancer Subtype Biomarker Screening

Rui Miao; Xin Dong; Xiao-Ying Liu; Sio-Long Lo; Xin-Yue Mei; Qi Dang; Jie Cai; Shao Li; Kuo Yang; Sheng-Li Xie; Yong Liang

doi:10.3389/fgene.2022.869906

Frontiers in Genetics (May 2022)

Affiliations

Rui Miao: Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
Xin Dong: Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
Xiao-Ying Liu: Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, China
Sio-Long Lo: Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
Xin-Yue Mei: Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
Qi Dang: Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
Jie Cai: Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
Shao Li: MOE Key Laboratory of Bioinformatics, TCM-X Center/Bioinformatics Division, BNRIST/Department of Automation, Tsinghua University, Beijing, China
Kuo Yang: MOE Key Laboratory of Bioinformatics, TCM-X Center/Bioinformatics Division, BNRIST/Department of Automation, Tsinghua University, Beijing, China
Sheng-Li Xie: Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, China
Yong Liang: Peng Cheng Laboratory, Shenzhen, China

DOI: https://doi.org/10.3389/fgene.2022.869906
Journal volume & issue: Vol. 13

Abstract

Previous research shows that each type of cancer can be divided into multiple subtypes, which is one of the key reasons that cancer is difficult to cure. Finding new target genes for cancer subtypes is therefore of great significance for developing new anti-cancer drugs and personalized treatments. Because cancer gene expression data sets are usually high-dimensional, noisy, and contain information on multiple potential subtypes, many sparse principal component analysis (sparse PCA) methods have been used to identify cancer subtype biomarkers and subtype clusters. However, existing sparse PCA methods do not use known cancer subtype information as prior knowledge, and their results are strongly affected by sample quality. We therefore propose the Dynamic Metadata Edge-group Sparse PCA (DM-ESPCA) model, which draws on meta-learning to address the sample-quality problem and uses known cancer subtype information as prior knowledge to capture gene modules with better biological interpretability. Experimental results on three biological data sets show that the DM-ESPCA model can find potential target gene probes that carry richer biological information about the cancer subtypes. Moreover, clustering and machine learning classification models built on the target genes screened by DM-ESPCA improve accuracy by up to 22–23% compared with existing sparse PCA methods. We also show that DM-ESPCA outperforms four classic supervised machine learning models on the task of cancer subtype classification.
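
To make the idea of sparse PCA as a gene-screening tool concrete, the following is a minimal sketch of generic sparse PCA using truncated power iteration. It is not the DM-ESPCA model: it uses no subtype prior, no gene network, and no meta-learning step. The function name `sparse_pc`, the synthetic data, and all parameter values are assumptions made purely for illustration.

```python
import math
import random


def sparse_pc(X, k, iters=50):
    """Leading sparse loading vector via truncated power iteration.

    Generic sparse PCA sketch (hypothetical helper, not DM-ESPCA).
    X is a list of samples, each a list of gene expression values.
    At most k loadings are non-zero in the returned vector.
    """
    n, p = len(X), len(X[0])
    # Centre every gene (column) at zero mean.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Deterministic start: per-gene sum of squares.
    v = [sum(row[j] ** 2 for row in Xc) for j in range(p)]
    for _ in range(iters):
        # Power step: w = Xc^T (Xc v), proportional to covariance times v.
        scores = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)]
        # Truncation step: keep only the k largest loadings in magnitude.
        keep = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k])
        w = [w[j] if j in keep else 0.0 for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Synthetic data (assumption): 40 samples, 30 genes, two subtypes.
# Only the first 5 genes are shifted up or down depending on the subtype.
random.seed(0)
n_samples, n_genes, n_signal = 40, 30, 5
X = []
for i in range(n_samples):
    shift = 3.0 if i < n_samples // 2 else -3.0
    X.append([random.gauss(0.0, 1.0) + (shift if j < n_signal else 0.0)
              for j in range(n_genes)])

loadings = sparse_pc(X, k=n_signal)
selected = [j for j, x in enumerate(loadings) if x != 0.0]
print("genes with non-zero loadings:", selected)
```

In this toy setting the non-zero loadings mark the candidate genes. Because nothing in the sketch draws on the known subtype labels, it also illustrates the limitation of existing sparse PCA methods that the abstract describes.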

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
