Identification of Pan-Cancer Biomarkers Based on the Gene Expression Profiles of Cancer Cell Lines

ShiJian Ding; Hao Li; Yu-Hang Zhang; XianChao Zhou; KaiYan Feng; ZhanDong Li; Lei Chen; Tao Huang; Yu-Dong Cai

doi:10.3389/fcell.2021.781285

Frontiers in Cell and Developmental Biology (Nov 2021)

Affiliations

ShiJian Ding: School of Life Sciences, Shanghai University, Shanghai, China
Hao Li: College of Food Engineering, Jilin Engineering Normal University, Changchun, China
Yu-Hang Zhang: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
XianChao Zhou: Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
KaiYan Feng: Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
ZhanDong Li: College of Food Engineering, Jilin Engineering Normal University, Changchun, China
Lei Chen: College of Information Engineering, Shanghai Maritime University, Shanghai, China
Tao Huang: CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
Tao Huang: CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
Yu-Dong Cai: School of Life Sciences, Shanghai University, Shanghai, China

DOI: https://doi.org/10.3389/fcell.2021.781285
Journal volume & issue: Vol. 9

Abstract

There are many types of cancer. Although they share some hallmarks, such as proliferation and metastasis, they differ in many respects, and they arise in different organs or tissues. Does each cancer have a unique gene expression pattern that distinguishes it from other cancer types? Since The Cancer Genome Atlas (TCGA) project, pan-cancer studies have become increasingly common. Researchers want to obtain robust gene expression signatures from pan-cancer patients, but heterogeneity causes large variance among cancer patients, and the sample size needed for robust results would be too large to recruit. In this study, we tried another approach to obtain robust pan-cancer biomarkers: using cell line data to reduce the variance. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy (mRMR), were applied to the cell line gene expression data in sequence, generating a feature list. This list was fed into the incremental feature selection (IFS) method, which incorporated a classification algorithm, to extract biomarkers and to construct optimal classifiers and decision rules. The optimal classifiers performed well and can be useful tools for identifying cell lines from different cancer types, whereas the biomarkers (e.g., NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.
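
The incremental feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes a feature ranking is already available (the kind of list that Boruta followed by max-relevance and min-redundancy would produce; neither method is implemented here). A nearest-centroid classifier with leave-one-out evaluation stands in for the classification algorithm, which the abstract does not name. The toy expression values, labels, and function names are hypothetical and are not taken from the study.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    # Iterate labels in sorted order so ties are broken deterministically.
    return min(
        sorted(centroids),
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    Xs = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += int(nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i])
    return hits / len(Xs)


def incremental_feature_selection(X, y, ranked_features):
    """Score every top-k prefix of the ranking; return the best prefix and all scores."""
    scores = [
        loo_accuracy(X, y, ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    best_k = scores.index(max(scores)) + 1  # smallest k that reaches the best score
    return ranked_features[:best_k], scores


# Hypothetical toy data: 6 "cell lines", 3 "genes", 3 "cancer types".
# Genes 0 and 1 carry class signal; gene 2 is noise.
X = [
    [0.0, 0.0, 5.0],
    [0.1, 0.1, 1.0],
    [1.0, 0.0, 4.0],
    [1.1, 0.1, 0.0],
    [1.0, 1.0, 5.0],
    [1.1, 1.1, 1.0],
]
y = ["A", "A", "B", "B", "C", "C"]
ranked = [0, 1, 2]  # assumed output of an upstream ranking step

selected, scores = incremental_feature_selection(X, y, ranked)
print("selected features:", selected)
print("accuracy per prefix size:", scores)
```

In this sketch the prefix with the highest leave-one-out accuracy is kept, and ties go to the shorter prefix, which favors a smaller biomarker set. A real analysis would use a stronger classifier and a cross-validation scheme suited to 20 classes.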

Published in Frontiers in Cell and Developmental Biology

ISSN: 2296-634X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General)
Website: https://www.frontiersin.org/journals/cell-and-developmental-biology
