KinPred-RNA—kinase activity inference and cancer type classification using machine learning on RNA-seq data
Yuntian Zhang,
Lantian Yao,
Chia-Ru Chung,
Yixian Huang,
Shangfu Li,
Wenyang Zhang,
Yuxuan Pang,
Tzong-Yi Lee
Affiliations
Yuntian Zhang
Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
Lantian Yao
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
Chia-Ru Chung
Department of Computer Science and Information Engineering, National Central University, Taoyuan 320953, Taiwan
Yixian Huang
Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China; School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
Shangfu Li
Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
Wenyang Zhang
School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
Yuxuan Pang
Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
Tzong-Yi Lee
Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan; Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan; Corresponding author
Summary: Kinases are enzymes that transfer phosphate groups from high-energy, phosphate-donating molecules to specific substrates, and they play essential roles in various cellular processes. Existing algorithms that infer kinase activity from phosphoproteomics data are often costly because they require valuable samples. Moreover, methods to extract kinase activities from bulk RNA sequencing (RNA-seq) data remain undeveloped. In this study, we propose KinPred-RNA, a computational framework that derives kinase activities from bulk RNA-seq data of cancer samples. KinPred-RNA uses an extreme gradient boosting (XGBoost) regression model, which outperforms random forest regression, multiple linear regression, and support vector machine regression models in predicting kinase activities from cancer-related RNA-seq data. Efficient gene signatures from the LINCS-L1000 dataset were used as inputs for KinPred-RNA. The results suggest that the kinase activities predicted by KinPred-RNA may be related to biological function. In conclusion, KinPred-RNA represents an advance in cancer research that could facilitate the identification of cancer types.
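To make the regression setting concrete, the following is a minimal sketch of gradient boosting for predicting a kinase activity score from a gene expression signature. It is not the KinPred-RNA implementation: it uses a simplified squared-error booster built from depth-1 regression trees (no regularization or second-order terms, unlike XGBoost), written in pure Python. The data are synthetic, and the four signature genes, the simulated activity function, and all function names are assumptions made only for illustration.

```python
import random

def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the one split minimizing squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]

def fit_boosting(X, y, n_rounds=30, learning_rate=0.3):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps

def predict(model, X):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return [base + lr * sum(lm if row[j] <= t else rm
                            for j, t, lm, rm in stumps)
            for row in X]

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n) ** 0.5

# Synthetic stand-in data (assumption): 4 hypothetical signature genes and a
# simulated kinase activity that depends nonlinearly on two of them.
rng = random.Random(0)

def make_samples(n):
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [2.0 * (row[0] > 0) - 1.5 * (row[1] > 0.5) + rng.gauss(0, 0.1)
         for row in X]
    return X, y

X_train, y_train = make_samples(80)
X_test, y_test = make_samples(40)

model = fit_boosting(X_train, y_train)
boosted_rmse = rmse(y_test, predict(model, X_test))

# Baseline: always predict the mean training activity.
train_mean = sum(y_train) / len(y_train)
baseline_rmse = rmse(y_test, [train_mean] * len(y_test))

print("baseline RMSE:", round(baseline_rmse, 3))
print("boosted RMSE:", round(boosted_rmse, 3))
```

In practice, a library implementation such as XGBoost would replace the hand-written booster, and the comparison would be made against the other regressors named above on held-out data rather than against a mean baseline.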