Identification and verification of four candidate biomarkers for early diagnosis of osteoarthritis by machine learning
Xinyu Wang,
Tianyi Liu,
Yueyang Sheng,
Yanzhuo Zhang,
Cheng Qiu,
Manyu Li,
Yuxi Cheng,
Shan Li,
Ying Wang,
Chengai Wu
Affiliations
Xinyu Wang
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China; Department of Anesthesiology, National Center for Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China
Tianyi Liu
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China; Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China; Department of Hepatobiliary Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
Yueyang Sheng
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China
Yanzhuo Zhang
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China
Cheng Qiu
Department of Orthopaedic Surgery, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
Manyu Li
Department of Gastroenterology, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China
Yuxi Cheng
Xiangya Stomatological Hospital & Xiangya School of Stomatology, Central South University, Changsha, Hunan, 410008, China
Shan Li
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China
Ying Wang
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China
Chengai Wu
Department of Molecular Orthopaedics, National Center for Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China; Corresponding author. Department of Molecular Orthopaedics, Beijing Research Institute of Traumatology and Orthopaedics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China.
Background: Osteoarthritis (OA) is a common chronic joint disease. This study aimed to identify candidate diagnostic biomarkers for OA and to verify their significance in clinical samples. Methods: Three datasets from the Gene Expression Omnibus (GEO) database served as the training set. We first identified differentially expressed genes (DEGs) and then screened candidate diagnostic biomarkers with three machine learning algorithms: Random Forest, Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). Another GEO dataset was used as the validation set. The test set consisted of RNA-sequenced peripheral blood samples collected from patients and healthy donors. Blood samples and chondrocytes were collected for quantitative real-time PCR to confirm expression levels. Receiver operating characteristic (ROC) curves were generated for individual and combined biomarkers. Results: In total, 251 DEGs were identified, of which B3GALNT1, SCRG1 and ZNF423 were selected by all three algorithms. The areas under the curve (AUCs) of the biomarkers were lower in our test set than in the public datasets. GRB10 had the highest AUC in the training set (0.947) but reached only 0.691 in our test set, while the combined model comprising B3GALNT1, GRB10, KLF9 and SCRG1 achieved an AUC of 0.986 in the training set, 1.000 in the validation set and 0.836 in our test set. Conclusion: We identified a combined model for early diagnosis of OA that includes B3GALNT1, GRB10, KLF9 and SCRG1. This finding offers new avenues for further exploration of the mechanisms underlying OA.
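The screening strategy described in the Methods (three feature selectors, intersection of their picks, then a ROC evaluation of a combined model) can be illustrated with a minimal sketch. The abstract does not state the software or settings used, so everything below is an assumption made for illustration: the scikit-learn implementation, the synthetic data, the placeholder gene names (`gene_0` to `gene_29`), the hyperparameters, and the helper names `select_candidates`, `combined_auc` and `run_demo`. It is not the authors' pipeline and does not reproduce their results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_candidates(X, y, gene_names, top_k=10, seed=0):
    """Return the genes chosen by all three selectors (hypothetical helper)."""
    names = np.asarray(gene_names)

    # Random Forest: keep the top_k genes by impurity-based importance.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    top_idx = np.argsort(rf.feature_importances_)[::-1][:top_k]
    rf_genes = set(names[top_idx].tolist())

    # LASSO logistic regression: keep genes with non-zero coefficients.
    # C=0.5 is an arbitrary choice for this sketch.
    lasso = LogisticRegression(
        penalty="l1", solver="liblinear", C=0.5, random_state=seed
    ).fit(X, y)
    lasso_genes = set(names[lasso.coef_.ravel() != 0].tolist())

    # SVM-RFE: repeatedly drop the gene with the smallest linear-SVM weight.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=top_k).fit(X, y)
    svm_genes = set(names[rfe.support_].tolist())

    # Candidates are the genes that all three algorithms agree on.
    return rf_genes & lasso_genes & svm_genes


def combined_auc(X_train, y_train, X_test, y_test):
    """Fit a logistic model on the selected genes and score it by ROC AUC."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


def run_demo(seed=0):
    """Run the whole sketch on synthetic data (no real expression values)."""
    gene_names = [f"gene_{i}" for i in range(30)]  # placeholder names
    X, y = make_classification(
        n_samples=200, n_features=30, n_informative=4, n_redundant=0,
        class_sep=2.0, shuffle=False, random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Scale using training statistics only, then select on the training set.
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    selected = select_candidates(X_tr, y_tr, gene_names, top_k=10, seed=seed)
    if not selected:
        return selected, None  # the selectors shared no gene
    cols = [gene_names.index(g) for g in sorted(selected)]
    auc = combined_auc(X_tr[:, cols], y_tr, X_te[:, cols], y_te)
    return selected, auc
```

Calling `run_demo()` returns the set of genes shared by the three selectors and the held-out AUC of the combined model. Selection is performed on the training split only, so the held-out AUC is not inflated by information from the evaluation samples; this mirrors the gap the abstract reports between training-set and test-set AUCs.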