Informatics in Medicine Unlocked (Jan 2024)
Elucidating B4GALNT1 as potential biomarker in hepatocellular carcinoma using machine learning models and mutational dynamics explored through MD simulation
Abstract
Liver hepatocellular carcinoma (LIHC) is one of the leading causes of cancer-related mortality worldwide. Because patients with LIHC are frequently diagnosed at advanced stages, which increases mortality, identifying new biomarkers is essential. This study used TCGA-LIHC gene expression datasets to identify biomarkers. To address the complexity of the datasets, a combination of feature selection (FS) techniques was applied, and the performance of this strategy was assessed using ten machine learning classifiers. The results were integrated, retaining biomarkers identified by at least five FS techniques. This approach yielded 55 potential biomarkers for LIHC. The Gaussian Naive Bayes classifier (AUC = 0.99) was the most effective, achieving 98.67% accuracy on the test dataset with the 55 identified biomarkers. We also performed differential gene expression, survival, and enrichment analyses for all identified biomarkers. Lasso-penalized Cox regression then refined the set to thirteen genes. From these thirteen, we selected B4GALNT1 because of its statistical significance in differential expression analysis and its growing importance across cancer types, including LIHC. We carried out comprehensive bioinformatics, molecular dynamics simulation, and other structural analyses of B4GALNT1 in LIHC. Six mutations in LIHC (P64Q, S131F, A311S, R340Q, D478H, and P507Q) were predicted to be probably damaging by in-silico prediction algorithms. Compared with the wild type, the P64Q and S131F variants show increased stability; these mutations also reduce atomic fluctuations, indicating a more rigid protein structure. In contrast, mutations such as A311S and P507Q increase flexibility, highlighting their structural impact on B4GALNT1. The study demonstrates that combining multiple feature selection methods effectively reveals new biomarkers with biological significance. Our findings also link increased B4GALNT1 expression in patients with liver cancer to a poorer prognosis, highlighting its potential as a therapeutic target.
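The consensus step described in the abstract, keeping only genes picked by at least five FS techniques, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `consensus_features`, the method labels, and the gene placeholders are all hypothetical, and the abstract does not say which FS techniques were combined or how ties were handled.

```python
from collections import Counter


def consensus_features(selections, min_votes=5):
    """Return features chosen by at least `min_votes` selection methods.

    `selections` maps a (hypothetical) method label to the list of
    features that method selected. Results are ordered by vote count,
    then alphabetically, so the output is deterministic.
    """
    votes = Counter()
    for features in selections.values():
        votes.update(set(features))  # each method votes once per feature
    kept = [f for f, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda f: (-votes[f], f))


# Hypothetical toy input: placeholder gene names, not results from the study.
selections = {
    "method_1": ["GENE_A", "GENE_B", "GENE_C"],
    "method_2": ["GENE_A", "GENE_B"],
    "method_3": ["GENE_A", "GENE_C"],
    "method_4": ["GENE_A", "GENE_B"],
    "method_5": ["GENE_A", "GENE_B", "GENE_D"],
    "method_6": ["GENE_B", "GENE_C"],
}

# Threshold of five votes mirrors the "at least five FS techniques" rule.
consensus = consensus_features(selections, min_votes=5)
print(consensus)
```

In this toy input only `GENE_A` and `GENE_B` reach five votes. Lowering `min_votes` admits more candidates at the cost of weaker agreement between methods.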