Diagnostics (Aug 2020)
Machine Learning Model Comparison in the Screening of Cholangiocarcinoma Using Plasma Bile Acids Profiles
Abstract
Bile acids (BAs) assessments are garnering increasing interest for their potential involvement in development and progression of cholangiocarcinoma (CCA). Since machine learning (ML) algorithms are increasingly used for exploring metabolomic profiles, we evaluated performance of some ML models for dissecting patients with CCA or benign biliary diseases according to their plasma BAs profiles. We used ultra-performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) for assessing plasma BAs profile in 112 patients (70 CCA, 42 benign biliary diseases). Twelve normalisation procedures were applied, and performance of six ML algorithms were evaluated (logistic regression, k-nearest neighbors, naïve bayes, RBF SVM, random forest, extreme gradient boosting). Naïve bayes, using direct bilirubin concentration for normalisation of BAs, was the ML model displaying better performance in the holdout set, with an Area Under Curve (AUC) of 0.95, 0.79 sensitivity, 1.00 specificity. This model, also characterised by 1.00 positive predictive value and 0.73 negative predictive value, displayed a globally excellent accuracy (86.4%). The accuracy of the other five models was lower, and AUCs ranged 0.75–0.95. Preliminary results of this study show that application of ML to BAs profile analysis can provide a valuable contribution for characterising bile duct diseases and identifying patients with higher likelihood of having malignant pathologies.
Keywords