PeerJ (Nov 2024)

Common laboratory results-based artificial intelligence analysis achieves accurate classification of plasma cell dyscrasias

  • Bihua Yao,
  • Yicheng Liu,
  • Yuwei Wu,
  • Siyu Mao,
  • Hangbiao Zhang,
  • Lei Jiang,
  • Cheng Fei,
  • Shuang Wang,
  • Jijun Tong,
  • Jianguo Wu

DOI
https://doi.org/10.7717/peerj.18391
Journal volume & issue
Vol. 12
p. e18391

Abstract

Background
Plasma cell dyscrasias encompass a diverse set of disorders for which early and precise diagnosis is essential to optimizing patient outcomes. Despite advances, current diagnostic methods make little use of artificial intelligence (AI) applied to routine laboratory data. This study seeks to construct an AI-driven model that uses standard laboratory parameters to improve diagnostic accuracy and classification efficiency in plasma cell dyscrasias.

Methods
Data from 1,188 participants (609 with plasma cell dyscrasias and 579 controls) collected between 2018 and 2023 were analyzed. Initial variable selection employed Kruskal-Wallis and Wilcoxon tests, followed by dimensionality reduction and variable prioritization using the Shapley Additive Explanations (SHAP) approach. Nine pivotal variables were identified, including hemoglobin (HGB), serum creatinine, and β2-microglobulin. Using these variables, four machine learning models were developed and evaluated: gradient boosting decision tree (GBDT), support vector machine (SVM), deep neural network (DNN), and decision tree (DT). Performance metrics such as accuracy, recall, and area under the curve (AUC) were assessed through 5-fold cross-validation. A subtype classification model was also developed, analyzing data from 380 cases to classify disorders such as multiple myeloma (MM) and monoclonal gammopathy of undetermined significance (MGUS).

Results
1. Variable selection: The SHAP method pinpointed nine critical variables, including hemoglobin (HGB), serum creatinine, erythrocyte sedimentation rate (ESR), and β2-microglobulin.
2. Diagnostic model performance: The GBDT model exhibited superior diagnostic performance for plasma cell dyscrasias, achieving 93.5% accuracy, 98.1% recall, and an AUC of 0.987. External validation reinforced its robustness, with 100% accuracy and an F1 score of 98.5%.
3. Subtype classification: The DNN model excelled in classifying multiple myeloma, MGUS, and light-chain myeloma, demonstrating sensitivity and specificity above 90% across all subtypes.

Conclusions
AI models based on routine laboratory results significantly enhance the precision of diagnosing and classifying plasma cell dyscrasias, presenting a promising avenue for early detection and individualized treatment strategies.
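
The metrics named in the abstract (accuracy, recall, F1 score, AUC) and the 5-fold splitting scheme can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names are hypothetical, the labels and scores are synthetic toy values, and the folds are contiguous and unshuffled because the abstract does not say whether the splits were shuffled or stratified.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that were detected (sensitivity)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds (assumption: no
    shuffling or stratification) and return (train, test) index lists."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


# Synthetic toy data, not taken from the study.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.95, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # 0.5 threshold is an assumption

print("accuracy:", accuracy(toy_labels, toy_preds))
print("recall:  ", recall(toy_labels, toy_preds))
print("F1:      ", f1_score(toy_labels, toy_preds))
print("AUC:     ", auc(toy_labels, toy_scores))
print("fold test sizes for n=1188:", [len(test) for _, test in k_fold_indices(1188)])
```

In a 5-fold scheme, each metric would be computed on the held-out fold of every split and then averaged. The pairwise form of AUC used here is equivalent to a normalized Mann-Whitney (Wilcoxon rank-sum) statistic, and it is quadratic in sample size, which is acceptable for a cohort of this scale.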

Keywords