Journal of Scientific Innovation in Medicine (Nov 2024)

Innovative Machine Learning Approach for Distinguishing Rheumatoid Arthritis and Osteoarthritis: Integrating Shapley Additive Explanations and Dendrograms

  • Alexander A. Huang,
  • Samuel Y. Huang

DOI: https://doi.org/10.29024/jsim.181
Journal volume & issue: Vol. 7, no. 1, pp. 4–4

Abstract

Background: Arthritis is a major healthcare issue, and accurate diagnosis is important for treatment.
Objective: The study aimed to identify, and intuitively visualize, the importance of features associated with osteoarthritis versus rheumatoid arthritis in a representative population of United States adults.
Methods: A retrospective analysis was conducted using a nationally representative cohort, the National Health and Nutrition Examination Survey (NHANES 2017–2020). All adults older than 18 years with either osteoarthritis or rheumatoid arthritis (1,483 individuals in total) were included. Univariable regression was used to identify significant nutritional covariates for inclusion in a machine learning model, and feature importance was reported. A dendrogram and heatmap were created by clustering the model statistics. The National Center for Health Statistics Ethics Review Board authorized the data acquisition and analysis.
Results: A total of 1,483 patients met the inclusion criteria of being adults older than 18 years with completed demographic questionnaire information. Of 681 candidate features, 56 were significant on univariable analysis (P < 0.01) and were included in the machine learning model. The XGBoost model achieved an area under the receiver operating characteristic curve (AUROC) of 0.710. The four highest-ranked features by gain, a measure of each covariate's percentage contribution to the overall model prediction, were Income to Poverty Ratio (8.7%), Hip Circumference (6.5%), Dietary Folate Equivalent Intake (Folate DFE) (6.1%), and Globulin (5.1%). Cluster 1 of the heatmap and dendrogram also contained Income to Poverty Ratio, Direct HDL Cholesterol (mmol/L), Hip Circumference (BMXHIP), Folate DFE, and Globulin, indicating that these features were most similar in having high aggregate gain, cover, and frequency metrics.
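Two of the statistics reported in the Results, AUROC and gain-based importance expressed as a percentage, can be illustrated with a small sketch. This is not the authors' pipeline. It assumes the model's predicted probabilities and per-feature gain values are already available (XGBoost's `Booster.get_score(importance_type="gain")` is one source of gain values; whether the study used average or total gain is not stated in the abstract). AUROC is computed with the rank-based Mann–Whitney formulation, and gain is rescaled so that all features sum to 100%. The function names, class encoding, and all numbers are hypothetical.

```python
"""Toy illustration of two statistics reported in the Results: AUROC and
gain-based feature importance expressed as percentages. Stdlib only; all
labels, scores, and gains are hypothetical, not study data."""

from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen positive case (label 1) receives a
    higher score than a randomly chosen negative case (label 0), counting ties
    as one half. This equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gain_percentages(gain: Dict[str, float]) -> Dict[str, float]:
    """Rescale per-feature gain so that the values sum to 100%."""
    total = sum(gain.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    return {name: 100.0 * g / total for name, g in gain.items()}


if __name__ == "__main__":
    # Hypothetical encoding: 1 = rheumatoid arthritis, 0 = osteoarthritis.
    y = [1, 0, 1, 0, 1, 0]
    p = [0.8, 0.3, 0.6, 0.6, 0.4, 0.2]
    print(round(auroc(y, p), 3))  # 0.833
    print(gain_percentages({"feature_a": 2.0, "feature_b": 1.0, "feature_c": 1.0}))
    # {'feature_a': 50.0, 'feature_b': 25.0, 'feature_c': 25.0}
```

The pairwise loop is chosen for readability; a sort-based rank computation gives the same AUROC more efficiently on larger cohorts.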
Conclusion: Machine learning models that incorporate dendrograms and heatmaps can offer additional summaries of model statistics that help differentiate factors associated with osteoarthritis from those associated with rheumatoid arthritis. Such clinical models may assist physicians in diagnosing common conditions.
Teaser Text: Dendrogram Statistics.
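The dendrogram and heatmap summarize a hierarchical clustering of features by their gain, cover, and frequency statistics. The abstract does not state the distance metric or linkage method, so the sketch below assumes Euclidean distance and average linkage (UPGMA) on metrics already expressed on comparable scales (for example, each as a fraction of its total across features). The feature names and values are hypothetical. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would typically be used; this stdlib version only makes the merge logic explicit.

```python
"""Average-linkage (UPGMA) agglomerative clustering of features described by
(gain, cover, frequency) vectors. Stdlib only; toy values, not study data."""

import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[List[str], List[str], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(
    points: Dict[str, Sequence[float]], n_clusters: int = 1
) -> Tuple[List[List[str]], List[Merge]]:
    """Merge the two closest clusters (by mean pairwise distance) until
    n_clusters remain. Returns the final clusters and the merge history,
    where each merge records the two joined clusters and its height."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(euclidean(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters: List[List[str]] = [[name] for name in points]
    merges: List[Merge] = []
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((sorted(clusters[i]), sorted(clusters[j]),
                       linkage(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected
    return sorted(sorted(c) for c in clusters), merges


if __name__ == "__main__":
    # Hypothetical (gain, cover, frequency) fractions for four features.
    metrics = {
        "f1": (0.09, 0.08, 0.07),
        "f2": (0.08, 0.07, 0.08),
        "f3": (0.01, 0.02, 0.01),
        "f4": (0.02, 0.01, 0.02),
    }
    clusters, _ = agglomerate(metrics, n_clusters=2)
    print(clusters)  # [['f1', 'f2'], ['f3', 'f4']]
    _, merges = agglomerate(metrics)
    for left, right, height in merges:
        print(left, right, round(height, 4))
```

Each entry in `merges` corresponds to one join in a dendrogram, drawn at a height equal to its linkage distance; ordering the heatmap's rows by the dendrogram's leaves places features with similar metric profiles, such as `f1` and `f2`, next to each other.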

Keywords