Scientific Reports (Nov 2024)
Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort
Abstract
Humans have been shown to exhibit biases when reading medical images, raising the question of whether their disease gradings are uniform. Artificial intelligence (AI) tools trained on human-labeled data may therefore inherit this human non-uniformity. In this study, we used an external validation dataset of 50 patients with radiographic knee osteoarthritis and a six-year retrospective consecutive clinical cohort of 8,273 patients. An FDA-approved and CE-marked AI tool was tested for potential non-uniformity in Kellgren-Lawrence grades between the right and left sides of the images: we flipped the images horizontally so that a left knee appeared as a right knee and vice versa. According to human review, the AI tool showed non-uniformity, with 20–22% disagreement on the external validation dataset and 13.6% on the clinical cohort. However, we found no evidence of a significant difference in accuracy compared with senior radiologists on the external validation dataset, nor of age or sex bias on the cohort. AI non-uniformity can boost the performance evaluated against humans, but image areas with inferior performance should be investigated.
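The horizontal-flip check summarized above can be illustrated with a minimal sketch. This is not the authors' code: the grading tool is a commercial product, so `grade_kl` below is a hypothetical callable standing in for it, and the file paths are assumed to point to knee radiographs.

```python
from typing import Callable
from PIL import Image, ImageOps

def flip_disagreement(path: str, grade_kl: Callable[[Image.Image], int]) -> bool:
    """Return True if the KL grade changes when the radiograph is mirrored horizontally."""
    original = Image.open(path)
    mirrored = ImageOps.mirror(original)  # a left knee now looks like a right knee, and vice versa
    return grade_kl(original) != grade_kl(mirrored)

def disagreement_rate(paths: list[str], grade_kl: Callable[[Image.Image], int]) -> float:
    """Fraction of radiographs whose grade differs between the original and mirrored views."""
    flags = [flip_disagreement(p, grade_kl) for p in paths]
    return sum(flags) / len(flags)
```

A disagreement rate near zero would indicate left/right uniformity of the tool; the 20–22% and 13.6% figures reported above correspond to this kind of comparison.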
Keywords