The Lancet Regional Health. Europe (Jan 2025)
Benefits and harms associated with the use of AI-related algorithmic decision-making systems by healthcare professionals: a systematic reviewResearch in context
Abstract
Summary: Background: Despite notable advancements in artificial intelligence (AI) that enable complex systems to perform certain tasks more accurately than medical experts, the impact on patient-relevant outcomes remains uncertain. To address this gap, this systematic review assesses the benefits and harms associated with AI-related algorithmic decision-making (ADM) systems used by healthcare professionals, compared to standard care. Methods: In accordance with the PRISMA guidelines, we included interventional and observational studies published as peer-reviewed full-text articles that met the following criteria: human patients; interventions involving algorithmic decision-making systems, developed with and/or utilizing machine learning (ML); and outcomes describing patient-relevant benefits and harms that directly affect health and quality of life, such as mortality and morbidity. Studies that did not undergo preregistration, lacked a standard-of-care control, or pertained to systems that assist in the execution of actions (e.g., in robotics) were excluded. We searched MEDLINE, EMBASE, IEEE Xplore, and Google Scholar for studies published in the past decade up to 31 March 2024. We assessed risk of bias using Cochrane's RoB 2 and ROBINS-I tools, and reporting transparency with CONSORT-AI and TRIPOD-AI. Two researchers independently managed the processes and resolved conflicts through discussion. This review has been registered with PROSPERO (CRD42023412156) and the study protocol has been published. Findings: Out of 2,582 records identified after deduplication, 18 randomized controlled trials (RCTs) and one cohort study met the inclusion criteria, covering specialties such as psychiatry, oncology, and internal medicine. Collectively, the studies included a median of 243 patients (IQR 124–828), with a median of 50.5% female participants (range 12.5–79.0, IQR 43.6–53.6) across intervention and control groups. Four studies were classified as having low risk of bias, seven showed some concerns, and another seven were assessed as having high or serious risk of bias. Reporting transparency varied considerably: six studies showed high compliance, four moderate, and five low compliance with CONSORT-AI or TRIPOD-AI. Twelve studies (63%) reported patient-relevant benefits. Of those with low risk of bias, interventions reduced length of stay in hospital and intensive care unit (10.3 vs. 13.0 days, p = 0.042; 6.3 vs. 8.4 days, p = 0.030), in-hospital mortality (9.0% vs. 21.3%, p = 0.018), and depression symptoms in non-complex cases (45.1% vs. 52.3%, p = 0.03). However, harms were frequently underreported, with only eight studies (42%) documenting adverse events. No study reported an increase in adverse events as a result of the interventions. Interpretation: The current evidence on AI-related ADM systems provides limited insights into patient-relevant outcomes. Our findings underscore the essential need for rigorous evaluations of clinical benefits, reinforced compliance with methodological standards, and balanced consideration of both benefits and harms to ensure meaningful integration into healthcare practice. Funding: This study did not receive any funding.