BMC Medical Education (Sep 2024)

Development of an online authentic radiology viewing and reporting platform to test the skills of radiology trainees in Low- and Middle-Income Countries

  • Hubert Vesselle,
  • Justy Antony Chiramal,
  • Stephen E. Hawes,
  • Eric Schulze,
  • Tham Nguyen,
  • Rose Ndumia,
  • Sudhir Vinayak

DOI
https://doi.org/10.1186/s12909-024-05899-w
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 16

Abstract

Background: Diagnostic radiology residents in low- and middle-income countries (LMICs) may have to contribute substantially to the clinical workload before completing their residency training. Because of time constraints inherent to the delivery of acute care, some of the most clinically impactful diagnostic radiology errors arise from the use of computed tomography (CT) in the management of acutely ill patients. It is therefore paramount to ensure that radiology trainees reach adequate skill levels before assuming independent on-call responsibilities. We partnered with the radiology residency program at the Aga Khan University Hospital in Nairobi (AKUHN), Kenya, to evaluate a novel cloud-based testing method that provides an authentic radiology viewing and interpretation environment. It is based on Lifetrack, a unique Google Chrome-based Picture Archiving and Communication System that provides a complete viewing environment for any scan and a novel report generation tool based on Active Templates, a patented structured reporting method. We applied it to evaluate the skills of AKUHN trainees on entire CT scans representing the spectrum of acute non-trauma abdominal pathology encountered in a typical on-call setting. We aimed to demonstrate the feasibility of remotely testing the authentic practice of radiology and to show that such a Lifetrack-based testing approach can yield important observations about the radiology skills of an individual practitioner or of a cohort of trainees.

Methods: A total of 13 anonymized trainees, with experience ranging from 12 months to over 4 years, took part in the study. Accessing the Lifetrack tool individually, they were tested on 37 abdominal CT scans (including one normal scan) over six 2-hour sessions on consecutive days. All cases carried the same clinical history of acute abdominal pain.
During each session, the trainees accessed the corresponding Lifetrack test set using clinical workstations, reviewed the CT scans, and formulated an opinion on the acute diagnosis, any secondary pathology, and incidental findings on the scan. Their interpretations were composed using the Lifetrack report generation system, based on Active Templates, in which segments of text can be selected to assemble a detailed report. All reports generated by the trainees were scored on four interpretive components: (a) acute diagnosis, (b) unrelated secondary diagnosis, (c) number of missed incidental findings, and (d) number of overcalls. A 3-score aggregate was defined from the first three interpretive components. A cumulative score adjusted the 3-score aggregate for the negative effect of interpretive overcalls.

Results: A total of 436 scan interpretations and scores were available from 13 trainees tested on 37 cases. The acute diagnosis score ranged from 0 to 1, with a mean of 0.68 ± 0.36 and a median of 0.78 (IQR: 0.5–1), based on 436 scores. An unrelated secondary diagnosis was present in 11 cases, resulting in 130 secondary diagnosis scores. The unrelated secondary diagnosis score ranged from 0 to 1, with a mean of 0.48 ± 0.46 and a median of 0.5 (IQR: 0–1). There were 32 cases with incidental findings, yielding 390 scores for incidental findings. The number of missed incidental findings ranged from 0 to 5, with a median of 1 (IQR: 1–2). The incidental findings score ranged from 0 to 1, with a mean of 0.4 ± 0.38 and a median of 0.33 (IQR: 0–0.66). The number of overcalls ranged from 0 to 3, with a median of 0 (IQR: 0–1) and a mean of 0.36 ± 0.63. The 3-score aggregate ranged from 0 to 100, with a mean of 65.5 ± 32.5 and a median of 77.3 (IQR: 45.0–92.5). The cumulative score ranged from −30 to 100, with a mean of 61.9 ± 35.5 and a median of 71.4 (IQR: 37.4–92.0).
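
As a quick consistency check on the counts reported above, the short sketch below compares each reported number of scores with the maximum possible if every one of the 13 trainees had interpreted every relevant case. That full-assignment design is an assumption made here for illustration only; the abstract does not state why fewer than the maximum number of interpretations were available. The function name `coverage` is likewise illustrative.

```python
# Figures taken from the abstract.
N_TRAINEES = 13

# component -> (number of cases carrying that component, number of scores reported)
reported = {
    "acute diagnosis": (37, 436),
    "secondary diagnosis": (11, 130),
    "incidental findings": (32, 390),
}


def coverage(n_cases, n_scores, n_trainees=N_TRAINEES):
    """Return (maximum possible scores, fraction actually available).

    Assumes every trainee was expected to read every case; this is an
    assumption for illustration, not something the abstract states.
    """
    max_scores = n_cases * n_trainees
    return max_scores, n_scores / max_scores


for name, (n_cases, n_scores) in reported.items():
    max_scores, frac = coverage(n_cases, n_scores)
    print(f"{name}: {n_scores}/{max_scores} available ({frac:.1%})")
```

Under that assumption, roughly 91% of the possible acute and secondary diagnosis scores and about 94% of the possible incidental findings scores were available.
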
The mean acute diagnosis scores and SD by training period were 0.62 ± 0.03, 0.80 ± 0.05, 0.71 ± 0.05, 0.58 ± 0.07, and 0.66 ± 0.05 for trainees with ≤ 12 months, 12–24 months, 24–36 months, 36–48 months, and > 48 months of training, respectively. The 12–24 months group was the only one whose mean acute diagnosis score was significantly greater than that of the ≤ 12 months group by ANOVA with Tukey testing (p = 0.0002). We found a similar trend in the distribution of 3-score aggregates and cumulative scores. There were no significant associations when the training period was dichotomized as less than versus more than 2 years. Examining the distribution of the 3-score aggregate against the number of overcalls by trainee, we found that the 3-score aggregate was inversely related to the number of overcalls. Heatmaps and raincloud plots provided an illustrative means of visualizing the relative performance of trainees across cases.

Conclusion: We demonstrated the feasibility of remotely testing the authentic practice of radiology and showed that our Lifetrack-based testing approach can yield important observations about the radiology skills of an individual or a cohort. Targeted teaching can be implemented in areas of observed weakness, and retesting could reveal its impact. This methodology can be customized to different LMIC environments and expanded to board certification examinations.
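
The scoring scheme described in the Methods can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption: the names `three_score_aggregate` and `cumulative_score` are hypothetical, the aggregate is taken as the mean of whichever component scores apply to a case, scaled to 0–100, and the penalty of 10 points per overcall is chosen only because it reproduces the reported minimum of −30 at the reported maximum of 3 overcalls. The authors' actual weighting may differ.

```python
from typing import Optional

# Assumed points deducted per overcall; not stated in the abstract.
OVERCALL_PENALTY = 10.0


def three_score_aggregate(acute: float,
                          secondary: Optional[float] = None,
                          incidental: Optional[float] = None) -> float:
    """Hypothetical aggregate: mean of applicable component scores, scaled to 0-100.

    Each component is a score in [0, 1]. Pass None for `secondary` or
    `incidental` when the case has no secondary diagnosis or no incidental
    findings, so that component is left out of the mean.
    """
    parts = [s for s in (acute, secondary, incidental) if s is not None]
    for s in parts:
        if not 0.0 <= s <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    return 100.0 * sum(parts) / len(parts)


def cumulative_score(aggregate: float, overcalls: int) -> float:
    """Hypothetical cumulative score: aggregate minus a fixed penalty per overcall."""
    if overcalls < 0:
        raise ValueError("overcalls must be non-negative")
    return aggregate - OVERCALL_PENALTY * overcalls


# Example: correct acute diagnosis, half credit on the secondary diagnosis,
# one of three incidental findings reported, and one overcall.
agg = three_score_aggregate(1.0, 0.5, 1 / 3)
print(round(agg, 1), round(cumulative_score(agg, 1), 1))
```

Leaving out components that do not apply to a case keeps the aggregate on the same 0–100 scale whether or not a case has a secondary diagnosis or incidental findings; other normalizations are equally plausible.
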

Keywords