International Journal of Medical Students (Jan 2025)
Comparing Treatment Recommendations for Ten Dermatological Conditions Using ChatGPT, Claude, and Pi AI Models
Abstract
BACKGROUND: Artificial intelligence (AI) is increasingly used in healthcare and may become an alternative means of gathering medical information. For many physicians, UpToDate and PubMed have been the gold-standard resources for guiding medical management. In this study, we evaluated how well three AI models (ChatGPT, Pi, and Claude) generate first-line treatment recommendations compared with UpToDate.
METHODS: Clinical vignettes describing physical examination findings and patient histories for ten dermatological diseases were sourced from a clinician-generated medical education platform and entered into each model. The models were then prompted with the query: "What is the first line treatment?"
RESULTS: The responses were tabulated. Claude generated first-line treatment recommendations consistent with UpToDate for all ten diseases tested, while ChatGPT and Pi each produced correct treatment regimens for nine of the ten. ChatGPT and Pi both misdiagnosed the squamous cell carcinoma vignette as actinic keratosis and consequently provided inaccurate treatment advice.
CONCLUSION: These results suggest that, as AI models are further refined and their management recommendations improve, they may offer a free alternative to UpToDate. However, the benefits of future use must be weighed against the risks of overreliance on this technology, especially when the information it provides is not properly validated.