Communications Medicine (May 2024)

A comprehensive AI model development framework for consistent Gleason grading

  • Xinmi Huo,
  • Kok Haur Ong,
  • Kah Weng Lau,
  • Laurent Gole,
  • David M. Young,
  • Char Loo Tan,
  • Xiaohui Zhu,
  • Chongchong Zhang,
  • Yonghui Zhang,
  • Longjie Li,
  • Hao Han,
  • Haoda Lu,
  • Jing Zhang,
  • Jun Hou,
  • Huanfen Zhao,
  • Hualei Gan,
  • Lijuan Yin,
  • Xingxing Wang,
  • Xiaoyue Chen,
  • Hong Lv,
  • Haotian Cao,
  • Xiaozhen Yu,
  • Yabin Shi,
  • Ziling Huang,
  • Gabriel Marini,
  • Jun Xu,
  • Bingxian Liu,
  • Bingxian Chen,
  • Qiang Wang,
  • Kun Gui,
  • Wenzhao Shi,
  • Yingying Sun,
  • Wanyuan Chen,
  • Dalong Cao,
  • Stephan J. Sanders,
  • Hwee Kuan Lee,
  • Susan Swee-Shan Hue,
  • Weimiao Yu,
  • Soo Yong Tan

DOI
https://doi.org/10.1038/s43856-024-00502-1
Journal volume & issue
Vol. 4, no. 1
pp. 1 – 13

Abstract

Background: Artificial Intelligence (AI)-based solutions for Gleason grading hold promise for pathologists, but inconsistent image quality, the need for continuous data integration, and limited generalizability hinder their adoption and scalability.

Methods: We present a comprehensive digital pathology workflow for AI-assisted Gleason grading. It incorporates A!MagQC (image quality control), A!HistoClouds (cloud-based annotation), and Pathologist-AI Interaction (PAI) for continuous model improvement. Trained on Akoya-scanned images only, the model uses color augmentation and image appearance migration to address scanner variations. We evaluate it on Whole Slide Images (WSIs) from five other scanners and conduct validations with pathologists to assess AI efficacy and PAI.

Results: Our model achieves an average F1 score of 0.80 on annotations and a Quadratic Weighted Kappa of 0.71 on WSIs for Akoya-scanned images. Applying our generalization solution increases the average F1 score for Gleason pattern detection from 0.73 to 0.88 on images from other scanners. The model speeds up Gleason scoring by 43% while maintaining accuracy. Additionally, PAI improves annotation efficiency by 2.5 times and leads to further improvements in model performance.

Conclusions: This pipeline represents a notable advancement in AI-assisted Gleason grading for improved consistency, accuracy, and efficiency. Unlike previous methods limited by scanner specificity, our model performs well across diverse scanners. This improvement paves the way for its integration into clinical workflows.
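For readers unfamiliar with the slide-level metric reported above, the following is a minimal sketch of how a Quadratic Weighted Kappa can be computed between two sets of ordinal grades. It is not the authors' evaluation code. The function name `quadratic_weighted_kappa`, the integer label encoding (0 to `num_classes - 1`), and the sample grades are all assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Agreement between two lists of ordinal integer labels in [0, num_classes).

    Illustrative sketch only: the label encoding is an assumption,
    not taken from the paper.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("need at least two classes")

    n = len(rater_a)
    k = num_classes

    # Observed confusion matrix: observed[i][j] counts cases graded i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty: disagreements further apart cost more.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0.0:
        # Both raters used one identical class throughout; treated here as
        # perfect agreement by convention.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades for eight slides (made-up values, not study data).
model_grades = [0, 1, 2, 2, 3, 4, 5, 5]
reference_grades = [0, 1, 2, 3, 3, 4, 4, 5]
kappa = quadratic_weighted_kappa(model_grades, reference_grades, num_classes=6)
print(f"QWK = {kappa:.3f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. The quadratic weighting means that confusing adjacent grades is penalized far less than confusing distant ones, which suits an ordinal scale such as Gleason grading.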