Computational and Structural Biotechnology Journal (Dec 2024)

Bridging human and machine intelligence: Reverse-engineering radiologist intentions for clinical trust and adoption

  • Akash Awasthi,
  • Ngan Le,
  • Zhigang Deng,
  • Rishi Agrawal,
  • Carol C. Wu,
  • Hien Van Nguyen

Journal volume & issue
Vol. 24
pp. 711–723

Abstract
In the rapidly evolving landscape of medical imaging, the integration of artificial intelligence (AI) with clinical expertise offers unprecedented opportunities to enhance diagnostic precision and accuracy. Yet the "black box" nature of AI models often limits their integration into clinical practice, where transparency and interpretability are essential. This paper presents a novel system that leverages a Large Multimodal Model (LMM) to bridge the gap between AI predictions and the cognitive processes of radiologists. The system consists of two core modules: Temporally Grounded Intention Detection (TGID) and Region Extraction (RE). The TGID module predicts the radiologist's intentions by analyzing eye gaze fixation heatmap videos and the corresponding radiology reports, while the RE module extracts regions of interest that align with these intentions, mirroring the radiologist's diagnostic focus. This approach introduces a new task, radiologist intention detection, and is the first application of Dense Video Captioning (DVC) in the medical domain. By making AI systems more interpretable and aligned with radiologists' cognitive processes, the proposed system aims to enhance trust, improve diagnostic accuracy, and support medical education. It also holds potential for automated error correction, guidance of junior radiologists, and more effective training and feedback mechanisms. This work sets a precedent for future research in AI-driven healthcare, offering a pathway towards transparent, trustworthy, and human-centered AI systems. We evaluated the model using natural language generation (NLG), time-related, and vision-based metrics, demonstrating superior performance in generating temporally grounded intentions on the REFLACX and EGD-CXR datasets. The model also demonstrated strong predictive accuracy in overlap scores for medical abnormalities and effective region extraction with high IoU (Intersection over Union), especially in complex cases such as cardiomegaly and edema. These results highlight the system's potential to enhance diagnostic accuracy and support continuous learning in radiology.
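For readers less familiar with the vision-based evaluation mentioned above, IoU quantifies how much a predicted region of interest overlaps a reference annotation. The sketch below is a minimal, generic axis-aligned bounding-box IoU computation for illustration only; the function name and coordinates are hypothetical and are not taken from the paper.

```python
def bounding_box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Intersection area is zero if the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Areas of the individual boxes
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical example: a predicted region vs. a radiologist-annotated region
predicted = (30, 40, 120, 160)
annotated = (50, 60, 140, 170)
print(f"IoU = {bounding_box_iou(predicted, annotated):.3f}")  # ≈ 0.511
```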

Keywords