Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning

Weijian Huang; Cheng Li; Hong-Yu Zhou; Hao Yang; Jiarun Liu; Yong Liang; Hairong Zheng; Shaoting Zhang; Shanshan Wang

doi:10.1038/s41467-024-51749-0

Nature Communications (Sep 2024)

Affiliations

Weijian Huang: Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Cheng Li: Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Hong-Yu Zhou: Department of Biomedical Informatics, Harvard Medical School
Hao Yang: Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Jiarun Liu: Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Yong Liang: Pengcheng Laboratory
Hairong Zheng: Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Shaoting Zhang: Qingyuan Research Institute, Shanghai Jiao Tong University
Shanshan Wang: Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

DOI: https://doi.org/10.1038/s41467-024-51749-0
Journal volume & issue: Vol. 15, no. 1, pp. 1–12

Abstract

Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face crucial challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and the capability of utilizing very limited or even no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a masked contrastive chest X-ray foundation model that tackles these challenges. MaCo explores masked contrastive learning to simultaneously achieve fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks. It designs a correlation weighting mechanism to adjust the correlation between masked chest X-ray image patches and their corresponding reports, thereby enhancing the model’s representation learning capabilities. To evaluate the performance of MaCo, we conducted extensive experiments using 6 well-known open-source X-ray datasets. The experimental results demonstrate the superiority of MaCo over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding. These findings highlight the significant potential of MaCo in advancing a wide range of medical image analysis tasks.
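The abstract does not give the loss function, so the following is only a rough illustration of the general idea of a weighted image-report contrastive objective, not the authors' implementation. It computes a standard symmetric InfoNCE loss over paired embeddings and scales each pair's contribution by a weight. The function names, the `weights` argument, and the temperature value of 0.07 are all assumptions made for this sketch; how MaCo actually derives its correlation weights from masked patches is not described in this record.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(image_embs, report_embs, weights, temperature=0.07):
    """Symmetric InfoNCE loss with a per-pair weight (hypothetical helper).

    image_embs[k] and report_embs[k] are assumed to come from the same study.
    weights[k] stands in for a correlation weight; here it is simply supplied
    by the caller, which is an assumption of this sketch.
    """
    n = len(image_embs)
    sims = [[cosine(img, rep) / temperature for rep in report_embs]
            for img in image_embs]
    total = 0.0
    for k in range(n):
        # Image-to-report direction: row k, correct column k.
        i2t = log_softmax(sims[k])[k]
        # Report-to-image direction: column k, correct row k.
        t2i = log_softmax([sims[j][k] for j in range(n)])[k]
        total += weights[k] * -(i2t + t2i) / 2
    # Normalise so that only the relative size of the weights matters.
    return total / sum(weights)


# Toy usage: matched pairs should score a lower loss than swapped pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[1.0, 0.0], [0.0, 1.0]]
swapped_reports = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = weighted_contrastive_loss(images, matched_reports, [1.0, 1.0])
loss_swapped = weighted_contrastive_loss(images, swapped_reports, [1.0, 1.0])
```

In this toy setup a pair with a larger weight pulls the objective harder toward aligning that image with its report, which is the intuition behind weighting pairs by how informative the visible patches are.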

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
