Scientific Data (Aug 2025)
Multimodal data curation via interoperability: use cases with the Medical Imaging and Data Resource Center
Abstract
Abstract Interoperability (the ability of data or tools from non-cooperating resources to integrate or work together with minimal effort) is particularly important for curation of multimodal datasets from multiple data sources. The Medical Imaging and Data Resource Center (MIDRC), a multi-institutional collaborative initiative to collect, curate, and share medical imaging datasets, has made interoperability with other data commons one of its top priorities. The purpose of this study was to demonstrate the interoperability between MIDRC and two other data repositories, BioData Catalyst (BDC) and National Clinical Cohort Collaborative (N3C). Using interoperability capabilities of the data repositories, we built two cohorts for example use cases, with each containing clinical and imaging data on matched patients. The representativeness of the cohorts is characterized by comparing with CDC population statistics using the Jensen-Shannon distance. The process and methods of interoperability demonstrated in this work can be utilized by MIDRC, BDC, and N3C users to create multimodal datasets for development of artificial intelligence/machine learning models.