Cancer Informatics (Feb 2024)

An Intelligent Search & Retrieval System (IRIS) and Clinical and Research Repository for Decision Support Based on Machine Learning and Joint Kernel-based Supervised Hashing

  • David J Foran,
  • Wenjin Chen,
  • Tahsin Kurc,
  • Rajarshi Gupta,
  • Jakub Roman Kaczmarzyk,
  • Luke Austin Torre-Healy,
  • Erich Bremer,
  • Samuel Ajjarapu,
  • Nhan Do,
  • Gerald Harris,
  • Antoinette Stroup,
  • Eric Durbin,
  • Joel H Saltz

DOI
https://doi.org/10.1177/11769351231223806
Journal volume & issue
Vol. 23

Abstract


Large-scale, multi-site collaboration is becoming indispensable for a wide range of research and clinical activities in oncology. To facilitate the next generation of advances in cancer biology, precision oncology and the population sciences, it will be necessary to develop and implement data management and analytic tools that empower investigators to reliably and objectively detect, characterize and chronicle the phenotypic and genomic changes that occur during the transformation from the benign to the cancerous state and throughout the course of disease progression. To support these efforts, the informatics community must establish workflows and architectures that automate the aggregation and organization of a growing range and number of clinical data types and modalities, from new molecular and laboratory tests to sophisticated diagnostic imaging studies. To meet these challenges, leading health care centers across the country are making substantial investments in enterprise-wide data warehouses. A significant limitation of many data warehouses, however, is that they are designed to support only alphanumeric information. In contrast to those traditional designs, the system that we have developed supports automated collection and mining of multimodal data, including genomics, digital pathology and radiology images. In this paper, our team describes the design, development and implementation of a multimodal Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide actionable insight into the underlying characteristics of the tumor environment that would not be revealed using standard methods and tools.
The system features a flexible Extract, Transform and Load (ETL) interface that allows it to be adapted to aggregate data from different clinical and research sources, depending on the specific electronic health record (EHR) and other data sources used at a given deployment site.
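The abstract does not describe how the ETL interface is implemented. One common way to make an ETL layer adaptable across deployment sites is a pluggable adapter pattern, in which each site supplies its own extract and transform logic that maps source-specific fields onto a shared warehouse schema. The Python sketch below illustrates that general pattern only; every name in it (`PatientRecord`, `SourceAdapter`, `SiteAEhrAdapter`, `SiteBPacsAdapter`, `run_etl`, the field names and the `pacs://` URI) is hypothetical and is not taken from the CRDW itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical common (warehouse-side) schema for one observation."""
    patient_id: str
    modality: str  # e.g. "lab" or "radiology_image" (illustrative values)
    value: str     # a measured value, or a URI pointing to an image study


class SourceAdapter(ABC):
    """Hypothetical per-source adapter: site-specific Extract + Transform."""

    @abstractmethod
    def extract(self) -> Iterable[Dict[str, str]]:
        """Yield raw rows using the source system's own field names."""

    @abstractmethod
    def transform(self, row: Dict[str, str]) -> PatientRecord:
        """Map one raw row onto the common schema."""


class SiteAEhrAdapter(SourceAdapter):
    """Hypothetical EHR lab export that uses 'MRN', 'TEST' and 'RESULT' columns."""

    def __init__(self, rows: List[Dict[str, str]]):
        self._rows = rows

    def extract(self) -> Iterable[Dict[str, str]]:
        return iter(self._rows)

    def transform(self, row: Dict[str, str]) -> PatientRecord:
        return PatientRecord(row["MRN"], "lab", f'{row["TEST"]}={row["RESULT"]}')


class SiteBPacsAdapter(SourceAdapter):
    """Hypothetical imaging feed that references studies by URI, not pixels."""

    def __init__(self, rows: List[Dict[str, str]]):
        self._rows = rows

    def extract(self) -> Iterable[Dict[str, str]]:
        return iter(self._rows)

    def transform(self, row: Dict[str, str]) -> PatientRecord:
        return PatientRecord(row["patient"], "radiology_image", row["study_uri"])


def run_etl(adapters: Iterable[SourceAdapter], warehouse: List[PatientRecord]) -> int:
    """Load step: run every adapter and append normalized records; return count."""
    loaded = 0
    for adapter in adapters:
        for row in adapter.extract():
            warehouse.append(adapter.transform(row))
            loaded += 1
    return loaded


# Usage: two sources with different field names feed one common schema.
warehouse: List[PatientRecord] = []
n = run_etl(
    [
        SiteAEhrAdapter([{"MRN": "p1", "TEST": "PSA", "RESULT": "4.1"}]),
        SiteBPacsAdapter([{"patient": "p1", "study_uri": "pacs://study/123"}]),
    ],
    warehouse,
)
print(n)  # 2
```

In this sketch, supporting a new site means writing one new adapter class; the load step and the warehouse schema stay unchanged, which is the kind of per-site flexibility the abstract describes.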