Scientific Data (Jul 2022)

HistoML, a markup language for representation and exchange of histopathological features in pathology images

  • Peiliang Lou,
  • Chunbao Wang,
  • Ruifeng Guo,
  • Lixia Yao,
  • Guanjun Zhang,
  • Jun Yang,
  • Yong Yuan,
  • Yuxin Dong,
  • Zeyu Gao,
  • Tieliang Gong,
  • Chen Li

DOI
https://doi.org/10.1038/s41597-022-01505-0
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 12

Abstract

The study of histopathological phenotypes is vital for cancer research and medicine because it links molecular mechanisms to disease prognosis. It typically involves integrating heterogeneous histopathological features in whole-slide images (WSIs) to objectively characterize a histopathological phenotype. However, the large-scale implementation of phenotype characterization has been hindered by the fragmentation of histopathological features, which results from the lack of a standardized format and a controlled vocabulary for structured and unambiguous representation of semantics in WSIs. To fill this gap, we propose the Histopathology Markup Language (HistoML), a representation language accompanied by a controlled vocabulary (Histopathology Ontology), both based on Semantic Web technologies. Multiscale features within a WSI, from single-cell features to mesoscopic features, can be represented using HistoML, which is a crucial step towards making WSIs findable, accessible, interoperable and reusable (FAIR). We pilot HistoML in representing WSIs of kidney cancer and thyroid carcinoma, and we exemplify the use of HistoML representations in semantic queries to demonstrate the potential of HistoML-powered applications for phenotype characterization.