Informatics in Medicine Unlocked (Jan 2019)

Representing oncology in datasets: Standard or custom biomedical terminology?

  • Stefan Schulz,
  • Philipp Daumke,
  • Martin Romacker,
  • Pablo López-García

Journal volume & issue
Vol. 15

Abstract

In this article, we investigate whether a custom or an established biomedical terminology should be used for coding cancer datasets. We first give an overview of biomedical terminologies in the cancer domain and introduce three clinical use cases to demonstrate how several aspects of cancer can be coded using ICD-10, ICD-O, TNM, MeSH, NCIt, MedDRA, and SNOMED CT. The same collection of terminologies was used in a case study in which two dimensions of cancer (anatomy and histology) had already been coded in a dataset using a custom terminology. Although our experiments were limited in the number of coders (2) and coding cases (250 in total, 50 double-coded), they showed that, in most cases, equivalent concepts already existed in standard biomedical terminologies. SNOMED CT and NCIt provided the highest coverage (88% vs. 93%), with NCIt showing a much higher agreement (unweighted Kappa of 71% vs. 49%). As a general conclusion, for annotating cancer datasets we recommend using standard terminologies together with mappings to local interface terminologies or value sets, instead of building a custom terminology from scratch.

Keywords: Biomedical terminology, Cancer, Annotation, SNOMED CT, NCIt
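The abstract reports two measures, coverage and unweighted Kappa, without showing how they are computed. The following is a minimal sketch of one common way to compute them, assuming that coverage means the share of cases for which a coder found an equivalent concept and that agreement is Cohen's unweighted kappa over the double-coded cases. The function names and the placeholder codes ("T1", "T2") are hypothetical and are not taken from the article or from any real terminology.

```python
from collections import Counter


def coverage(codes):
    """Share of cases with an assigned code (None means no equivalent concept found)."""
    if not codes:
        raise ValueError("codes must not be empty")
    return sum(code is not None for code in codes) / len(codes)


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two coders who labelled the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of cases")
    n = len(labels_a)

    # Observed agreement: fraction of cases where both coders chose the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each coder's marginal frequency, summed over codes.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical double-coded cases with placeholder codes.
coder_a = ["T1", "T1", "T2", "T2"]
coder_b = ["T1", "T2", "T2", "T2"]
print(cohen_kappa(coder_a, coder_b))

# Hypothetical coding result in which one of four cases had no equivalent concept.
print(coverage(["T1", None, "T2", "T2"]))
```

Multiplying either result by 100 gives a percentage of the kind quoted in the abstract. Unweighted kappa treats every disagreement as equally severe, which is a natural choice when codes are nominal categories with no ordering.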