Frontiers in Medicine (Aug 2024)

Seeing the primary tumor because of all the trees: Cancer type prediction on low-dimensional data

  • Julia Gehrmann,
  • Devina Johanna Soenarto,
  • Kevin Hidayat,
  • Maria Beyer,
  • Lars Quakulinski,
  • Samer Alkarkoukly,
  • Scarlett Berressem,
  • Anna Gundert,
  • Michael Butler,
  • Ana Grönke,
  • Simon Lennartz,
  • Thorsten Persigehl,
  • Thomas Zander,
  • Oya Beyan

DOI
https://doi.org/10.3389/fmed.2024.1396459
Journal volume & issue
Vol. 11

Abstract

The Cancer of Unknown Primary (CUP) syndrome is characterized by identifiable metastases while the primary tumor remains hidden. In recent years, various data-driven approaches have been suggested to predict the location of the primary tumor (LOP) in CUP patients, promising improved diagnosis and outcome. These LOP prediction approaches use high-dimensional input data such as images or genetic data. However, leveraging such data is challenging and resource-intensive, and therefore a potential translational barrier. Instead of using high-dimensional data, we analyzed the LOP prediction performance of low-dimensional data from routine medical care. Our findings show that such low-dimensional routine clinical information suffices as input data for tree-based LOP prediction models. The best model reached a mean accuracy of 94% and a mean Matthews correlation coefficient (MCC) of 0.92 in 10-fold nested cross-validation (NCV) when distinguishing four types of cancer. When considering eight types of cancer, this model achieved a mean accuracy of 85% and a mean MCC of 0.81. This is comparable to the performance achieved by approaches using high-dimensional input data. Additionally, the distribution pattern of metastases appears to be important information for predicting the LOP.
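
For readers less familiar with the MCC in a multi-class setting, the following is a minimal sketch of how the coefficient can be computed from true and predicted labels, using the common multi-class generalization of the binary formula. It is not the authors' code: the function name `multiclass_mcc` and the cancer-type labels in the usage example are hypothetical, and the abstract does not specify which implementation was used.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (illustrative sketch).

    Uses the generalized formula
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples whose true class is k, and p_k the number of
    samples predicted as class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)  # t_k per class
    p_counts = Counter(y_pred)  # p_k per class
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_true = s * s - sum(v * v for v in t_counts.values())
    cov_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(cov_true * cov_pred)

    # Convention: return 0.0 when the coefficient is undefined, e.g. when
    # every sample is predicted as the same class.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels, not data from the study.
truth = ["lung", "lung", "breast", "breast"]
print(multiclass_mcc(truth, truth))
print(multiclass_mcc(truth, ["lung"] * 4))
```

Unlike accuracy, the MCC is 0 for a model that always predicts the same class, which is one reason it is often reported alongside accuracy when class frequencies differ.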

Keywords