Translational Oncology (Mar 2021)

Machine learning analysis using 77,044 genomic and transcriptomic profiles to accurately predict tumor type

  • Jim Abraham,
  • Amy B. Heimberger,
  • John Marshall,
  • Elisabeth Heath,
  • Joseph Drabick,
  • Anthony Helmstetter,
  • Joanne Xiu,
  • Daniel Magee,
  • Phillip Stafford,
  • Chadi Nabhan,
  • Sourabh Antani,
  • Curtis Johnston,
  • Matthew Oberley,
  • Wolfgang Michael Korn,
  • David Spetzler

Journal volume & issue
Vol. 14, no. 3
p. 101016

Abstract

Cancer of Unknown Primary (CUP) occurs in 3–5% of patients when standard histological diagnostic tests are unable to determine the origin of metastatic cancer. CUP is typically treated empirically and has very poor outcomes, with a median overall survival of less than one year. Gene expression profiling alone has been used to identify the tissue of origin, but it struggles with the low neoplastic percentage in metastatic sites, which is where identification is often most needed. MI GPSai, a Genomic Prevalence Score, uses DNA sequencing and whole transcriptome data coupled with machine learning to aid in the diagnosis of cancer. The algorithm was trained on genomic data from 34,352 cases and on genomic and transcriptomic data from 23,137 cases, and was validated on 19,555 cases. In the labeled data set, MI GPSai predicted the tumor type with an accuracy of over 94% on 93% of cases while choosing among 21 possible categories of cancer. When the second-highest prediction is also considered, the accuracy increases to 97%. Additionally, MI GPSai rendered a prediction for 71.7% of CUP cases. Pathologist evaluation of discrepancies between the submitted diagnosis and the MI GPSai prediction resulted in a change of diagnosis 41.3% of the time. MI GPSai provides clinically meaningful information in a large proportion of CUP cases, and its inclusion in clinical routine could improve diagnostic fidelity. Moreover, all genomic markers essential for therapy selection are assessed in this assay, maximizing the clinical utility for patients within a single test.
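
The reported figures combine three quantities: the share of cases for which a prediction is rendered, the accuracy of the top prediction on those cases, and the accuracy when the second-highest prediction also counts. The following Python sketch shows one way such numbers could be computed from per-class scores. It is an illustration only: the function name, the `min_confidence` cutoff, and the toy scores are assumptions, and the abstract does not state how MI GPSai decides when to render a prediction.

```python
from typing import List, Sequence, Tuple


def top_k_accuracy_with_coverage(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
    min_confidence: float = 0.0,
) -> Tuple[float, float]:
    """Return (accuracy, coverage) for top-k predictions.

    A case counts as "called" only if its highest score reaches
    min_confidence (a hypothetical rule, not taken from the paper).
    Accuracy is measured on called cases only.
    """
    called = 0
    correct = 0
    for row, label in zip(probs, labels):
        # Class indices ordered from highest to lowest score.
        ranked: List[int] = sorted(
            range(len(row)), key=lambda i: row[i], reverse=True
        )
        if row[ranked[0]] < min_confidence:
            continue  # no prediction rendered for this case
        called += 1
        if label in ranked[:k]:
            correct += 1
    coverage = called / len(labels) if labels else 0.0
    accuracy = correct / called if called else 0.0
    return accuracy, coverage


# Toy data: 4 cases, 3 tumor categories (the study used 21).
toy_probs = [
    [0.90, 0.05, 0.05],  # true class 0: top prediction correct
    [0.10, 0.70, 0.20],  # true class 2: correct only as second choice
    [0.40, 0.35, 0.25],  # true class 0: below cutoff, no call
    [0.05, 0.15, 0.80],  # true class 2: top prediction correct
]
toy_labels = [0, 2, 0, 2]

top1_acc, coverage = top_k_accuracy_with_coverage(
    toy_probs, toy_labels, k=1, min_confidence=0.5
)
top2_acc, _ = top_k_accuracy_with_coverage(
    toy_probs, toy_labels, k=2, min_confidence=0.5
)

# The cohort sizes quoted in the abstract add up to the title's total.
total_profiles = 34_352 + 23_137 + 19_555
```

On the toy data, three of the four cases receive a call, two of those three are correct on the top prediction, and all three are correct once the second choice is allowed. This mirrors the pattern in the abstract, where accuracy rises from over 94% to 97% when the second-highest prediction is considered.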