Frontiers in Genetics (Sep 2021)

Identification of Prognostic Biomarker Candidates Associated With Melanoma Using High-Dimensional Genomic Data

  • Brody Kutt,
  • Rachel Burdorf,
  • Travaughn Bain,
  • Nicardo Cameron,
  • Alexia Pearah,
  • Ersoy Subasi,
  • David J. Carroll,
  • Lisa K. Moore,
  • Munevver Mine Subasi

DOI
https://doi.org/10.3389/fgene.2021.707105
Journal volume & issue
Vol. 12

Abstract

Survival of patients with metastatic melanoma varies widely. Melanoma is a highly proliferative, chemo-resistant disease. With the recent availability of immunotherapies such as checkpoint inhibitors, durable response rates have improved, but responses are often still limited to 2–3 years. Response rates to treatment range from 30 to 45% with combination therapy; however, frequently no improvement in overall survival is observed. Of the available therapies, many have targeted the BRAF V600E mutation, which results in abnormal activation of the MAPK pathway, a pathway important for regulating cell proliferation. Immune checkpoint inhibitors such as anti-PD-1 and anti-PD-L1 offer better success, but response rates are still low. Identifying biomarkers that better target the patients who will respond, and that point to the right combination of treatments, is the best approach. In this study, we utilize data from the Cancer Cell Line Encyclopedia (CCLE), including 62 samples, to examine gene expression (19K+ features) and copy number (20K+ features) in melanoma cell lines. We perform a clustering analysis on the feature set to assess genetic similarity among the cell lines. We then discover which specific genes, and combinations thereof, maximize cluster density. We design a feature selection approach for high-dimensional datasets that integrates multiple disparate machine learning techniques into one cohesive pipeline. Our approach provides a small subset of genes that can accurately distinguish between the clusters of melanoma cell lines across multiple types of classifiers. In particular, we find that only the 15 highest-ranked genes among the original 19K are needed to achieve perfect or near-perfect classification performance on the test split. Of these 15 genes, some are known to be linked to the progression of melanoma or other cancers, while others have not previously been linked to melanoma and are of interest for further examination.
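
The cluster-then-select workflow summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (62 rows to mirror the sample count, but only 100 hypothetical "genes"), the clustering is a plain two-cluster k-means, the gene ranking is a simple mean-difference filter score swapped in for the paper's cluster-density criterion, and the classifier is a nearest-centroid rule. All function and variable names are assumptions made for illustration.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for an expression matrix: 62 "cell lines" x 100 "genes".
# Only the first 15 genes differ between two hidden groups; this structure is
# an assumption made purely so the sketch has something to find.
N_SAMPLES, N_GENES, N_INFORMATIVE, TOP_K = 62, 100, 15, 15
hidden = [i % 2 for i in range(N_SAMPLES)]
X = [[random.gauss(4.0 if (hidden[i] and g < N_INFORMATIVE) else 0.0, 1.0)
      for g in range(N_GENES)] for i in range(N_SAMPLES)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    """Column-wise mean of a non-empty list of vectors."""
    return [statistics.fmean(col) for col in zip(*rows)]

def kmeans2(rows, n_iter=50):
    """Two-cluster k-means with a deterministic farthest-point initialisation."""
    c0 = rows[0]
    c1 = max(rows, key=lambda r: sq_dist(r, c0))
    centers = [c0, c1]
    labels = None
    for _ in range(n_iter):
        new_labels = [0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
                      for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old centre if a cluster were ever empty
                centers[k] = centroid(members)
    return labels

def separation_scores(rows, labels):
    """Per-gene |difference of cluster means| / (sum of within-cluster std)."""
    scores = []
    for g in range(len(rows[0])):
        a = [r[g] for r, lab in zip(rows, labels) if lab == 0]
        b = [r[g] for r, lab in zip(rows, labels) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return scores

# 1) Cluster on all genes, 2) rank genes by how well they separate the clusters.
labels = kmeans2(X)
scores = separation_scores(X, labels)
top_genes = sorted(range(N_GENES), key=lambda g: scores[g], reverse=True)[:TOP_K]

# 3) Train a nearest-centroid classifier on the selected genes only.
idx = list(range(N_SAMPLES))
random.shuffle(idx)
split = int(0.7 * N_SAMPLES)
train, test = idx[:split], idx[split:]

def project(i):
    """Restrict sample i to the selected genes."""
    return [X[i][g] for g in top_genes]

class_centers = {
    k: centroid([project(i) for i in train if labels[i] == k])
    for k in set(labels[i] for i in train)
}
predicted = [min(class_centers, key=lambda k: sq_dist(project(i), class_centers[k]))
             for i in test]
accuracy = sum(p == labels[i] for p, i in zip(predicted, test)) / len(test)

print(f"selected genes: {sorted(top_genes)}")
print(f"held-out accuracy using {TOP_K} genes: {accuracy:.2f}")
```

For brevity, the sketch ranks genes using all samples before the train/test split. A careful evaluation would rank genes inside the training split only, so that the held-out samples play no part in feature selection.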

Keywords