Genome Biology (May 2019)

Genomics and data science: an application within an umbrella

  • Fábio C. P. Navarro,
  • Hussein Mohsen,
  • Chengfei Yan,
  • Shantao Li,
  • Mengting Gu,
  • William Meyerson,
  • Mark Gerstein

DOI
https://doi.org/10.1186/s13059-019-1724-1
Journal volume & issue
Vol. 20, no. 1
pp. 1 – 11

Abstract

Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of the well-known 3V data and 4M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural “exports” and “imports” between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data-science applications in general, and are especially relevant to genomics because of the persistent nature of DNA.