Cell Genomics (Jan 2024)

Opportunities for basic, clinical, and bioethics research at the intersection of machine learning and genomics

  • Shurjo K. Sen
  • Eric D. Green
  • Carolyn M. Hutter
  • Mark Craven
  • Trey Ideker
  • Valentina Di Francesco

Journal volume & issue
Vol. 4, no. 1
p. 100466

Abstract

Summary: The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) consulted researchers working in these two fields to identify common challenges and gather recommendations for better supporting genomic research that uses ML approaches. The recommendations included increasing the amount and variety of training datasets by integrating genomic with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants’ data.