Genome Biology (Apr 2023)
The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles
- Jacob Schreiber,
- Carles Boix,
- Jin wook Lee,
- Hongyang Li,
- Yuanfang Guan,
- Chun-Chieh Chang,
- Jen-Chien Chang,
- Alex Hawkins-Hooker,
- Bernhard Schölkopf,
- Gabriele Schweikert,
- Mateo Rojas Carulla,
- Arif Canakoglu,
- Francesco Guzzo,
- Luca Nanni,
- Marco Masseroli,
- Mark James Carman,
- Pietro Pinoli,
- Chenyang Hong,
- Kevin Y. Yip,
- Jeffrey P. Spence,
- Sanjit Singh Batra,
- Yun S. Song,
- Shaun Mahony,
- Zheng Zhang,
- Wuwei Tan,
- Yang Shen,
- Yuanfei Sun,
- Minyi Shi,
- Jessika Adrian,
- Richard Sandstrom,
- Nina Farrell,
- Jessica Halow,
- Kristen Lee,
- Lixia Jiang,
- Xinqiong Yang,
- Charles Epstein,
- J. Seth Strattan,
- Bradley Bernstein,
- Michael Snyder,
- Manolis Kellis,
- William Stafford,
- Anshul Kundaje,
- ENCODE Imputation Challenge Participants
Affiliations
- Jacob Schreiber
- Stanford University School of Medicine
- Carles Boix
- Stanford University School of Medicine
- Jin wook Lee
- Stanford University School of Medicine
- Hongyang Li
- Stanford University School of Medicine
- Yuanfang Guan
- Stanford University School of Medicine
- Chun-Chieh Chang
- Stanford University School of Medicine
- Jen-Chien Chang
- Stanford University School of Medicine
- Alex Hawkins-Hooker
- Stanford University School of Medicine
- Bernhard Schölkopf
- Stanford University School of Medicine
- Gabriele Schweikert
- Stanford University School of Medicine
- Mateo Rojas Carulla
- Stanford University School of Medicine
- Arif Canakoglu
- Stanford University School of Medicine
- Francesco Guzzo
- Stanford University School of Medicine
- Luca Nanni
- Stanford University School of Medicine
- Marco Masseroli
- Stanford University School of Medicine
- Mark James Carman
- Stanford University School of Medicine
- Pietro Pinoli
- Stanford University School of Medicine
- Chenyang Hong
- Stanford University School of Medicine
- Kevin Y. Yip
- Stanford University School of Medicine
- Jeffrey P. Spence
- Stanford University School of Medicine
- Sanjit Singh Batra
- Stanford University School of Medicine
- Yun S. Song
- Stanford University School of Medicine
- Shaun Mahony
- Stanford University School of Medicine
- Zheng Zhang
- Stanford University School of Medicine
- Wuwei Tan
- Stanford University School of Medicine
- Yang Shen
- Stanford University School of Medicine
- Yuanfei Sun
- Stanford University School of Medicine
- Minyi Shi
- Stanford University School of Medicine
- Jessika Adrian
- Stanford University School of Medicine
- Richard Sandstrom
- Stanford University School of Medicine
- Nina Farrell
- Stanford University School of Medicine
- Jessica Halow
- Stanford University School of Medicine
- Kristen Lee
- Stanford University School of Medicine
- Lixia Jiang
- Stanford University School of Medicine
- Xinqiong Yang
- Stanford University School of Medicine
- Charles Epstein
- Stanford University School of Medicine
- J. Seth Strattan
- Stanford University School of Medicine
- Bradley Bernstein
- Stanford University School of Medicine
- Michael Snyder
- Stanford University School of Medicine
- Manolis Kellis
- Stanford University School of Medicine
- William Stafford
- Stanford University School of Medicine
- Anshul Kundaje
- Stanford University School of Medicine
- ENCODE Imputation Challenge Participants
- Stanford University School of Medicine
- DOI
- https://doi.org/10.1186/s13059-023-02915-y
- Journal volume & issue
- Vol. 24, no. 1, pp. 1–22
Abstract
A promising alternative to comprehensively performing genomics experiments is to perform only a subset of them and use computational methods to impute the remainder. However, which imputation methods work best, and which measures meaningfully evaluate their performance, remain open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and are confounded by distributional shifts arising from differences in data collection and processing over time, by the amount of available data, and by redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
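The abstract does not list the specific performance measures used in the challenge. The sketch below is therefore illustrative only. It assumes an imputed track and an observed track are equal-length lists of per-bin signal values and scores them with three generic measures: mean squared error, Pearson correlation, and Spearman rank correlation. The helper names (`mse`, `pearson`, `ranks`, `spearman`) are hypothetical. The toy tracks show how such measures can agree or disagree: a correctly shaped but mis-scaled prediction has perfect correlation yet a nonzero error.

```python
import math


def mse(observed, imputed):
    """Mean squared error between two equal-length signal tracks."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("tracks must be non-empty and of equal length")
    return sum((o - i) ** 2 for o, i in zip(observed, imputed)) / len(observed)


def pearson(x, y):
    """Pearson correlation; undefined (raises) if either track is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values all receive their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy tracks (hypothetical data, not from the challenge).
observed = [0.0, 1.0, 2.0, 3.0, 4.0]
scaled = [2 * v for v in observed]    # right shape, wrong scale
squared = [v ** 2 for v in observed]  # monotone but nonlinear

for name, imputed in [("scaled", scaled), ("squared", squared)]:
    print(name, mse(observed, imputed),
          round(pearson(observed, imputed), 3),
          spearman(observed, imputed))
```

Because the rank-based measure ignores scale and monotone distortions entirely, it can rank methods very differently from an error-based measure, while measures of the same family tend to move together.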