Measuring the reproducibility and quality of Hi-C data

Galip Gürkan Yardımcı; Hakan Ozadam; Michael E. G. Sauria; Oana Ursu; Koon-Kiu Yan; Tao Yang; Abhijit Chakraborty; Arya Kaul; Bryan R. Lajoie; Fan Song; Ye Zhan; Ferhat Ay; Mark Gerstein; Anshul Kundaje; Qunhua Li; James Taylor; Feng Yue; Job Dekker; William S. Noble

doi:10.1186/s13059-019-1658-7

Genome Biology (Mar 2019)

Affiliations

Galip Gürkan Yardımcı: Department of Genome Sciences, University of Washington
Hakan Ozadam: Program in Systems Biology, University of Massachusetts Medical School
Michael E. G. Sauria: Biology Department, Johns Hopkins University
Oana Ursu: Department of Genetics, Stanford University
Koon-Kiu Yan: Department of Computational Biology, St. Jude Children’s Research Hospital
Tao Yang: Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University
Abhijit Chakraborty: Computational Biology Division, La Jolla Institute for Allergy and Immunology
Arya Kaul: Computational Biology Division, La Jolla Institute for Allergy and Immunology
Bryan R. Lajoie: Program in Systems Biology, University of Massachusetts Medical School
Fan Song: Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University
Ye Zhan: University of Massachusetts Medical School
Ferhat Ay: Computational Biology Division, La Jolla Institute for Allergy and Immunology
Mark Gerstein: Program in Computational Biology and Bioinformatics, Yale University
Anshul Kundaje: Department of Genetics, Stanford University
Qunhua Li: Department of Statistics, Penn State University
James Taylor: Biology Department, Johns Hopkins University
Feng Yue: Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University
Job Dekker: Program in Systems Biology, University of Massachusetts Medical School
William S. Noble: Department of Genome Sciences, University of Washington

Journal volume & issue: Vol. 20, no. 1, pp. 1–19

Abstract

Background: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study.

Results: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments.

Conclusions: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
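The abstract names the ratio of intra- to interchromosomal interactions as an established quality measure. As an illustration only, the following Python sketch shows one way such a ratio could be computed from a list of contact counts. The function name `cis_trans_ratio`, the `(chrom_a, chrom_b, count)` input layout, and the toy numbers are assumptions made for this example; they are not taken from the paper or from the 3DChromatin_ReplicateQC software, and the paper's own definition or filtering may differ.

```python
def cis_trans_ratio(contacts):
    """Return the ratio of intrachromosomal to interchromosomal contact counts.

    `contacts` is an iterable of (chrom_a, chrom_b, count) tuples.
    This layout is a hypothetical, simplified input format chosen for the
    example; it is not the format used by any particular Hi-C tool.
    """
    cis = 0    # contacts whose two ends lie on the same chromosome
    trans = 0  # contacts whose two ends lie on different chromosomes
    for chrom_a, chrom_b, count in contacts:
        if chrom_a == chrom_b:
            cis += count
        else:
            trans += count
    if trans == 0:
        raise ValueError("no interchromosomal contacts; ratio is undefined")
    return cis / trans


# Toy, made-up counts purely to show the call; not real Hi-C data.
toy_contacts = [
    ("chr1", "chr1", 30),
    ("chr1", "chr2", 10),
    ("chr2", "chr2", 50),
    ("chr2", "chr3", 10),
]
print(cis_trans_ratio(toy_contacts))  # 80 cis / 20 trans -> 4.0
```

The sketch counts every contact equally and applies no distance or mapping-quality filter, so it only illustrates how a ratio of this kind is formed and is not a calibrated quality threshold.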

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
