BMC Bioinformatics (Mar 2024)
Comparative study on chromatin loop callers using Hi-C data reveals their effectiveness
Abstract
Abstract Background Chromosome is one of the most fundamental part of cell biology where DNA holds the hierarchical information. DNA compacts its size by forming loops, and these regions house various protein particles, including CTCF, SMC3, H3 histone. Numerous sequencing methods, such as Hi-C, ChIP-seq, and Micro-C, have been developed to investigate these properties. Utilizing these data, scientists have developed a variety of loop prediction techniques that have greatly improved their methods for characterizing loop prediction and related aspects. Results In this study, we categorized 22 loop calling methods and conducted a comprehensive study of 11 of them. Additionally, we have provided detailed insights into the methodologies underlying these algorithms for loop detection, categorizing them into five distinct groups based on their fundamental approaches. Furthermore, we have included critical information such as resolution, input and output formats, and parameters. For this analysis, we utilized the GM12878 Hi-C datasets at 5 KB, 10 KB, 100 KB and 250 KB resolutions. Our evaluation criteria encompassed various factors, including memory usages, running time, sequencing depth, and recovery of protein-specific sites such as CTCF, H3K27ac, and RNAPII. Conclusion This analysis offers insights into the loop detection processes of each method, along with the strengths and weaknesses of each, enabling readers to effectively choose suitable methods for their datasets. We evaluate the capabilities of these tools and introduce a novel Biological, Consistency, and Computational robustness score ( $$BCC_{score}$$ B C C score ) to measure their overall robustness ensuring a comprehensive evaluation of their performance.
Keywords