Frontiers in Genetics (Sep 2020)

GenomeChronicler: The Personal Genome Project UK Genomic Report Generator Pipeline

  • José Afonso Guerra-Assunção,
  • José Afonso Guerra-Assunção,
  • Lucia Conde,
  • Ismail Moghul,
  • Amy P. Webster,
  • Simone Ecker,
  • Olga Chervova,
  • Christina Chatzipantsiou,
  • Pablo P. Prieto,
  • Stephan Beck,
  • Javier Herrero

DOI
https://doi.org/10.3389/fgene.2020.518644
Journal volume & issue
Vol. 11

Abstract

Read online

In recent years, there has been a significant increase in whole genome sequencing data of individual genomes produced by research projects as well as direct to consumer service providers. While many of these sources provide their users with an interpretation of the data, there is a lack of free, open tools for generating reports exploring the data in an easy to understand manner. GenomeChronicler was developed as part of the Personal Genome Project UK (PGP-UK) to address this need. PGP-UK provides genomic, transcriptomic, epigenomic and self-reported phenotypic data under an open-access model with full ethical approval. As a result, the reports generated by GenomeChronicler are intended for research purposes only and include information relating to potentially beneficial and potentially harmful variants, but without clinical curation. GenomeChronicler can be used with data from whole genome or whole exome sequencing, producing a genome report containing information on variant statistics, ancestry and known associated phenotypic traits. Example reports are available from the PGP-UK data page (personalgenomes.org.uk/data). The objective of this method is to leverage existing resources to find known phenotypes associated with the genotypes detected in each sample. The provided trait data is based primarily upon information available in SNPedia, but also collates data from ClinVar, GETevidence, and gnomAD to provide additional details on potential health implications, presence of genotype in other PGP participants and population frequency of each genotype. The analysis can be run in a self-contained environment without requiring internet access, making it a good choice for cases where privacy is essential or desired: any third party project can embed GenomeChronicler within their off-line safe-haven environments. GenomeChronicler can be run for one sample at a time, or in parallel making use of the Nextflow workflow manager. The source code is available from GitHub (https://github.com/PGP-UK/GenomeChronicler), container recipes are available for Docker and Singularity, as well as a pre-built container from SingularityHub (https://singularity-hub.org/collections/3664) enabling easy deployment in a variety of settings. Users without access to computational resources to run GenomeChronicler can access the software from the Lifebit CloudOS platform (https://lifebit.ai/cloudos) enabling the production of reports and variant calls from raw sequencing data in a scalable fashion.

Keywords