Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK, NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK, Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King’s College London, London, UK, Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK, Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended alterations can occur while data are written and transmitted, changing the original content and leading to unexpected results. However, no open-source, easy-to-use tool exists to verify end-to-end data integrity. Here, we present aws-s3-integrity-check, a user-friendly, lightweight, and reliable Bash tool that verifies the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we needed only ∼114 min to verify the integrity of 1,045 records ranging from 5 bytes to 10 gigabytes in size and occupying ∼935 gigabytes of Amazon S3 storage in total. The tool also reports the status of each integrity check file by file, both on screen and in a log file. To our knowledge, it is the only open-source tool that verifies the integrity of a dataset uploaded to Amazon S3 quickly, reliably, and efficiently. The tool is freely available for download and use at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check.
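As an illustration of the kind of check involved, the sketch below recomputes the ETag that Amazon S3 typically reports for an object and compares it with the value returned by the service. It is a minimal Python sketch under stated assumptions, not the implementation of aws-s3-integrity-check: the text above does not describe the tool's internals, the function names `expected_etag` and `is_intact` are hypothetical, and the 8 MiB part size is assumed to match the AWS CLI defaults for multipart threshold and chunk size. The comparison only holds for objects whose ETag is MD5-based (for example, it does not apply to objects encrypted with SSE-KMS) and for uploads that used the same part size.

```python
import hashlib
import io
from typing import BinaryIO

MIB = 1024 * 1024


def expected_etag(stream: BinaryIO, part_size: int = 8 * MIB) -> str:
    """Recompute an S3-style ETag for the bytes in `stream`.

    Assumption: data shorter than `part_size` was uploaded in a single
    request (ETag = MD5 of the content); anything else was uploaded as
    fixed-size parts (ETag = MD5 of the concatenated binary part
    digests, followed by "-<number of parts>").
    """
    part_digests = []
    whole = hashlib.md5()
    total = 0
    # Read the stream one part at a time so large files are never
    # loaded into memory at once.
    for chunk in iter(lambda: stream.read(part_size), b""):
        part_digests.append(hashlib.md5(chunk).digest())
        whole.update(chunk)
        total += len(chunk)
    if total < part_size:
        return whole.hexdigest()
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def is_intact(stream: BinaryIO, reported_etag: str, part_size: int = 8 * MIB) -> bool:
    """Compare the local ETag with the one reported by S3.

    S3 API responses wrap the ETag in double quotes, so they are stripped.
    """
    return expected_etag(stream, part_size) == reported_etag.strip('"')


if __name__ == "__main__":
    # Tiny demonstration with an in-memory "file" and a 4-byte part size.
    data = b"abcdefgh"
    etag = expected_etag(io.BytesIO(data), part_size=4)
    print(etag, is_intact(io.BytesIO(data), f'"{etag}"', part_size=4))
```

In practice, the reported ETag would come from the object's metadata in the bucket, and the local file would be opened in binary mode in place of the in-memory stream used here.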