Best practice data life cycle approaches for the life sciences [version 2; referees: 2 approved]
Philippa C. Griffin,
Jyoti Khadake,
Kate S. LeMay,
Suzanna E. Lewis,
Sandra Orchard,
Andrew Pask,
Bernard Pope,
Ute Roessner,
Keith Russell,
Torsten Seemann,
Andrew Treloar,
Sonika Tyagi,
Jeffrey H. Christiansen,
Saravanan Dayalan,
Simon Gladman,
Sandra B. Hangartner,
Helen L. Hayden,
William W.H. Ho,
Gabriel Keeble-Gagnère,
Pasi K. Korhonen,
Peter Neish,
Priscilla R. Prestes,
Mark F. Richardson,
Nathan S. Watson-Haigh,
Kelly L. Wyres,
Neil D. Young,
Maria Victoria Schneider
Affiliations
Philippa C. Griffin
EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia
Jyoti Khadake
NIHR BioResource, University of Cambridge and Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, CB2 0QQ, UK
Kate S. LeMay
Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
Suzanna E. Lewis
Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Berkeley, CA, 94720, USA
Sandra Orchard
European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge, CB10 1SD, UK
Andrew Pask
School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Bernard Pope
Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
Ute Roessner
Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Keith Russell
Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
Torsten Seemann
Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
Andrew Treloar
Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
Sonika Tyagi
Australian Genome Research Facility Ltd, Parkville, VIC, 3052, Australia
Jeffrey H. Christiansen
Queensland Cyber Infrastructure Foundation and the University of Queensland Research Computing Centre, St Lucia, QLD, 4072, Australia
Saravanan Dayalan
Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Simon Gladman
EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia
Sandra B. Hangartner
School of Biological Sciences, Monash University, Clayton, VIC, 3800, Australia
Helen L. Hayden
Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
William W.H. Ho
School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Gabriel Keeble-Gagnère
School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Pasi K. Korhonen
Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Peter Neish
The University of Melbourne, Parkville, VIC, 3010, Australia
Priscilla R. Prestes
Faculty of Science and Engineering, Federation University Australia, Mt Helen, VIC, 3350, Australia
Mark F. Richardson
Bioinformatics Core Research Group & Centre for Integrative Ecology, Deakin University, Geelong, VIC, 3220, Australia
Nathan S. Watson-Haigh
School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
Kelly L. Wyres
Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
Neil D. Young
Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
Maria Victoria Schneider
Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
Throughout history, the life sciences have been revolutionised by technological advances. In our era this is manifested by improvements in instrumentation for data generation, and researchers consequently now routinely handle large amounts of heterogeneous data in digital formats. The simultaneous transitions towards biology as a data science and towards a ‘life cycle’ view of research data pose new challenges. Researchers face a bewildering landscape of data management requirements, recommendations and regulations, and may lack access to data management training or a clear understanding of the practical approaches that can assist data management in their particular research domain. Here we provide an overview of best practice data life cycle approaches for researchers in the life sciences/bioinformatics space, with a particular focus on ‘omics’ datasets and computer-based data processing and analysis. We discuss the different stages of the data life cycle and provide practical suggestions for useful tools and resources to improve data management practices.