Lessons learned and recommendations for data coordination in collaborative research: The CSER consortium experience
Kathleen D. Muenzen,
Laura M. Amendola,
Tia L. Kauffman,
Kathleen F. Mittendorf,
Jeannette T. Bensen,
Flavia Chen,
Richard Green,
Bradford C. Powell,
Mark Kvale,
Frank Angelo,
Laura Farnan,
Stephanie M. Fullerton,
Jill O. Robinson,
Tianran Li,
Priyanka Murali,
James M.J. Lawlor,
Jeffrey Ou,
Lucia A. Hindorff,
Gail P. Jarvik,
David R. Crosslin
Affiliations
Kathleen D. Muenzen
Department of Biomedical Informatics and Medical Education, Division of Biomedical and Health Informatics, University of Washington Medical Center, Seattle, WA, USA; Corresponding author
Laura M. Amendola
Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
Tia L. Kauffman
Center for Health Research, Kaiser Permanente Northwest, Portland, OR, USA
Kathleen F. Mittendorf
Center for Health Research, Kaiser Permanente Northwest, Portland, OR, USA
Jeannette T. Bensen
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Flavia Chen
Institute for Human Genetics, University of California at San Francisco, San Francisco, CA, USA
Richard Green
Department of Biomedical Informatics and Medical Education, Division of Biomedical and Health Informatics, University of Washington Medical Center, Seattle, WA, USA
Bradford C. Powell
Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Mark Kvale
Institute for Human Genetics, University of California at San Francisco, San Francisco, CA, USA
Frank Angelo
Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
Laura Farnan
Lineberger Comprehensive Cancer Center, UNC School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Stephanie M. Fullerton
Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
Jill O. Robinson
Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, TX, USA
Tianran Li
Department of Biomedical Informatics and Medical Education, Division of Biomedical and Health Informatics, University of Washington Medical Center, Seattle, WA, USA
Priyanka Murali
Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
James M.J. Lawlor
HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
Jeffrey Ou
Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
Lucia A. Hindorff
Division of Genomic Medicine, NHGRI, NIH, Bethesda, MD, USA
Gail P. Jarvik
Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
David R. Crosslin
Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA; Corresponding author
Summary: Integrating data across heterogeneous research environments is a key challenge in multi-site, collaborative research projects. While it is important to allow for natural variation in data collection protocols across research sites, it is also important to achieve interoperability between datasets to realize the full benefits of collaborative work. However, few standards exist to guide the data coordination process from project conception to completion. In this paper, we describe the experiences of the Clinical Sequence Evidence-Generating Research (CSER) consortium Data Coordinating Center (DCC), which coordinated harmonized survey and genomic sequencing data from seven clinical research sites from 2020 to 2022. Using input from multiple consortium working groups and from CSER leadership, we first identify 14 lessons learned from CSER in the categories of communication, harmonization, informatics, compliance, and analytics. We then distill these lessons learned into 11 recommendations for future research consortia in the areas of planning, communication, informatics, and analytics. We recommend that planning and budgeting for data coordination activities occur as early as possible during consortium conceptualization and development to minimize downstream complications. We also find that clear, reciprocal, and continuous communication between consortium stakeholders and the DCC is as important as maintaining a secure and centralized informatics ecosystem for pooling data. Finally, we discuss the importance of actively interrogating current approaches to data governance, particularly for research studies that straddle the research-clinical divide.