PLoS Computational Biology (Sep 2022)
A simple kit to use computational notebooks for more openness, reproducibility, and productivity in research
Abstract
The ubiquitous use of computational work for data generation, processing, and modeling has increased the importance of digital documentation in improving research quality and impact. Computational notebooks are files that combine descriptive text, code, and the code's outputs in a single, dynamic, and visually appealing file that is easier for nonspecialists to understand. Although data scientists have traditionally used them to produce reports and inform decision-making, computational notebooks are not common in research publication, despite their potential to increase research impact and quality. For a single study, the content of such documentation partially overlaps with that of classical lab notebooks and that of the scientific manuscript reporting the study. Therefore, to minimize the work required to manage all the files related to these contents and to optimize their production, we present a starter kit to facilitate the implementation of computational notebooks in the research process, including publication. The kit contains the template of a computational notebook integrated into a research project that employs R, Python, or Julia. Using examples of ecological studies, we show how computational notebooks also foster the implementation of principles of Open Science, such as reproducibility and traceability. The kit is designed for beginners, but at the end we present practices that can be gradually implemented to develop a fully digital research workflow. Our hope is that such a minimalist yet effective starter kit will encourage researchers to adopt this practice in their workflow, regardless of their computational background.
Author summary
The Open Science movement has been gaining traction in recent years by emphasizing the greater impact of collaborative research: the more research is publicly available, the easier it is to trust and build upon it.
A key feature of effectively “available” and reusable research is that it is well documented, so that those who need it can easily understand it. However, documenting scientific work well can be a daunting task, and scientists who are not familiar with the available tools may take on workloads that are too heavy and possibly inefficient. At the same time, since most research is conducted with at least one computational element (e.g., data analysis or storage of data in digital databases), the time is ripe to learn methods of documenting computational work. In this guide, we provide a minimal yet versatile setup to help scientists conduct and document their research in a more understandable, shareable, and impactful way.