GitHub is an effective platform for collaborative and reproducible laboratory research

Chen KY, Toro-Moreno M, Subramaniam AR. arXiv 10.48550/arXiv.2408.09344

Abstract

Laboratory research is a complex, collaborative process that involves several stages, including hypothesis formulation, experimental design, data generation and analysis, and manuscript writing. Although reproducibility and data sharing are increasingly prioritized at the publication stage, integrating these principles at earlier stages of laboratory research has been hampered by the lack of broadly applicable solutions. Here, we propose that the workflow used in modern software development offers a robust framework for enhancing reproducibility and collaboration in laboratory research. In particular, we show that GitHub, a platform widely used for collaborative software projects, can be effectively adapted to organize and document all aspects of a research project's lifecycle in a molecular biology laboratory. We outline a three-step approach for incorporating the GitHub ecosystem into laboratory research workflows: (1) designing and organizing experiments using issues and project boards, (2) documenting experiments and data analyses with a version control system, and (3) ensuring reproducible software environments for data analyses and writing tasks with containerized packages. The versatility, scalability, and affordability of this approach make it suitable for various scenarios, ranging from small research groups to large, cross-institutional collaborations. Adopting this framework from a project's outset can increase the efficiency and fidelity of knowledge transfer within and across research laboratories. An example GitHub repository based on the above approach is available at https://github.com/rasilab/github_demo.