A brief introduction to reproducible research and open science

Jan 19 2018

[The following was written for the British Neuroscience Association]

At first glance, it might seem odd that you would need to prefix the term “research” with the qualifier “reproducible”. Surely, once you have a paper in your hands, you have all the details in front of you to reproduce someone else’s work? That’s certainly the theory when writing the paper, but it’s often not the practice… Since 2004 we’ve asked our master’s students to reproduce key results from a paper in computational biology. Even though students carefully select a paper where the methods section seems comprehensive and all the experimental data are available, they invariably find that missing details prevent them from reproducing key figures or results. Many papers have been published on this failure to reproduce [e.g. 2], commonly termed the “reproducibility crisis” [1].

So, what might reproducible research entail? The definition can vary from group to group, but my interpretation is that when publishing results, labs should also provide all relevant datasets and the methodology for transforming data into results. This means providing the spreadsheets or computational scripts needed to reproduce the analysis. In turn, this means that researchers should move away from “point and click” analysis methodologies (doing a t-test in Excel) towards scripts written in a language such as R, MATLAB or Python, so that others can re-run the same routines.
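As a rough illustration of what moving from “point and click” to a script might look like, here is a minimal Python sketch of a two-sample comparison. Everything in it is an assumption made for the example: the data values, the group names, and the helper functions load_groups and welch_t are hypothetical, and Welch's variant of the t-test is just one reasonable choice. It uses only the standard library, so it reports the test statistic and degrees of freedom but not a p-value.

```python
import csv
import io
import math
import statistics

# Hypothetical measurements, embedded so the script is self-contained.
# In practice this would be a CSV file archived alongside the paper.
RAW_CSV = """group,value
control,4.1
control,3.8
control,4.4
control,4.0
treated,5.2
treated,4.9
treated,5.5
treated,5.1
"""


def load_groups(text):
    """Return a dict mapping each group name to its list of values."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["group"], []).append(float(row["value"]))
    return groups


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


groups = load_groups(RAW_CSV)
t, df = welch_t(groups["treated"], groups["control"])
print(f"t = {t:.3f}, df = {df:.1f}")
```

Because every step from raw data to reported statistic is written down, anyone with the data file can re-run it and check the result. A p-value would need the t distribution, which a statistics library provides; SciPy's scipy.stats.ttest_ind with equal_var=False, for instance, performs the same test.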

This brings us naturally to the second term, open science. Science is competitive (for limited funding, jobs, and “high impact” publications), so there is a natural tendency to withhold key datasets or analysis technologies: why give away your results to your competitors? An alternative view gaining prominence in recent years is that by sharing our resources, we allow others to build on our work, and science as a whole should benefit. Being an open scientist increases the chances that your work is reproducible.

Being an open scientist may seem naïve and altruistic, but there are selfish reasons for sharing your research [3]. Many funding agencies now require data management plans for sharing data after publication, and journals are increasingly asking for data and methods. My optimistic hope is that in 10 years we might be able to drop the qualifier “open” and instead talk again simply about science.

Top tips for becoming an open scientist.

Read the guidelines in [3] and consider whether they apply to you.
Read about people’s experiences, such as those of Erin McKiernan.
Do experiments? Try writing a registered report before doing the experiments to reduce publication bias. https://www.nature.com/articles/s41562-016-0034
Talk to your local library to see what services they can offer to help archive and share your research. Find a local community of like-minded scientists!
Learn how to code, rather than using Excel, for your data analysis, e.g. http://www.datacarpentry.org

Comments? Send them to me via Twitter @StephenEglen

References

[1] Baker M (2016) 1,500 scientists lift the lid on reproducibility. Nature 533:452–454. http://dx.doi.org/10.1038/533452a

[2] Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2:e124. https://doi.org/10.1371/journal.pmed.0020124

[3] Markowetz F (2015) Five selfish reasons to work reproducibly. Genome Biol 16:274. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0850-7

Stephen J Eglen Computational Neuroscience
