Encouraging code sharing in academia
Stephen J Eglen
Cambridge Computational Biology Institute, University of Cambridge
https://sje30.github.io
sje30@cam.ac.uk @StephenEglen
Slides: http://bit.ly/eglen2017-1
Acknowledgements
Co-authors, Freeman lab, Laurent Gatto.
These slides are available under a Creative Commons CC-BY licence.
Inverse problems are hard
Score    Grade
70-100   A
60-69    B
50-59    C
40-49    D
0-39     F
Forward problem
I scored 68, what was my grade?
Inverse problem
I got a B, what was my score?
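A minimal bash sketch of the two directions, using the grade bands from the grading table. The function names grade and score_range are hypothetical, chosen only for illustration. The forward function always returns a single answer; the inverse function can only return a range.

```shell
#!/usr/bin/env bash
# Forward problem: each score maps to exactly one grade.
grade() {
  local score=$1
  if   (( score >= 70 )); then echo A
  elif (( score >= 60 )); then echo B
  elif (( score >= 50 )); then echo C
  elif (( score >= 40 )); then echo D
  else echo F
  fi
}

# Inverse problem: a grade only tells us a range of possible scores.
score_range() {
  case $1 in
    A) echo "70-100" ;;
    B) echo "60-69" ;;
    C) echo "50-59" ;;
    D) echo "40-49" ;;
    F) echo "0-39" ;;
    *) echo "unknown grade: $1" >&2; return 1 ;;
  esac
}

grade 68        # prints B
score_range B   # prints 60-69
```

Ten different scores (60 to 69) all produce a B, so the grade alone cannot recover the score.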
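Research sharing: the inverse problem
Reconstructing the code and data behind a published paper is like recovering a score from a grade: much of the information has been lost.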
Where is the scholarship?
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
[Buckheit and Donoho 1995, after Claerbout]
Moral or selfish approach?
Selfish reasons to share
Why not align what is good for science with what is good for scientists?
- Funding mandates (REF, and enforcement by the Wellcome Trust)
- Credit through data papers
- Leads to further collaborations (e.g. “EPAmeadev”)
- Helps find and fix bugs in data and errors in analysis
- Prevents data loss (Vines et al. 2014), e.g. students have a habit of leaving…
- Your future self is probably one of the main beneficiaries of sharing.
- Now is a very good time to be an open scientist.
Code sharing: a way forward
Specific recommendations
- Include enough code to reproduce the key figures/results in your paper (e.g. ModelDB).
- Provide toy examples if your project is too computationally intensive for others to run in a few hours.
- Version control (GitHub)
- Licence (MIT)
- Provide data
- Provide tests
- Use standards
- Use permanent URLs, e.g. DOIs (Zenodo/figshare)
Simple example
Docker
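Docker can bundle an entire open-source environment for others to share:
# First start Docker (e.g. Docker Desktop or Docker Toolbox).
# Run the image in the background, publishing container port 8787 on host port 8787:
docker run -d -p 8787:8787 sje30/eglen2015
# 192.168.99.100 is the typical address of the Docker Toolbox VM (check with
# `docker-machine ip default`); with native Docker, use http://localhost:8787/ instead.
# `open` is the macOS command for opening a URL in the default browser.
open http://192.168.99.100:8787/
This should launch a web page …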
Jupyter notebooks
- Embed code within the manuscript; figures and tables are regenerated dynamically.
Binder = Docker + Jupyter + cloud compute
- mybinder.org: developed and supported by the Freeman lab, Janelia Farm.
- Allows Jupyter notebooks to be evaluated dynamically (not just rendered) online.
Find a code buddy
- We ask our students to submit a .Rnw file rather than a PDF. You get a zero if I can’t compile it into the PDF.
- So ask someone else to check that they can run your code.
Third most important file in a GitHub repo
(After Arfon Smith)
- First: LICENSE
- Second: README.md
- Third: ???
Makefile
Learn Make if you don’t know it already.
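As a sketch, assuming the three files sit at the top level of the repository and are named exactly LICENSE, README.md and Makefile, a small bash function can report which are missing. The name check_repo is hypothetical; it returns a non-zero status when any file is missing, so it could also serve as a simple automated check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of LICENSE, README.md and Makefile
# are missing from a repository directory (default: current directory).
check_repo() {
  local dir=${1:-.}
  local missing=0
  for f in LICENSE README.md Makefile; do
    if [[ -f "$dir/$f" ]]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every file is present.
  (( missing == 0 ))
}
```

Usage: `check_repo path/to/repo`.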
Practical tips
- Lobby journals about their code-sharing practices.
- Lobby funders likewise.
- When reviewing articles, ask for code to be made available.
- When starting a new project, assume its code will be made public at some point.
Summary
- Find the selfish reasons to make your research reproducible.
- Adopt good practices to help you on your way.
- Writing code in groups can be very motivating.
- Use new tech if you want, but old tech works too.