Reproducible research
Stephen J Eglen
Encouraging code and data sharing in neuroscience
Stephen J Eglen
Cambridge Computational Biology Institute, University of Cambridge
https://sje30.github.io
sje30@cam.ac.uk
@StephenEglen
Slides: http://bit.ly/eglen-futurepub
Acknowledgements
Software Sustainability Institute
These slides are available under a Creative Commons CC-BY license.
Inverse problems are hard
Score  | Grade
70-100 | A
60-69  | B
50-59  | C
40-49  | D
0-39   | F
Forward problem
I scored 68, what was my grade?
Inverse problem
I got a B, what was my score?
Research sharing: the inverse problem
Where is the scholarship?
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions that generated the figures.
[Buckheit and Donoho 1995, after Claerbout]
Moral or selfish approach?
Selfish reasons to share
Why not align what is good for science with what is good for scientists?
- Funding mandates (REF + enforcement from Wellcome Trust)
- Credit through data papers
- Leads to further collaborations (e.g. “EPAmeadev”)
- Fixes data bugs / errors in analysis
- Prevents data loss (Vines et al 2014). e.g. students have a habit of leaving…
- Your future self is probably one of the main beneficiaries of sharing.
- Now is a very good time to be an open scientist.
Code review pilot
Specific recommendations
- Include enough code to reproduce a key figure/result from your paper ("ModelDB").
- Provide toy examples if your project is too computationally intensive for others to run in a few hours (a sketch repository layout follows this list).
- Version control (github)
- Licence (MIT)
- Provide data
- Provide tests
- Use standards
- Use permanent URLs (Zenodo/figshare)
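A minimal repository layout that follows these recommendations might look like this (file and directory names are illustrative, not prescriptive):

    project/
      LICENSE          (e.g. MIT)
      README.md        (how to install and run the code)
      Makefile         (one command rebuilds the key figure)
      R/               (analysis code, under version control on GitHub)
      data/            (raw data, or a pointer to a Zenodo/figshare DOI)
      tests/           (small checks that the code still runs)
      toy-example.R    (runs in minutes, not hours)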
Simple example
Docker
Can bundle an entire open-source environment for others to share:
(start docker)
docker run -d -p 8787:8787 sje30/eglen2015
open http://192.168.99.100:8787/
This should launch a web page …
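For illustration, an image like this could be built from a Dockerfile along the following lines. This is a minimal sketch, not the actual definition of sje30/eglen2015; rocker/rstudio is a public base image serving RStudio on port 8787, and the package names are placeholders:

    # Base image: R + RStudio Server, listening on port 8787.
    FROM rocker/rstudio
    # Install the packages the analysis needs (placeholder names).
    RUN R -e "install.packages(c('knitr', 'ggplot2'))"
    # Bake the analysis code and data into the image.
    COPY . /home/rstudio/paper

Anyone with Docker installed then gets the identical environment from the single docker run command above.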
Jupyter notebooks
- Embed code within the manuscript; figures/tables are dynamically regenerated (see the sketch below).
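The same idea applies in R Markdown; a minimal sketch of a dynamic figure chunk, with hypothetical file and object names:

    ```{r figure1, echo=FALSE}
    cells <- read.csv("data/cells.csv")   # hypothetical data file
    plot(cells$x, cells$y, asp = 1)       # Figure 1 is redrawn on every build
    ```
    We counted `r nrow(cells)` cells (Figure 1).

Re-running the document regenerates the figure and the in-text count directly from the data, so they cannot drift out of sync with the analysis.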
Binder = Docker + Jupyter + cloud compute
- mybinder.org was developed and supported by the Freeman lab, Janelia Farm.
- Allows Jupyter notebooks to be dynamically evaluated (not just rendered) online; see the configuration sketch below.
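To make a repository launchable on mybinder.org, its dependencies are described in a few conventional configuration files; a sketch for an R-based project (contents are indicative only, and package names are placeholders):

    runtime.txt    -- e.g. "r-2018-02-05", pins the R version/snapshot date
    install.R      -- install.packages(c("knitr", "ggplot2"))
    index.ipynb    -- the notebook that readers open and run in the browser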
Find a code buddy
- We ask our students to submit a .Rnw file rather than a PDF. They get a zero if we can't compile the PDF.
- So, ask someone else if they can run your code (see the sketch below).
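The check itself is a one-liner in R; a sketch, assuming the submission is called paper.Rnw (a hypothetical file name):

    # Rebuild the PDF from the .Rnw source; if this fails on your
    # buddy's machine, the document is not reproducible as submitted.
    knitr::knit2pdf("paper.Rnw")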
Third most important file in a GitHub repo
(After Arfon Smith)
- First: LICENSE
- Second: README.md
- Third: ???
Makefile
Learn Make if you don’t know it already.
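A minimal sketch of what such a Makefile might contain (file names are illustrative; recipe lines must start with a tab):

    # `make` rebuilds the figure and the paper only when their inputs change.
    all: paper.pdf

    figure1.pdf: analysis.R data/cells.csv
    	Rscript analysis.R

    paper.pdf: paper.Rnw figure1.pdf
    	Rscript -e 'knitr::knit2pdf("paper.Rnw")'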
Practical tips
- Lobby journals about their code-sharing practices.
- Lobby funders likewise.
- When reviewing articles, ask for code to be made available.
- When starting on a new project, assume code will be public at some point in the future.
Reproducible papers
- Eglen et al. (2014)
- Eglen (2015)
Lessons I learnt
- Editors loved this.
- Reviewers engaged, edited code/figures.
- Brittle. Paper 2 broke within 6 months!
Challenges for a reproducible paper
- Technical aspects are mostly there: Docker, Rmd, Jupyter notebooks, Zenodo, GitHub.
- A sustainable compute platform is required; mybinder.org was a victim of its own success (see the beta.mybinder.org Julia example).
- Long (hours upwards) compute jobs are clearly not interactive.
- Social challenges much harder. How to incentivise/require authors to work like this?
- The publisher workflow should work with the author workflow (which starts before, and may continue after, the publisher workflow finishes).
- Will reviewers be expected to do more than read the paper?
eLife developments
What would the workflow look like for an “executable” paper? eLife publishers, developers and users “solved” it:
Further info
eLife questionnaire
Survey on demand for reproducible research (RR)
- Respondents: Neuro (34%), Biochemistry (28%), Systems Biology (28%)
- R markdown / Jupyter as leading technologies
- Key wishes: accessing data/code, interacting with figures
- ~70% of respondents have shared data/code
Desirable features for a journal
- Indexed in PubMed
- Open access
- Low cost
- Decent formatting
- Publish reviews, possibly with an eLife-style consensus
Not yet convinced re: post-publication peer review (PPPR).
Overlay journals can work
Discrete Analysis was launched in 2016 by Tim Gowers (Cambridge).
- Strong editorial board and lead editor.
- Traditional reviewing process.
- Papers live on arXiv.
- Editor writes a summary of papers.
- Low cost (currently free to authors).
- Cambridge has worked through issues.
Would this work in Computational Neuroscience?
Things we can do tomorrow
End embargo periods?
Ask editors why we can’t share author accepted manuscripts (AAMs) immediately upon acceptance.
Embargo periods suit publishers, not us.
Summary
- Share your code and data for everyone to benefit.
- Learn the new tools, but don’t forget the old tools.
- Support the journals that work with the community.
- The biggest beneficiary of reproducible research may be your future self.