CODECHECK

Independent execution of computations underlying research articles

Stephen J Eglen

University of Cambridge

Daniel Nüst

TU Dresden

April 13, 2026

Declarations

Affiliate editor of bioRxiv; editorial board of Gigabyte.

Acknowledgements

Mozilla mini science grant, UK Software Sustainability Institute, NWO. Editors at GigaScience, eLife, and Scientific Data.

British Neuroscience Award Team Credibility Prize (2024).

Slides

HTML slides (CC BY 4.0) are available at https://tinyurl.com/cdchk2604 (Grant McDermott).

What started this?

CODECHECK in one slide

  1. We take your paper, code and datasets.

  2. We run your code on your data.

  3. If our results match your results, go to step 5.

  4. Else we talk to you to find out where the code broke. If you fix your code or data, we return to step 2 and try again.

  5. We write a report summarising that we could reproduce your finding.

  6. We work with you to freely share your paper, code, data and our reproduction.

Premise


We should be sharing material on the left, not the right.

“Paper as advert for Scholarship” (Buckheit & Donoho, 1995)

Code Ocean

Live demo

https://codeocean.com/explore?query=Nature%20Neuroscience&page=1&filter=all&refine=journal

e.g. let’s see: Neural basis of concurrent deliberation toward a choice and confidence judgment - public (Christopher R Fetsch & Miguel Vivar-Lazo)


Concerns

  • As author: it can take a long time to establish a reproducible capsule.

  • As reader: everything runs in the cloud, rather than on your computer. Free cloud compute is likely to be very limited.

  • Some jobs may take weeks to run…

The CODECHECK philosophy

  • Systems like Code Ocean set the bar high by “making code reproducible forever for everyone”.

  • CODECHECK simply asks “was the code reproducible once for someone else?”

  • We check that the code generates the expected number of output files.

  • The contents of those output files are not checked, but are available for others to see.

  • The validity of the code is not checked.

  • What does it mean for two results to be “the same”?
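The last two bullets can be sketched as code. This is a minimal, hypothetical illustration (not CODECHECK's actual tooling): check that the expected output files exist without validating their contents, and treat two numeric results as "the same" if they agree within a relative tolerance.

```python
import math
import tempfile
from pathlib import Path

def missing_outputs(workdir, manifest):
    """Return manifest entries that the author's code failed to produce.
    Only existence is checked; file contents are not validated."""
    return [name for name in manifest if not (Path(workdir) / name).exists()]

def results_match(ours, theirs, rel_tol=1e-6):
    """One pragmatic answer to 'are two results the same?':
    element-wise agreement within a relative tolerance."""
    return len(ours) == len(theirs) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(ours, theirs)
    )

# Example: a run that produced only one of two expected figures.
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "figure1.png").write_bytes(b"")
    print(missing_outputs(workdir, ["figure1.png", "figure2.png"]))  # ['figure2.png']
```

A tolerance rather than exact equality matters in practice: floating-point arithmetic, parallel execution, and library-version differences can all shift the last digits of an otherwise successful reproduction.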

Dimensions for codechecks

Where is the neuroscience?

  • This is not neuro-specific; we work across disciplines.

  • Depending on your project, there may be data to analyse, or simulations to run.

  • We did several reproductions of COVID-19 papers, including the Imperial College “Report 9” model.

Case study: regulation of pupil size in natural vision across the human lifespan (Lazar et al 2024)

Who does the work?

  1. AUTHOR provides code/data and instructions on how to run.

  2. CODECHECKER runs code and writes certificate.

  3. PUBLISHER oversees the process, helps deposit artifacts, and persistently publishes the certificate.

Who benefits?

  1. AUTHOR gets an early check that “the code works”, a snapshot of the code archived, and increased trust in the stability of results.

  2. CODECHECKER gets insight into the latest research and methods, credit from the community, and a citable object.

  3. PUBLISHER gets a citable certificate with a code/data bundle to share, and increases the reputation of published articles.

  4. PEER REVIEWERS can see the certificate rather than check the code themselves.

  5. READER can check the certificate and build upon the work immediately.

Our register of certificates

https://codecheck.org.uk/register/

See for example certificate 2020-010 (Imperial’s “Report 9”).

Figure 1

Common errors

  1. File/path names hard-coded, or assume a particular platform (Windows/Linux).

  2. Scripts not suitable for direct execution.

  3. Lack of a README.

  4. External libraries/packages missing or unpinned.

  5. Dependence on the operating system environment.

However, touch wood, I have yet to fail in a reproduction.
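Error 1 is the most frequent and the easiest to avoid. Here is a hypothetical sketch (all file names are illustrative, not from any codechecked project): resolve paths relative to the script with pathlib, so separators and drive letters never appear in the code.

```python
from pathlib import Path

# Fragile: hard-codes one author's machine and one platform.
#   infile = open("C:\\Users\\alice\\project\\data\\input.csv")

def project_path(*parts):
    """Resolve a path relative to this script (falling back to the
    current directory), so the code runs unchanged on Windows,
    macOS, and Linux; pathlib inserts the right separators."""
    base = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
    return base.joinpath(*parts)

infile = project_path("data", "input.csv")   # data/input.csv or data\input.csv
outdir = project_path("outputs")
outdir.mkdir(exist_ok=True)  # scripts should create their own output folders
```

Zipping up the project directory and running the same script on a second machine is then a realistic test of errors 1 and 2 before a codechecker ever sees the code.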

Limitations

  1. CODECHECKER time is valuable, so needs credit.

  2. Very easy to cheat the system, but who cares?

  3. Author’s code/data must be freely available.

  4. Deliberately low threshold for gaining a certificate.

  5. High-performance compute is a resource drain.

  6. Cannot (yet) support all thinkable/existing workflows and languages.

Next steps

  1. Embedding into journal workflows.

  2. Training a community of codecheckers.

  3. Funding for a codecheck editor.

  4. Integration into ORCID / PubMed Central.

  5. Building on institutional data repositories, e.g.  https://www.tudelft.nl/digital-competence-centre/services/reproducibility-check

  6. Automated testing over time (“when did my code break?”).

Come and get involved.

Further information: http://codecheck.org.uk and our research article.