Current open research practice in Computational Biology

Stephen J Eglen
Cambridge Computational Biology Institute, University of Cambridge
https://sje30.github.io
sje30@cam.ac.uk
@StephenEglen

Slides: http://bit.ly/eglen_liber2016

LIBER 2016 Workshop 8: Making open the default

Acknowledgements

Danny Kingsley; Laurent Gatto (slides); Scott Chamberlain (rcrossref).

These slides are available under a Creative Commons CC-BY license.

Making open the default

What should be “open by default”? Papers or more?
In the UK, government and research councils have effectively already made open the default for papers in science.
But sharing papers is just the start.

Inverse problems are hard

Score (%)	grade
70-100	A
60-69	B
50-59	C
40-49	D
0-39	F

Forward problem

I scored 68, what was my grade?

Inverse problem

I got a B, what was my score?
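The asymmetry between the two questions can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `grade_for` and `scores_for` are hypothetical, and scores are assumed to be whole numbers. The forward mapping returns a single grade, while the inverse can only return the band of scores consistent with that grade.

```python
# Grade bands from the table above: (lowest score, highest score, grade).
BANDS = [(70, 100, "A"), (60, 69, "B"), (50, 59, "C"), (40, 49, "D"), (0, 39, "F")]


def grade_for(score: int) -> str:
    """Forward problem: each score maps to exactly one grade."""
    for low, high, grade in BANDS:
        if low <= score <= high:
            return grade
    raise ValueError(f"score out of range: {score}")


def scores_for(grade: str) -> range:
    """Inverse problem: a grade only narrows the score down to a band."""
    for low, high, band_grade in BANDS:
        if band_grade == grade:
            return range(low, high + 1)
    raise ValueError(f"unknown grade: {grade}")


print(grade_for(68))           # the forward answer is unique
print(list(scores_for("B")))   # the inverse answer is a set of candidates
```

A score of 68 gives a B, but a B is consistent with any of ten scores; the original value cannot be recovered from the summary alone.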

Research sharing: the inverse problem

Ethics and value of sharing research

“Moral” reasons to share research products

Being told to do something without seeing the benefit:

Funding mandates
e.g. requests from “ResearchFish”
“Do unto others as you would have them do unto you”

Time commitment to get metadata and share.

Giving away competitive edge.

Why bother?

Moral or selfish?

Paper

Selfish reasons to share

Why not align what is good for science with what is good for scientists?

Funding mandates (REF + enforcement from Wellcome Trust)
Credit through data papers
Leads to further collaborations (e.g. “EPAmeadev”)
Fixes data bugs / errors in analysis
Avoids data loss (Vines et al. 2014), e.g. students have a habit of leaving…
Your future self is probably one of the main beneficiaries of sharing.

What to share?

In some fields, we can share not just data files …
Share code that analyses data and generates figures/tables
Share entire computing environment through virtual environments (Docker).
Bioconductor is a huge success in computational biology (Gentleman et al. 2004; Huber et al. 2015).
Reproducible research often comes “for free” with advanced software environments, such as R or Python (knitr, Jupyter).
e.g. Eglen et al. (2014) GigaScience paper.
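As one small illustration of sharing more than data files, the sketch below writes a manifest to accompany a shared dataset: a checksum for each file plus the Python version and platform used for the analysis. It is a minimal sketch using only the standard library, not the workflow from any of the papers cited above; the names `sha256_of` and `build_manifest` and the manifest layout are assumptions made for this example.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Describe the files under data_dir and the environment that read them."""
    files = {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "files": files,
    }


# Demonstration on a throwaway directory with one made-up data file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "recording.csv").write_text("time,value\n0,1.5\n")
    print(json.dumps(build_manifest(root), indent=2))
```

A reader who downloads the shared files can rebuild the manifest and compare checksums to confirm nothing was lost or altered in transit. A container image (such as Docker, mentioned above) goes further by capturing the installed software itself.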

How to encourage sharing

We train our students in reproducible research.
When reviewing papers I usually need to ask “Is data/code available?”.
Working with funders and publishers to encourage sharing, including preprints.

The publishing industry

Current status in the life sciences

Problems

Life scientists still beholden to career-defining “Prestige journals”
Article Processing Charge (APC) free-for-all driven by what the market can bear
As such, stuck in hybrid OA universe

Positives

“Data” papers give credit for sharing
Lower-cost solutions: PeerJ, F1000 Research, Ubiquity Press
PLOS well-established, eLife
#ASAPbio preprints on the rise …

Rise of bioRxiv

Can submit directly to several journals.

cf. 9000 new submissions/month to arXiv

The view from the UK

Not all scientists want bulk deals. We want open deals.
Bulk deals may be convenient, but hard to move away from.
Return to individual subscriptions?

Markowetz maxims for reproducibility

Reproducibility helps avoid disaster (Potti)
Reproducibility makes it easier to write papers
Reproducibility helps reviewers see it your way (Pouzat)
Reproducibility enables continuity of your ideas
Reproducibility helps to build your reputation (“nothing to hide”)