Current open research practice in Computational Biology
Stephen J Eglen
Cambridge Computational Biology Institute, University of Cambridge
https://sje30.github.io
sje30@cam.ac.uk, @StephenEglen
Slides: http://bit.ly/eglen_liber2016
LIBER 2016 Workshop 8: Making open the default
Acknowledgements
Danny Kingsley; Laurent Gatto (slides); Scott Chamberlain (rcrossref).
These slides are available under a Creative Commons CC-BY license.
Making open the default
- What should be “open by default”? Papers or more?
- In the UK, the government and research councils have effectively already made open the default for scientific papers.
- But sharing papers is just the start.
Inverse problems are hard
Score | Grade
70-100 | A
60-69 | B
50-59 | C
40-49 | D
0-39 | F
Forward problem
I scored 68, what was my grade?
Inverse problem
I got a B, what was my score?
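The asymmetry between the two questions can be sketched in a few lines of Python. This is only an illustration using the grade bands listed above; the function names are made up for this example. The forward direction returns a single answer, while the inverse direction can only return a set of candidates.

```python
# Grade bands from the score/grade table: (lowest score, highest score, grade).
GRADE_BANDS = [
    (70, 100, "A"),
    (60, 69, "B"),
    (50, 59, "C"),
    (40, 49, "D"),
    (0, 39, "F"),
]


def grade_for_score(score):
    """Forward problem: every score maps to exactly one grade."""
    for low, high, grade in GRADE_BANDS:
        if low <= score <= high:
            return grade
    raise ValueError(f"score out of range: {score}")


def scores_for_grade(grade):
    """Inverse problem: a grade only narrows the score down to a band."""
    for low, high, band_grade in GRADE_BANDS:
        if band_grade == grade:
            return range(low, high + 1)
    raise ValueError(f"unknown grade: {grade}")


print(grade_for_score(68))           # B
candidates = scores_for_grade("B")
print(min(candidates), max(candidates), len(candidates))  # 60 69 10
```

Knowing only the grade leaves ten equally plausible scores; the information lost in the forward step cannot be recovered afterwards.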
Research sharing: the inverse problem
Ethics and value of sharing research
“Moral” reasons to share research products
Being told to do something without seeing the benefit:
- Funding mandates
- e.g. requests from “ResearchFish”
- “Do unto others as you would have them do unto you”
Time commitment to get metadata and share.
Giving away competitive edge.
Why bother?
Selfish reasons to share
Why not align what is good for science with what is good for scientists?
- Funding mandates (REF + enforcement from Wellcome Trust)
- Credit through data papers
- Leads to further collaborations (e.g. “EPAmeadev”)
- Fixes data bugs / errors in analysis
- Avoid data loss (Vines et al. 2014), e.g. students have a habit of leaving…
- Your future self is probably one of the main beneficiaries of sharing.
What to share?
- In some fields, we can share not just data files …
- Share code that analyses data and generates figures/tables
- Share the entire computing environment through containers (e.g. Docker).
- Bioconductor is a huge success in computational biology (Gentleman et al. 2004; Huber et al. 2015).
- Reproducible research often comes “for free” with advanced software environments, such as R or Python (knitr, Jupyter).
- e.g. Eglen et al. (2014) GigaScience paper.
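As a rough illustration of "sharing code that analyses data and generates figures/tables", the sketch below regenerates a small summary table from raw data and prints a checksum of the input, so a reader can confirm they are running the same analysis on the same data. The data, column names and function names are all hypothetical and use only the Python standard library; a real project would read a shared data file and would more likely use knitr or Jupyter.

```python
import csv
import hashlib
import io
import statistics

# Hypothetical raw data; in practice this would be a shared data file.
RAW_CSV = """sample,condition,value
s1,control,1.0
s2,control,3.0
s3,treated,4.0
s4,treated,6.0
"""


def checksum(text):
    """SHA-256 of the raw data, so others can verify they have the same input."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def summary_table(text):
    """Mean value per condition, computed directly from the raw data."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["condition"], []).append(float(row["value"]))
    return {cond: statistics.mean(vals) for cond, vals in sorted(groups.items())}


def render(table):
    """Format the summary as the plain-text table that would go in a paper."""
    lines = ["condition | mean value"]
    lines += [f"{cond} | {mean:.2f}" for cond, mean in table.items()]
    return "\n".join(lines)


print("data sha256:", checksum(RAW_CSV))
print(render(summary_table(RAW_CSV)))
```

Because the table is produced by code rather than typed by hand, fixing a data error only requires re-running the script.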
How to encourage sharing
- We train our students in reproducible research.
- When reviewing papers I usually need to ask “Is data/code available?”.
- Working with funders and publishers to encourage sharing (preprint).
Current status in the life sciences
Problems
- Life scientists still beholden to career-defining “Prestige journals”
- Article Processing Charge (APC) free-for-all, driven by what the market can bear
- As such, stuck in hybrid OA universe
Positives
- “Data” papers give credit for sharing
- Lower-cost solutions: PeerJ, F1000 Research, Ubiquity Press
- PLOS well-established, eLife
- #ASAPbio preprints on the rise …
Rise of bioRxiv
Can submit directly to several journals.
cf. 9000 new submissions/month to arXiv
The view from the UK
- Not all scientists want bulk deals. We want open deals.
- Bulk deals may be convenient, but hard to move away from.
- Return to individual subscriptions?
Markowetz maxims for reproducibility
- Reproducibility helps avoid disaster (Potti)
- Reproducibility makes it easier to write papers
- Reproducibility helps reviewers see it your way (Pouzat)
- Reproducibility enables continuity of your ideas
- Reproducibility helps to build your reputation (“nothing to hide”)