We need recipes for common bioinformatics tasks
Ad-hoc work in bioinformatics can involve an immense number of operations that need to be performed to achieve a certain goal. Often each of these is individually regarded as rather “standard” or “routine”. Despite this, it is quite hard to find an authoritative set of “recipes” for how to do such tasks.
Thus I started to think that there needs to be a collection of bioinformatics “recipes”: a sort of “cookbook” for common bioinformatics tasks.
It turns out that there are a number of such resources available though:
- First there is bioinformatics.recipes, which is pretty much exactly the type of content I was after, except:
- The only thing I would note is that it would be great if these recipes were kept under a version control system like Git and hosted on a code hosting platform like GitHub. That would make it easier to fork, clone and use the included code, and also to contribute fixes or improvements to the documentation.
- Then, somebody also mentioned this list of methods primers in the Nature journal.
- The comment here is perhaps that this is more high-level information about various methods, which is also great, but not exactly the same thing as a recipe.
- Finally there is of course BioStars, which is a huge resource of questions and answers over the last decades.
- A problem here is mainly that a lot of the questions (and answers) are quite old, sometimes 10+ years, which makes it hard to judge whether an answer is still relevant or not. The selection of topics is also naturally limited to the questions people have actually asked, which might not cover the problem area evenly.
- Also, there are lists of common best-practice pipelines like the one at nf-core, which is a hugely useful resource.
- But it covers more high-level analyses, and does not typically provide pipelines for common smaller tasks, like using BLAST to cut out a gene from a reference genome.
With this in mind, is it perhaps time to create a bioinformatics recipe resource based on something like GitHub, where the community can crowd-source these recipes both in terms of code and documentation?
Or, is there already something like this available?
If not, below are some random ideas about how to do this:
- Having a common structure and template for documentation.
- Having a “driver script”, like run.sh, that is always the same for each recipe.
- One could annotate the input and output files with their file types, to make it easier to see which recipes can be linked together (this could even be presented on an accompanying webpage). E.g. recipes outputting a VCF (Variant Call Format) file could list suggestions for recipes to follow up with, based on the ones that can take VCF files as input.
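
To make the last idea a bit more concrete, here is a minimal sketch in Python of how annotated file types could be used to suggest follow-up recipes. Everything in it is an assumption for illustration: the `Recipe` class, the `suggest_followups` function, the recipe names and the file type labels are all made up, and a real resource would probably read this metadata from a file stored next to each recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical recipe metadata: a name plus annotated file types."""
    name: str
    inputs: frozenset   # file types the recipe can take as input
    outputs: frozenset  # file types the recipe produces


def suggest_followups(recipe, catalog):
    """Return the names of other recipes that accept at least one
    file type that `recipe` produces, sorted alphabetically."""
    return sorted(
        other.name
        for other in catalog
        if other.name != recipe.name and recipe.outputs & other.inputs
    )


# Made-up catalog, for illustration only.
CATALOG = [
    Recipe("align-reads", frozenset({"fastq", "fasta"}), frozenset({"bam"})),
    Recipe("call-variants", frozenset({"bam", "fasta"}), frozenset({"vcf"})),
    Recipe("filter-variants", frozenset({"vcf"}), frozenset({"vcf"})),
    Recipe("annotate-variants", frozenset({"vcf"}), frozenset({"vcf", "tsv"})),
]

if __name__ == "__main__":
    # Print, for each recipe, which recipes could follow it.
    for recipe in CATALOG:
        print(recipe.name, "->", suggest_followups(recipe, CATALOG))
```

With annotations like these, an accompanying webpage could render the suggestions as links between recipes, so that a recipe producing a VCF file points to every recipe that can take one as input.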