Living Systems_

Go

Rewrite of Scicommander in Go with much improved algorithm

When I presented a poster about SciCommander at the Swedish bioinformatics workshop last year, I got a lot of awesome feedback from some great people including Fredrik Boulund, Johannes Alneberg and others, of which I unfortunately lost the names (please shout out if you read this!). (For those new to SciCommander, it is my attempt at creating a tool that can track complete provenance reports also for ad-hoc shell commands, not just those included in a pipeline.

Why didn't Go get a breakthrough in bioinformatics (yet)?

As we are - according to some expert opinions - living in the Century of Biology, I found it interesting to reflect on Go’s usage within the field. Go has some great features that make it really well suited for biology, such as: A relatively simple language that can be learned in a short time even for people without a CS background. This is super important aspect for biologists. Fantastic support for cross-compilation into all major computer architectures and operating systems, as static, self-sufficient executables making it extremely simple to deploy tools, something that can’t be said about the currently most popular bio language, Python.

Crystal: Go-like concurrency with easier syntax

I have been playing around a lot with concurrency in Go over the years, resulting in libraries such as SciPipe , FlowBase and rdf2smw . My main motivation for looking into Go has been the possibility to use it as a more performant, scaleable and type-safe alternative to Python for data heavy scripting tasks in bioinformatics and other fields I’ve been dabbling in. Especially as it makes it so easy to write concurrent and parallel code in it.

Viewing Go test coverage in the browser with one command

Go has some really nice tools for running tests and analyzing code. One of these functionalities is that you can generate coverage information when running tests, that can later be viewed in a browser using the go tool cover command. It turns out though, since doing it requires executing multiple commands after each other, it might be hard to remember the exact commands. To this end, I created a bash alias that does everything in one command, gocov.

Structured Go-routines or framework-less Flow-Based Programming inĀ Go

I was so happy the other day to find someone else who found the great benefits of a little pattern for how to structure pipeline-heavy programs in Go, which I described in a few posts before. I have been surprised to not find more people using this kind of pattern, which has been so extremely helpful to us, so I thought to take this opportunity to re-iterate it again, in the hopes that more people might get aware of it.

Parsing DrugBank XML (or any large XML file) in streaming mode in Go

I had a problem in which I thought I needed to parse the full DrugBank dataset, which comes as a (670MB) XML file (For open access papers describing DrugBank, see: [1], [2], [3] and [4]). It turned out what I needed was available as CSV files under “Structure External Links ”. There is probably still some other uses of this approach though, as the XML version of DrugBank seems to contain a lot more information in a single format.

Equation-centric dataflow programming in Go

Mathematical notation and dataflow programming Even though computations done on computers are very often based on some type of math, it is striking that the notation used in math to express equations and relations is not always very readily converted into programming code. Outside of purely symbolic programming languages like sage math or the (proprietary) Wolfram language , there seem to always be quite a divide between the mathematical notation and the numerical implementation.

Go is growing in bioinformatics workflow tools

TL;DR: We wrote a post on gopherdata.io, about the growing ecosystem of Go-based workflow tools in bioinformatics. Go read it here It is interesting to note how Google’s Go programming language seems to increase in popularity in bioinformatics. Just to give a sample of some of the Go based bioinformatics tools I’ve stumbled upon, there is since a few years back, the biogo library , providing common functionality for bioinformatics tasks.

(Almost) ranging over multiple Go channels simultaneously

Thus, optimally, one would want to use Go’s handy range keyword for looping over multiple channels, since range takes care of closing the for-loop at the right time (when the inbound channel is closed). So something like this (N.B: non-working code!): for a, b, c := range chA, chB, chC { doSomething(a, b, c) } Unfortunately this is not possible, and probably for good reason (how would it know whether to close the loop when the first, or all of the channels are closed?

First production run with SciPipe - A Go-based scientific workflow tool

Today marked the day when we ran the very first production workflow with SciPipe , the Go -based scientific workflow tool we’ve been working on over the last couple of years. Yay! :) This is how it looked (no fancy GUI or such yet, sorry): The first result we got in this very very first job was a list of counts of ligands (chemical compounds) in the ExcapeDB dataset (download here ) interacting with the 44 protein/gene targets identified by Bowes et al as a good baseline set for identifying hazardous side-effects effects in the body (that is, any chemical compounds binding these proteins, will never become an approved drug).

Notes on launching kubernetes jobs from the Go API

This post is also published on medium My current work at pharmb.io entails adding kubernetes support to my light-weight Go-based scientific workflow engine, scipipe (kubernetes, or k8s for short, is Google’s open source project for orchestrating container based compute clusters), which should take scipipe from a simple “run it on your laptop” workflow system with HPC support still in the work, to something that can power scientific workflows on any set of networked computers that can run kubernetes, which is quite a few (AWS, GCE, Azure, your Raspberry Phi cluster etc etc).

Combining the best of Go, D and Rust?

I’ve been following the development of D , Go and Rust (and also FreePascal for some use cases ) for some years (been into some benchmarking for bioinfo tasks ), and now we finally have three (four, with fpc) stable statically compiled languages with some momentum behind them, meaning they all are past 1.0. While I have went with Go for current projects , I still have a hard time “totally falling in love” with any single of these languages.

How I would like to write Go programs

Some time ago I got a post published on GopherAcademy , outlining in detail how I think a flow-based programming inspired syntax can strongly help to create clearer, easier-to-maintain, and more declarative Go programs. These ideas have since became clearer, and we (Ola Spjuth ’s research group at pharmbio ) have successfully used them to make the workflow syntax for Luigi (Spotify’s great workflow engine by Erik Bernhardsson & co) workflows easier, as implemented in the SciLuigi helper library .

Patterns for composable concurrent pipelines in Go

I realize I didn’t have a link to my blog on Gopher Academy , on patterns for compoasable concurrent pipelines in Go(lang), so here it goes: blog.gopheracademy.com/composable-pipelines-pattern

The smallest pipeable go program

Edit: My original suggested way further below in the post is no way the “smallest pipeable” program, instead see this example (Credits: Axel Wagner ): package main import ( "io" "os" ) func main() { io.Copy(os.Stdout, os.Stdin) } … or (credits: Roger Peppe ): package main import ( "bufio" "fmt" "os" ) func main() { for scan := bufio.NewScanner(os.Stdin); scan.Scan(); { fmt.Printf("%s\n", scan.Text()) } } Ah, I just realized that the “smallest pipeable” Go (lang) program is rather small, if using my little library of minimalistic streaming components .

Profiling and creating call graphs for Go programs

In trying to get my head around the code of the very interesting GoFlow library, (for flow-based programming in Go), and the accompanying flow-based bioinformatics library I started hacking on, I needed to get some kind of visualization (like a call graph) … something like this: (And in the end, that is what I got … read on … ) :) I then found out about the go tool pprof command, for which the Go team published a blog post on here .