Living Systems_

Luigi

Tutorial: Luigi for Scientific Workflows

This is a Luigi tutorial I held at the e-Infrastructures for Massively parallel sequencing workshop (Video archive ) at SciLifeLab Uppsala in January 2015, moved here for future reference. What is Luigi? Luigi is a batch workflow system written in Python and developed by Erik Bernhardson and others at Spotify , where it is used to compute machine-learning powered music recommendation lists, top lists etc. Luigi is one of not-too-many batch workflow systems that supports running both normal command line jobs and Hadoop jobs in the same (in this tutorial, we will focus only on the command line part).

Wanted: Dynamic workflow scheduling

Photo credits: Matthew Smith / Unsplash In our work on automating machine learning computations in cheminformatics with scientific workflow tools , we have came to realize something; Dynamic scheduling in scientific workflow tools is very important and sometimes badly needed. What I mean is that new tasks should be able to be scheduled during the execution of a workflow, not just in its scheduling phase. What is striking is that far from all workflow tools allow this.

Workflow tool makers: Allow defining data flow, not just task dependencies

Upsurge in workflow tools There seem to be a little upsurge in light-weight - often python-based - workflow tools for data pipelines in the last couple of years: Spotify’s Luigi , OpenStack’s Mistral , Pinterest’s Pinball , and recently AirBnb’s Airflow , to name a few. These are all interesting tools, and it is an interesting trend for us at pharmbio , who try to see how we can use workflow tools to automate bio- and cheminformatics tasks on compute clusters.

Links: Our experiences using Spotify's Luigi for Bioinformatics Workflows

Fig 1: A screenshot of Luigi’s web UI, of a real-world (although rather simple) workflow implemented in Luigi: Update May 5, 2016: Most of the below material is more or less outdated. Our latest work has resulted in the SciLuigi helper library , which we have used in production and will be focus of further developments. In the Bioclipse / Pharmaceutical Bioinformatics group at Dept of Pharm. Biosciences att UU, we are quite heavy users of Spotify’s Luigi workflow library , to automate workflows, mainly doing Machine Learning heavy lifting.