Troubleshooting Nextflow pipelines
We evaluated Nextflow earlier in my work at pharmb.io, but that was before DSL2 and its support for re-usable modules (whose absence was one reason we needed to develop our own tools to support our challenges, as explained in the paper). Thus, there’s definitely some new stuff to get into.
Based on my years in bioinformatics and data science, I’ve seen that the number one skill that you need to develop is to be able to effectively troubleshoot things, because things will invariably fail in all kinds of ways. And in the process, you will probably learn a lot about the technology stack you are using.
You need to be pretty intentional about developing this skill though, as it is seldom, if ever, taught properly in undergraduate programs. This means it is far too often treated as a trial-and-error practice, perhaps combined with some informed guessing. While this can work, there are far more efficient ways.
Anyway, this is why I’m writing this post. I wanted to document some key troubleshooting techniques I have just picked up for working with Nextflow. Some are available in the documentation, but I didn’t find a coherent summary of them all, and I also learned some further tricks through experimentation that can perhaps be contributed somewhere else later. But to start with, they are published below. Feel free to give feedback, and also let me know about your own favorite tips in the comments!
Existing troubleshooting resources
As mentioned, Nextflow already has some information on troubleshooting tools and techniques, and the idea here is not to reiterate it, so I’ll start by pointing out some of the most important resources.
- Perhaps the page most similar to this one is the troubleshooting guide on the Nextflow Training website.
- There is also some info on available debugging tools in the overview page in the main Nextflow docs.
- The doc page on Tracing & visualisation provides info on some very useful tools to understand what your pipeline does.
- The workflow introspection page also contains tips on how you can inspect especially the objects in the DSL/Groovy part of the pipeline.
- A search for “troubleshooting” in the Nextflow docs gives a few pages specific to various execution platforms, as well as on caching and resuming, which should be generally useful.
- Mahesh Binzer-Panchal started an issue to collect various common errors in Nextflow.
- This now seems to be collected into this website with gotchas and common errors.
- Apart from the Slack, there is now also a community forum, where you can search for previous issues and ask about your own.
- Last but definitely not least, nf-core is a super-vibrant community, with an even more active Slack server and lots of other community resources.
Some further troubleshooting tips
1. The execution log
First of all, Nextflow itself provides some pretty good hints when an error occurs.
The first thing to note is that if your terminal fills with error output and you don’t manage to read it all before it scrolls by, you can always find the latest log in a file named .nextflow.log inside your execution folder.
To scroll through this file in a searchable way, without wrapping long lines, I recommend using the less -S command from within the execution folder:
less -S .nextflow.log
Using less, you can search by typing / followed by the search phrase, step through the search results with n for the next match and N for the previous one, scroll using the arrow keys or PageDown/PageUp (or the vim keys for the same), and quit with q.
Sometimes it is easier to just “grep” through the file, for example searching case-insensitively for “error”, and perhaps piping that to less -S:
grep -i error .nextflow.log | less -S
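If you want line numbers and a little context around each hit, the grep call above could be wrapped in a small function. This is only a sketch: nf_log_errors is a made-up name, not something Nextflow ships, and the default of three context lines is an arbitrary choice.

```shell
# Hypothetical helper (not part of Nextflow): list log lines mentioning
# "error", with line numbers and a few lines of trailing context.
nf_log_errors() {
    local log="${1:-.nextflow.log}"   # log file, defaults to the latest log
    local context="${2:-3}"           # number of lines to show after each match
    grep -i -n -A "$context" 'error' "$log"
}
```

For example, nf_log_errors .nextflow.log 5 | less -S would show five lines after each match, still without wrapping long lines.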
2. The work folder
At least for jobs executed locally, the execution log mentioned above typically points out the path to the task’s “work folder”. Depending on your configuration, the path typically contains the word “work” and ends with a long, random-looking string of letters and numbers called a “hash”, for example /tmp/nf/work/a3/0e2ea68c421ced4797e00de9e73155.
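The progress output that Nextflow prints in the terminal usually shows only a shortened form of this hash, such as [a3/0e2ea6]. A small helper like the following could expand it to the full folder path. It is a sketch only: nf_workdir is a hypothetical name, and it assumes the work root is ./work unless you pass another one as the second argument.

```shell
# Hypothetical helper (not part of Nextflow): expand a shortened task hash
# such as "a3/0e2ea6" into the matching work folder path.
nf_workdir() {
    local short="$1"          # shortened hash as shown in the terminal
    local root="${2:-work}"   # work root; "work" is an assumed default
    # The trailing */ makes the glob match directories only
    ls -d "$root/$short"*/ 2>/dev/null
}
```

With the example path above, nf_workdir a3/0e2ea6 /tmp/nf/work would print the full folder path, with a trailing slash.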
It is a good idea to cd into this folder and explore it when you have a hard-to-track-down bug, so in this example:
cd /tmp/nf/work/a3/0e2ea68c421ced4797e00de9e73155
ls -1a
The -1a flags, or in particular -1tra, are very useful, and have the following functions:
- -1 (the number) lists the files vertically instead of horizontally, making the listing easier to read. If you also want more details like timestamps and permissions, use -l (the letter) instead.
- -t sorts the list by time.
- -r reverses the time-based sorting from -t, so that the newest files are located last (if the list is long, these will be the only ones immediately visible on the screen).
- -a lists “all” files, meaning also hidden files, that is, those whose names start with a dot (.).
Since the -t and -r flags are not strictly needed, we will skip them below for simplicity.
As you will see, the folder contains a number of hidden files, viewable only with the -a flag to ls, that contain some key information on how the job is being executed:
$ ls -1a
.
..
.command.begin
.command.err
.command.log
.command.out
.command.run
.command.sh
.exitcode
<some more files not relevant here>
These files are well worth getting acquainted with. Their functions are, in summary:
- .command.begin
- .command.err - Everything written to stderr from the command. This will typically be errors, but some commands are ill-behaved and write other stuff here as well, which might be useful for debugging.
- .command.log - Log output from the command.
- .command.out - Everything written to stdout.
- .command.run - This is a scaffold script that contains various bash functions to create temporary folders, stage files, execute the .command.sh script, and clean up. We will look closer at it below.
- .command.sh - This script contains the main command run by the task in question. It is thus very useful to check that it looks as expected.
- .exitcode - This file will contain the exit code, or “returncode”, of the command. It is an integer value with various meanings that can be looked up on the internet, but the most important thing to know is that 0 means the command has completed successfully, and anything else (typically 1 or above) means it has failed in some way.
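To get a quick overview of a failed task without opening each file in turn, the files above can be combined in a small helper. This is a sketch under assumptions: nf_task_summary is a hypothetical name, and the hints for exit codes 126, 127 and 137 are general shell conventions rather than anything specific to Nextflow.

```shell
# Hypothetical helper (not part of Nextflow): summarise one task work folder.
nf_task_summary() {
    local dir="${1:-.}"       # work folder, defaults to the current one
    local code="unknown"
    [ -f "$dir/.exitcode" ] && code="$(cat "$dir/.exitcode")"
    case "$code" in
        0)   echo "exit code: 0 (success)" ;;
        126) echo "exit code: 126 (command found but not executable)" ;;
        127) echo "exit code: 127 (command not found)" ;;
        137) echo "exit code: 137 (killed by SIGKILL, often out of memory)" ;;
        *)   echo "exit code: $code" ;;
    esac
    echo "--- .command.sh ---"
    cat "$dir/.command.sh" 2>/dev/null
    echo "--- last lines of .command.err ---"
    tail -n 20 "$dir/.command.err" 2>/dev/null
    return 0
}
```

Run it as nf_task_summary inside a work folder, or pass the folder path as the first argument.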
3. Debugging a command in the work folder
If you have a tricky failing task and you don’t really understand why it fails, it might be a good idea to execute the task scripts manually.
You will typically need to do that by executing the .command.run script, which in turn executes .command.sh, since the former creates the temporary directories, stages files etc. that are needed for the task to run properly.
So, you would do:
bash .command.run
… and watch for any detailed output that might give you hints.
Make all commands visible with set -x
The problem with the above is that the .command.run script does a lot of “magic” and setup grunt work that you don’t see.
Thus, to make everything it does more visible, you can add the command set -x near the top (right after the first #! line) of both the .command.run and .command.sh scripts, using a text editor. The terminal-based text editor nano is available on most systems, and is more user-friendly than vim for new users: open the file with nano .command.run, save with Ctrl+O followed by Enter, and exit with Ctrl+X.
Then you can execute it again:
bash .command.run
But even better is to redirect all the output to a file, so that you can read it later at your own pace:
bash .command.run &> out.log
(The & in &> makes sure that both stdout and stderr are redirected to the file.)
Better still is to BOTH write the output to a file and pipe it to something like less -S, so you can see and scroll through the output immediately:
bash .command.run |& tee out.log | less -S
Here, the |& pipes both stderr and stdout to the next command, which is tee. tee takes a filename to which it writes its input, and also forwards that input to the next command, which here is less -S.
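Note that |& needs bash 4 or later; on older shells (such as the bash 3.2 that ships with macOS) the equivalent spelling is 2>&1 |. The following self-contained sketch shows the effect, with a made-up noisy function standing in for bash .command.run:

```shell
# Stand-in for "bash .command.run": one line to stdout, one to stderr.
noisy() {
    echo "written to stdout"
    echo "written to stderr" >&2
}

log_file="$(mktemp)"

# Same effect as "noisy |& tee $log_file": merge stderr into stdout first,
# then let tee write a copy to the file while passing everything along.
noisy 2>&1 | tee "$log_file"

# Count the captured lines; both streams should have ended up in the file.
grep -c 'written to' "$log_file"
```

Without the 2>&1 (or the & in |&), only the stdout line would reach tee and the file, while the stderr line would go straight to the terminal.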
4. Turning off the cleanup parts, to explore temporary folders
One caveat when running the .command.run script is that it will always clean up temporary folders after it has finished. This means certain subtle errors might be harder to detect, since you can’t explore these temporary folders manually.
One way to get around this, though, is to comment out those parts in the .command.run file before running it (in bash, you can comment out a line by adding a # character at the beginning of it).
In particular, check the on_exit() function, which might look like so:
on_exit() {
    exit_status=${nxf_main_ret:=$?}
    printf $exit_status > /tmp/nf/work/a3/0e2ea68c421ced4797e00de9e73155/.exitcode
    set +u
    [[ "$tee1" ]] && kill $tee1 2>/dev/null
    [[ "$tee2" ]] && kill $tee2 2>/dev/null
    [[ "$ctmp" ]] && rm -rf $ctmp || true
    rm -rf $NXF_SCRATCH || true
    sync || true
    exit $exit_status
}
Here, you could comment out, for example, the rm -rf $NXF_SCRATCH || true line, if you want the temporary folder to remain after the run. This folder is typically put in /tmp and named something like /tmp/nxf.XXXXXXXXX. If you add set -x to the beginning of the script as explained above, you should be able to see its exact path when executing the command.
You can also have a look at the nxf_main() and nxf_launch() functions. nxf_main() is located at the bottom of the script and is the over-arching function, calling the other sub-functions (on_exit() is not called explicitly, though, but is set up to be called whenever nxf_main() returns or is interrupted), while nxf_launch() is the one executing the .command.sh script, together with some environment variables, setup for containers etc., which is a somewhat common source of errors.
The content of nxf_launch() should be visible in the script output when you add set -x to the script and run it, but it might also be good to examine it manually!
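The mechanism behind this “called without being called” behaviour is, as far as I can tell, bash’s trap builtin. The following stand-alone sketch mimics the pattern with a throwaway script. It is not Nextflow’s actual code: the KEEP_SCRATCH switch and the demo script are made up for illustration.

```shell
# Write a tiny demo script that cleans up its scratch folder in an EXIT trap.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
scratch="$(mktemp -d)"
on_exit() {
    exit_status=$?
    # KEEP_SCRATCH is a made-up switch for this demo only
    [ -n "$KEEP_SCRATCH" ] || rm -rf "$scratch"
    exit "$exit_status"
}
trap on_exit EXIT     # run on_exit whenever the script ends
echo "$scratch"       # report where the scratch folder was created
false                 # simulate a failing task command
EOF

# Normal run: the trap removes the scratch folder on exit.
removed_dir="$(bash "$demo" || true)"
# Run with cleanup disabled: the scratch folder survives for inspection.
kept_dir="$(KEEP_SCRATCH=1 bash "$demo" || true)"
```

After running this, the folder named in removed_dir is gone while the one in kept_dir is still there, which is the same effect as commenting out the rm -rf line in on_exit().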
If you want to be able to quickly enable these two adjustments when in a work folder, you can add the following bash function to your ~/.bash_aliases file:
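function debugnf() {
    # Insert "set -x" as a new line before line 2 of both scripts (the \n needs GNU sed)
    sed -i '2s/^/set -x\n/' .command.{run,sh};
    # Disable cleanup by putting a # in front of every "rm " in .command.run
    sed -i 's/rm /#rm /g' .command.run;
}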
Then, when in a work folder, you can just run debugnf before running any of the .command.run or .command.sh files manually.
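The sed calls above edit the scripts in place, and the \n in the replacement is a GNU sed feature. If you would rather keep pristine copies, or are on a system with BSD sed, a variant along these lines might work. It is a sketch only: debugnf_backup and the .orig suffix are hypothetical names, and it only adds set -x (it leaves the cleanup lines alone).

```shell
# Hypothetical variant of debugnf: adds "set -x" after the first line of each
# script, keeping an untouched copy next to it.
debugnf_backup() {
    local f
    for f in .command.run .command.sh; do
        [ -f "$f" ] || continue
        cp -p "$f" "$f.orig"    # pristine copy, to restore later
        # Re-create the script with "set -x" inserted right after line 1
        awk 'NR == 1 { print; print "set -x"; next } { print }' "$f.orig" > "$f"
    done
}
```

To undo the change, copy the backups over the modified scripts again, for example with cp -p .command.run.orig .command.run.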
Summary
Hope you were able to learn something from the tips in this post! And, perhaps you know some further great tips for debugging? Feel free to share them, or at least a link to them, in the comments below!
Changelog
- 2023-11-01 13:37 CET: Added section “Turning off the cleanup parts, to explore temporary folders”
- 2023-11-01 14:20 CET: Added pointer about nxf_launch(), on tip from Maxime Garcia.
- 2023-11-01 21:30 CET: Added link to troubleshooting guide at the Nextflow Training website.