Living Systems_

How to align columns in the terminal

In bioinformatics we are handling a lot of tabular data. Be it VCF files, tabular Blast output, or just creating a CSV or TSV samplesheet.

Actually, one of my favorite tabular formats is by using SeqKit to convert Fasta or FastQ files to tabular format, as this allows to do various filtering operations by row, using standard unix tools if so wished.

Scrolling through this type of data in the terminal can be messy to say the least though.

Although CSVs can of course be imported into a spreadsheet software for viewing, it would be very powerful to be able to view them comfortably right from the terminal, isn’t it?

To take one example that fits within the code window of a blog post, let’s take a selected set of columns from the CSV output from the Mykrobe tool. And to make it emulate another common problem with many csv formats, let’s also use tr to convert the _:s in the headers into real spaces (Mykrobe does not do this, but many other tools do):

$ cat SOME_SAMPLE.csv | cut -d, -f 2,3,10,14,15,17,18 | tr '_' ' ' > selection.csv
$ cat selection.csv
"drug","susceptibility","kmer size","phylo group per covg","species per covg","phylo group depth","species depth"
"Amikacin","S","21","99.672","98.428","372","347"
"Capreomycin","S","21","99.672","98.428","372","347"
"Ciprofloxacin","S","21","99.672","98.428","372","347"
"Delamanid","S","21","99.672","98.428","372","347"
"Ethambutol","S","21","99.672","98.428","372","347"
"Ethionamide","S","21","99.672","98.428","372","347"
"Isoniazid","R","21","99.672","98.428","372","347"
"Kanamycin","S","21","99.672","98.428","372","347"
"Levofloxacin","S","21","99.672","98.428","372","347"
"Linezolid","S","21","99.672","98.428","372","347"
"Moxifloxacin","S","21","99.672","98.428","372","347"
"Ofloxacin","S","21","99.672","98.428","372","347"
"Pyrazinamide","S","21","99.672","98.428","372","347"
"Rifampicin","S","21","99.672","98.428","372","347"
"Streptomycin","R","21","99.672","98.428","372","347"

As you can see, here it is really not that easy to even follow individual columns, much less link headers to columns.

To improve the situation here a bit, we can convert the commas to tabs, using the tr command:

$ cat selection.csv | tr "," "\t"
"drug"  "susceptibility"        "kmer size"     "phylo group per covg"  "species per covg"      "phylo group depth"     "species depth"
"Amikacin"      "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Capreomycin"   "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Ciprofloxacin" "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Delamanid"     "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Ethambutol"    "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Ethionamide"   "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Isoniazid"     "R"     "21"    "99.672"        "98.428"        "372"   "347"
"Kanamycin"     "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Levofloxacin"  "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Linezolid"     "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Moxifloxacin"  "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Ofloxacin"     "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Pyrazinamide"  "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Rifampicin"    "S"     "21"    "99.672"        "98.428"        "372"   "347"
"Streptomycin"  "R"     "21"    "99.672"        "98.428"        "372"   "347"

Well, now we at least can clearly follow individual columns. But the headers don’t align that well.

To fix this, what we can do is use the column -t command, which will straighten up the columns for us nicely:

$ cat selection.csv | tr ',' ' ' |  column -t
"drug"           "susceptibility"  "kmer  size"     "phylo    group  per    covg"  "species  per  covg"  "phylo  group  depth"  "species  depth"
"Amikacin"       "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Capreomycin"    "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Ciprofloxacin"  "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Delamanid"      "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Ethambutol"     "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Ethionamide"    "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Isoniazid"      "R"               "21"   "99.672"  "98.428"  "372"  "347"
"Kanamycin"      "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Levofloxacin"   "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Linezolid"      "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Moxifloxacin"   "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Ofloxacin"      "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Pyrazinamide"   "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Rifampicin"     "S"               "21"   "99.672"  "98.428"  "372"  "347"
"Streptomycin"   "R"               "21"   "99.672"  "98.428"  "372"  "347"

As you can see, column -t does not handle this situation very well, as it is separating columns based on plain spaces by default, something we have in our headers in many real world files.

To fix this, what we can do is make sure that column -t only separates based on real TAB characters. The problem with this is that specifying a proper TAB character on the commandline is not always straight-forward. Some tools, such as awk accepts a normal "\t" string for this, e.g. awk -F"\t" to tell awk to use TAB as a field separator, but column doesn’t do that. Instead we need to specify it using $'\t', which should work in most newer Linuxes.

So all in all, we can do:

$ cat selection.csv | tr ',' '\t' | column -t -s $'\t'
"drug"           "susceptibility"  "kmer size"  "phylo group per covg"  "species per covg"  "phylo group depth"  "species depth"
"Amikacin"       "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Capreomycin"    "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ciprofloxacin"  "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Delamanid"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ethambutol"     "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ethionamide"    "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Isoniazid"      "R"               "21"         "99.672"                "98.428"            "372"                "347"
"Kanamycin"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Levofloxacin"   "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Linezolid"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Moxifloxacin"   "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ofloxacin"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Pyrazinamide"   "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Rifampicin"     "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Streptomycin"   "R"               "21"         "99.672"                "98.428"            "372"                "347"

Since the terminal might sometimes be too small to view all the columns without breaking lines, I typically combine that with less -S:

$ cat selection.csv | tr ',' '\t' | column -t -s $'\t' | less -S
"drug"           "susceptibility"  "kmer size"  "phylo group per covg"  "species per covg"  "phylo group depth"  "species depth"
"Amikacin"       "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Capreomycin"    "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ciprofloxacin"  "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Delamanid"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ethambutol"     "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ethionamide"    "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Isoniazid"      "R"               "21"         "99.672"                "98.428"            "372"                "347"
"Kanamycin"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Levofloxacin"   "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Linezolid"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Moxifloxacin"   "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Ofloxacin"      "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Pyrazinamide"   "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Rifampicin"     "S"               "21"         "99.672"                "98.428"            "372"                "347"
"Streptomycin"   "R"               "21"         "99.672"                "98.428"            "372"                "347"

This is quite the mess of a command though, so I have of course created some aliases in my ~/.bash_aliases file:

alias colt="column -t -s $'\t'"
alias s='less -S'

So now, in these situations, I can just do:

$ cat selection.csv | tr ',' '\t' | colt | s

One caveat in this specific example needs to be added, and that is that column -t of course can accept , as separator straight away, with column -t -s,, meaning that you don’t necessarily need to convert commas to tabs in this specific example. But tab separated output is common enough that you will find yourself needing the commands above anyway.

Hope it helps!