How to align columns in the terminal
In bioinformatics we are handling a lot of tabular data. Be it VCF files, tabular Blast output, or just creating a CSV or TSV samplesheet.
Actually, one of my favorite tabular formats is by using SeqKit to convert Fasta or FastQ files to tabular format, as this allows to do various filtering operations by row, using standard unix tools if so wished.
Scrolling through this type of data in the terminal can be messy to say the least though.
Although CSVs can of course be imported into a spreadsheet software for viewing, it would be very powerful to be able to view them comfortably right from the terminal, isn’t it?
To take one example that fits within the code window of a blog post, let’s take
a selected set of columns from the CSV output from the Mykrobe tool. And to make
it emulate another common problem with many csv formats, let’s also use tr to
convert the _:s in the headers into real spaces (Mykrobe does not do this,
but many other tools do):
$ cat SOME_SAMPLE.csv | cut -d, -f 2,3,10,14,15,17,18 | tr '_' ' ' > selection.csv
$ cat selection.csv
"drug","susceptibility","kmer size","phylo group per covg","species per covg","phylo group depth","species depth"
"Amikacin","S","21","99.672","98.428","372","347"
"Capreomycin","S","21","99.672","98.428","372","347"
"Ciprofloxacin","S","21","99.672","98.428","372","347"
"Delamanid","S","21","99.672","98.428","372","347"
"Ethambutol","S","21","99.672","98.428","372","347"
"Ethionamide","S","21","99.672","98.428","372","347"
"Isoniazid","R","21","99.672","98.428","372","347"
"Kanamycin","S","21","99.672","98.428","372","347"
"Levofloxacin","S","21","99.672","98.428","372","347"
"Linezolid","S","21","99.672","98.428","372","347"
"Moxifloxacin","S","21","99.672","98.428","372","347"
"Ofloxacin","S","21","99.672","98.428","372","347"
"Pyrazinamide","S","21","99.672","98.428","372","347"
"Rifampicin","S","21","99.672","98.428","372","347"
"Streptomycin","R","21","99.672","98.428","372","347"
As you can see, here it is really not that easy to even follow individual columns, much less link headers to columns.
To improve the situation here a bit, we can convert the commas to tabs, using
the tr command:
$ cat selection.csv | tr "," "\t"
"drug" "susceptibility" "kmer size" "phylo group per covg" "species per covg" "phylo group depth" "species depth"
"Amikacin" "S" "21" "99.672" "98.428" "372" "347"
"Capreomycin" "S" "21" "99.672" "98.428" "372" "347"
"Ciprofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Delamanid" "S" "21" "99.672" "98.428" "372" "347"
"Ethambutol" "S" "21" "99.672" "98.428" "372" "347"
"Ethionamide" "S" "21" "99.672" "98.428" "372" "347"
"Isoniazid" "R" "21" "99.672" "98.428" "372" "347"
"Kanamycin" "S" "21" "99.672" "98.428" "372" "347"
"Levofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Linezolid" "S" "21" "99.672" "98.428" "372" "347"
"Moxifloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Ofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Pyrazinamide" "S" "21" "99.672" "98.428" "372" "347"
"Rifampicin" "S" "21" "99.672" "98.428" "372" "347"
"Streptomycin" "R" "21" "99.672" "98.428" "372" "347"
Well, now we at least can clearly follow individual columns. But the headers don’t align that well.
To fix this, what we can do is use the column -t command, which will
straighten up the columns for us nicely:
$ cat selection.csv | tr ',' ' ' | column -t
"drug" "susceptibility" "kmer size" "phylo group per covg" "species per covg" "phylo group depth" "species depth"
"Amikacin" "S" "21" "99.672" "98.428" "372" "347"
"Capreomycin" "S" "21" "99.672" "98.428" "372" "347"
"Ciprofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Delamanid" "S" "21" "99.672" "98.428" "372" "347"
"Ethambutol" "S" "21" "99.672" "98.428" "372" "347"
"Ethionamide" "S" "21" "99.672" "98.428" "372" "347"
"Isoniazid" "R" "21" "99.672" "98.428" "372" "347"
"Kanamycin" "S" "21" "99.672" "98.428" "372" "347"
"Levofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Linezolid" "S" "21" "99.672" "98.428" "372" "347"
"Moxifloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Ofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Pyrazinamide" "S" "21" "99.672" "98.428" "372" "347"
"Rifampicin" "S" "21" "99.672" "98.428" "372" "347"
"Streptomycin" "R" "21" "99.672" "98.428" "372" "347"
As you can see, column -t does not handle this situation very well, as it is
separating columns based on plain spaces by default, something we have in our
headers in many real world files.
To fix this, what we can do is make sure that column -t only separates based
on real TAB characters. The problem with this is that specifying a proper TAB
character on the commandline is not always straight-forward. Some tools, such
as awk accepts a normal "\t" string for this, e.g. awk -F"\t" to tell awk
to use TAB as a field separator, but column doesn’t do that. Instead we need
to specify it using $'\t', which should work in most newer Linuxes.
So all in all, we can do:
$ cat selection.csv | tr ',' '\t' | column -t -s $'\t'
"drug" "susceptibility" "kmer size" "phylo group per covg" "species per covg" "phylo group depth" "species depth"
"Amikacin" "S" "21" "99.672" "98.428" "372" "347"
"Capreomycin" "S" "21" "99.672" "98.428" "372" "347"
"Ciprofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Delamanid" "S" "21" "99.672" "98.428" "372" "347"
"Ethambutol" "S" "21" "99.672" "98.428" "372" "347"
"Ethionamide" "S" "21" "99.672" "98.428" "372" "347"
"Isoniazid" "R" "21" "99.672" "98.428" "372" "347"
"Kanamycin" "S" "21" "99.672" "98.428" "372" "347"
"Levofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Linezolid" "S" "21" "99.672" "98.428" "372" "347"
"Moxifloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Ofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Pyrazinamide" "S" "21" "99.672" "98.428" "372" "347"
"Rifampicin" "S" "21" "99.672" "98.428" "372" "347"
"Streptomycin" "R" "21" "99.672" "98.428" "372" "347"
Since the terminal might sometimes be too small to view all the columns without breaking lines, I typically combine that with less -S:
$ cat selection.csv | tr ',' '\t' | column -t -s $'\t' | less -S
"drug" "susceptibility" "kmer size" "phylo group per covg" "species per covg" "phylo group depth" "species depth"
"Amikacin" "S" "21" "99.672" "98.428" "372" "347"
"Capreomycin" "S" "21" "99.672" "98.428" "372" "347"
"Ciprofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Delamanid" "S" "21" "99.672" "98.428" "372" "347"
"Ethambutol" "S" "21" "99.672" "98.428" "372" "347"
"Ethionamide" "S" "21" "99.672" "98.428" "372" "347"
"Isoniazid" "R" "21" "99.672" "98.428" "372" "347"
"Kanamycin" "S" "21" "99.672" "98.428" "372" "347"
"Levofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Linezolid" "S" "21" "99.672" "98.428" "372" "347"
"Moxifloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Ofloxacin" "S" "21" "99.672" "98.428" "372" "347"
"Pyrazinamide" "S" "21" "99.672" "98.428" "372" "347"
"Rifampicin" "S" "21" "99.672" "98.428" "372" "347"
"Streptomycin" "R" "21" "99.672" "98.428" "372" "347"
This is quite the mess of a command though, so I have of course created some aliases in my ~/.bash_aliases file:
alias colt="column -t -s $'\t'"
alias s='less -S'
So now, in these situations, I can just do:
$ cat selection.csv | tr ',' '\t' | colt | s
One caveat in this specific example needs to be added, and that is that column -t of course
can accept , as separator straight away, with column -t -s,, meaning that
you don’t necessarily need to convert commas to tabs in this specific
example. But tab separated output is common enough that you will find yourself
needing the commands above anyway.
Hope it helps!