Partial Galaxy ToolConfig to DocBook CmdSynopsis conversion with XSLT RegEx
<tool id="sam_to_bam" name="SAM-to-BAM" version="1.1.1">
<description>converts SAM format to BAM format</description>
<requirements>
<requirement type="package">samtools</requirement>
</requirements>
<command interpreter="python">
sam_to_bam.py
--input1=$source.input1
--dbkey=${input1.metadata.dbkey}
#if $source.index_source == "history":
--ref_file=$source.ref_file
#else
--ref_file="None"
#end if
--output1=$output1
--index_dir=${GALAXY_DATA_INDEX_DIR}
</command>
<inputs>
<conditional name="source">
<param name="index_source" type="select" label="Choose the source for the reference list">
<option value="cached">Locally cached</option>
<option value="history">History</option>
</param>
<when value="cached">
<param name="input1" type="data" format="sam" label="SAM File to Convert">
<validator type="unspecified_build" />
<validator type="dataset_metadata_in_file" filename="sam_fa_indices.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="index" />
</param>
</when>
<when value="history">
<param name="input1" type="data" format="sam" label="Convert SAM file" />
<param name="ref_file" type="data" format="fasta" label="Using reference file" />
</when>
</conditional>
</inputs>
<outputs>
<data format="bam" name="output1" label="${tool.name} on ${on_string}: converted BAM" />
</outputs>
</tool>
… you see that in the command tag, the actual syntax of the command is specified in a kind of “free text” format … Parsing free text might not be the first thing one thinks of using XSLT for, but with the regex functions in XSLT 2.0 (matches(), replace() and tokenize()) it is definitely an option. Helped by this article on xml.com, I put together this little XSLT stylesheet for parsing the free-text content of that command tag. (I haven’t got to the more detailed configuration inside the inputs tag of the Galaxy format yet, but might not need to either, if staying with the Galaxy format anyway.)
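<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />

  <!-- Wrap everything in a single DocBook cmdsynopsis element -->
  <xsl:template match="/">
    <cmdsynopsis>
      <xsl:apply-templates select="tool/command" />
    </cmdsynopsis>
  </xsl:template>

  <xsl:template match="tool/command">
    <!-- The interpreter attribute (e.g. "python") becomes the command -->
    <command>
      <xsl:value-of select="@interpreter" />
    </command>
    <!-- Normalize the free-text command, innermost replace() first:
         1. remove all spaces (so "#end if" becomes "#endif"),
         2. remove the lines starting with "#" (#if ..., #else, #endif),
         3. turn runs of newlines into a single space,
         4. trim leading and trailing whitespace,
         then tokenize the result on whitespace. -->
    <xsl:for-each select='tokenize(
        replace(
          replace(
            replace(
              replace(
                .,
                "[ ]+",
                ""),
              "\n#[^\s]+",
              ""),
            "\n+",
            " "),
          "(^\s+|\s+$)",
          ""),
        "\s")'>
      <!-- Skip the Galaxy-specific ${...} tokens -->
      <xsl:if test='not(matches(., "\{"))'>
        <arg>
          <!-- Option name: everything before the first "=" -->
          <xsl:value-of select='replace(., "=.*", "")' />
          <xsl:if test='matches(., ".*=.*")'>
            <xsl:text> </xsl:text>
            <!-- Value: everything after the last "=", minus a leading "$" -->
            <replaceable>
              <xsl:value-of select='replace(., ".*=\s*\$?", "")' />
            </replaceable>
          </xsl:if>
        </arg>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>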
… a bit crazy with all these nested regex replace() calls, no? :) … but I can tell you, it actually works very well! I found it easier to work with than many other regex implementations (e.g. newlines can be matched with “\n”, which I think you can’t do by default in some other ones).
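To see what each of the nested replace() calls does on its own, here is a rough Python equivalent of the chain. It is only a sketch for illustration, not part of the XSLT setup: the COMMAND string is the text content of the command tag above (indentation stripped), normalize() is a hypothetical helper name, and Python's re is assumed to behave like the XPath regex flavour for these particular patterns (Python's \s does match a slightly larger set of whitespace characters).

```python
import re

# Text content of the <command> tag from the Galaxy tool config above.
COMMAND = """
sam_to_bam.py
--input1=$source.input1
--dbkey=${input1.metadata.dbkey}
#if $source.index_source == "history":
--ref_file=$source.ref_file
#else
--ref_file="None"
#end if
--output1=$output1
--index_dir=${GALAXY_DATA_INDEX_DIR}
"""


def normalize(text: str) -> str:
    """Apply the stylesheet's four nested replace() calls, innermost first."""
    text = re.sub(r"[ ]+", "", text)         # 1. drop every space ("#end if" -> "#endif")
    text = re.sub(r"\n#[^\s]+", "", text)    # 2. drop the "#..." directive lines
    text = re.sub(r"\n+", " ", text)         # 3. runs of newlines -> one space
    return re.sub(r"(^\s+|\s+$)", "", text)  # 4. trim both ends


if __name__ == "__main__":
    print(normalize(COMMAND))
```

Running it prints a single line, sam_to_bam.py --input1=$source.input1 --dbkey=${input1.metadata.dbkey} --ref_file=$source.ref_file --ref_file="None" --output1=$output1 --index_dir=${GALAXY_DATA_INDEX_DIR}, which is exactly what tokenize() then gets to split.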
I can also mention that the tokenize function splits a string into a sequence of the substrings found between the matches of the given expression (similar to “split” in some other languages, like Python).
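The comparison with split can be made concrete with a small Python stand-in for tokenize($input, $pattern). It is a sketch under the assumption that the pattern has no capturing groups (re.split() would include captured text in its result); xpath_tokenize() is a hypothetical name. It also shows why the stylesheet trims the string before tokenizing: a leading or trailing separator yields a zero-length token, which would become an empty arg element.

```python
import re


def xpath_tokenize(text: str, pattern: str) -> list[str]:
    """Rough Python stand-in for XPath 2.0 tokenize($input, $pattern).

    Like re.split(), tokenize() returns the substrings *between* matches and
    keeps zero-length strings for a leading, trailing or doubled separator.
    Unlike re.split(), it returns an empty sequence for a zero-length input.
    Only valid for patterns without capturing groups.
    """
    if text == "":
        return []
    return re.split(pattern, text)


if __name__ == "__main__":
    # Already trimmed, as after the stylesheet's last replace(): clean tokens.
    print(xpath_tokenize("sam_to_bam.py --output1=$output1", r"\s"))
    # Untrimmed: the leading and trailing newlines yield empty tokens.
    print(xpath_tokenize("\nsam_to_bam.py\n", r"\s"))
```

The first call gives ['sam_to_bam.py', '--output1=$output1']; the second gives ['', 'sam_to_bam.py', ''], and without the trimming step those empty strings would each pass the "{" filter and produce an empty arg.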
The result of the transformation? Here it goes:
<?xml version="1.0" encoding="UTF-8"?>
<cmdsynopsis>
<command>python</command>
<arg>sam_to_bam.py</arg>
<arg>--input1 <replaceable>source.input1</replaceable>
</arg>
<arg>--ref_file <replaceable>source.ref_file</replaceable>
</arg>
<arg>--ref_file <replaceable>"None"</replaceable>
</arg>
<arg>--output1 <replaceable>output1</replaceable>
</arg>
</cmdsynopsis>
Not perfect (there are still two “--ref_file” arguments, since both branches of the #if end up in the output), but at least it has parsed out the different arguments and removed some Galaxy-specific stuff (the tokens containing “${...}”) as well as the conditional statements. At least I think it shows that XSLT + regex is actually an option, don’t you think? :)
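One possible way around the duplicate --ref_file arguments, which the stylesheet above does not do, is to wrap arguments that share an option name in a DocBook group element with choice="req", since exactly one branch of the #if ends up on the real command line. The sketch below uses Python's xml.etree.ElementTree rather than XSLT; ARGS is assumed to hold the (name, value) pairs the stylesheet extracted, and add_arg() and build_synopsis() are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional

# Assumed input: the (option name, value) pairs extracted by the stylesheet.
ARGS = [
    ("sam_to_bam.py", None),
    ("--input1", "source.input1"),
    ("--ref_file", "source.ref_file"),
    ("--ref_file", '"None"'),
    ("--output1", "output1"),
]


def add_arg(parent: ET.Element, name: str, value: Optional[str]) -> None:
    """Append <arg>name <replaceable>value</replaceable></arg> to parent."""
    arg = ET.SubElement(parent, "arg")
    arg.text = name
    if value is not None:
        arg.text += " "
        ET.SubElement(arg, "replaceable").text = value


def build_synopsis(interpreter: str, args: list[tuple[str, Optional[str]]]) -> ET.Element:
    """Build a <cmdsynopsis>, grouping repeated option names as alternatives."""
    root = ET.Element("cmdsynopsis")
    ET.SubElement(root, "command").text = interpreter
    counts = Counter(name for name, _ in args)
    groups: dict[str, ET.Element] = {}  # repeated option name -> its <group>
    for name, value in args:
        if counts[name] > 1:
            # The first occurrence creates the group, so it keeps its position.
            if name not in groups:
                groups[name] = ET.SubElement(root, "group", choice="req")
            add_arg(groups[name], name, value)
        else:
            add_arg(root, name, value)
    return root


if __name__ == "__main__":
    print(ET.tostring(build_synopsis("python", ARGS), encoding="unicode"))
```

The two --ref_file variants then end up as alternatives inside one required group, placed where the first of them appeared, while all other arguments stay as plain arg elements.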
A caveat here though: I found out that most of the XSLT processors available on Ubuntu (xsltproc, Xalan, the one built into PHP 5) don’t support XSLT 2.0 features such as regex, so I ended up using the Java-based Saxon processor.
To run a transformation with it, you simply do (when using the open-source “Home Edition”, Saxon-HE):
java -jar saxon9he.jar [xml-file] [xslt-file] > [output-file]
Works well! (It does a good job of formatting the output XML too.)