Running Schematron: bat/shell, Ant, XProc

By Rick Jelliffe
February 20, 2009 | Comments: 14

[Updated: 2010-04-21 syntax fix.]

I thought I would write a little blog item about running Schematron (I mean the open source implementation at Schematron.com) in batch environments. I apologize in advance for the horrible formatting due to the tiny width of the blog design, which is not designed well for code!

Here are examples for command-lines, Ant and XProc.

BAT

Running a Schematron validation from a BAT file looks something like this:


XSLT -input=xxx.sch -output=xxx1.sch -stylesheet=iso_dsdl_include.xsl
XSLT -input=xxx1.sch -output=xxx2.sch -stylesheet=iso_abstract_expand.xsl
XSLT -input=xxx2.sch -output=xxx.xsl -stylesheet=iso_svrl.xsl
XSLT -input=document.xml -output=xxx-document.svrl -stylesheet=xxx.xsl

Every XSLT implementation has different command-line parameters, so XSLT here stands in for whatever command runs your processor. Read its manual for details.

A pipelined version on a shell script would be something like this:

     XSLT -i=xxx.sch  -s=iso_dsdl_include.xsl  |\
     XSLT -s=iso_abstract_expand.xsl     |\
     XSLT -o=xxx.xsl  -s=iso_svrl.xsl
     XSLT -i=instance.xml  -o=instance.svrl  \
                 -s=xxx.xsl


ANT


Here is a typical kind of Ant script, for all four stages. Your details may vary.

<target  name="schematron-compile-test" >

<!-- expand inclusions -->
<xslt basedir="test/schematron"
style="iso_dsdl_include.xsl" in="test.sch" out="test1.sch">
<classpath>
<pathelement location="${lib.dir}/saxon9.jar"/>
</classpath>
</xslt>

<!-- expand abstract patterns -->
<xslt basedir="test/schematron"
style="iso_abstract_expand.xsl" in="test1.sch" out="test2.sch">
<classpath>
<pathelement location="${lib.dir}/saxon9.jar"/>
</classpath>
</xslt>

<!-- compile it -->
<xslt basedir="test/schematron"
style="iso_svrl_for_xslt2.xsl" in="test2.sch" out="test.xsl">
<classpath>
<pathelement location="${lib.dir}/saxon9.jar"/>
</classpath>
</xslt>

<!-- validate -->
<xslt basedir="test/schematron"
style="test.xsl" in="instance.xml" out="instance.svrl">
<classpath>
<pathelement location="${lib.dir}/saxon9.jar"/>
</classpath>
</xslt>
</target>
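The four stages differ only in their stylesheet, input and output, so the repetition could be factored out with Ant's macrodef. This is only a sketch under the same assumptions as the script above (same basedir, same saxon9.jar location); the macro name sch-xslt is made up for illustration.

```xml
<!-- Hypothetical helper macro: one XSLT pass with Saxon on the classpath -->
<macrodef name="sch-xslt">
  <attribute name="style"/>
  <attribute name="in"/>
  <attribute name="out"/>
  <sequential>
    <xslt basedir="test/schematron"
          style="@{style}" in="@{in}" out="@{out}">
      <classpath>
        <pathelement location="${lib.dir}/saxon9.jar"/>
      </classpath>
    </xslt>
  </sequential>
</macrodef>

<target name="schematron-compile-test">
  <!-- expand inclusions -->
  <sch-xslt style="iso_dsdl_include.xsl" in="test.sch" out="test1.sch"/>
  <!-- expand abstract patterns -->
  <sch-xslt style="iso_abstract_expand.xsl" in="test1.sch" out="test2.sch"/>
  <!-- compile it -->
  <sch-xslt style="iso_svrl_for_xslt2.xsl" in="test2.sch" out="test.xsl"/>
  <!-- validate -->
  <sch-xslt style="test.xsl" in="instance.xml" out="instance.svrl"/>
</target>
```

The macrodef has to be declared before the target that uses it runs, for example at the top level of the build file.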

There is a dedicated Schematron-task-for-ANT available, but it is currently being upgraded to cope with the latest release of Schematron. I hope to have that available in the next few days.

XProc

Finally, here is a stab at what four-stage Schematron validation would look like using XProc (I haven't tested this yet, so no flames).

Actually, XProc has a built-in process to perform Schematron validation in one step: p:validate-with-schematron but I thought it would be interesting to explicate.

<!-- A simple version of ISO Schematron -->
<p:pipeline   xmlns:p="http://www.w3.org/ns/xproc" name="schematron">

<p:input port="instance"/>
<p:input port="schema"/>

<p:output port="svrl" >
<p:pipe step="validate" port="result"/>
</p:output >

<p:xslt version="2.0" name="include">
<p:input port="source">
<p:pipe step="schematron" port="schema"/>
</p:input>
<p:input port="stylesheet">
<p:document href="iso_dsdl_include.xsl"/>
</p:input>
</p:xslt>


<p:xslt version="2.0" name="expand" >
<p:input port="source">
<p:pipe step="include" port="result"/>
</p:input>
<p:input port="stylesheet">
<p:document href="iso_abstract_expand.xsl"/>
</p:input>
</p:xslt>


<p:xslt version="2.0" name="compile">
<p:input port="source">
<p:pipe step="expand" port="result"/>
</p:input>
<p:input port="stylesheet">
<p:document href="iso_svrl_for_xslt2.xsl"/>
</p:input>
</p:xslt>


<p:xslt version="2.0" name="validate">
<p:input port="source">
<p:pipe step="schematron" port="instance"/>
</p:input>
<p:input port="stylesheet">
<p:pipe step="compile" port="result"/>
</p:input>
</p:xslt>

</p:pipeline>
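For comparison, the one-step route through p:validate-with-schematron might look roughly like the following. Like the pipeline above, this is an untested sketch: it assumes an XProc 1.0 processor that implements the step, the step and port names main, validate and svrl are made up here, and what appears on the report port (commonly SVRL) is up to the implementation.

```xml
<!-- One-step validation; only the validation report is passed on -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">

  <p:input port="source" primary="true"/>
  <p:input port="schema"/>

  <p:output port="svrl" sequence="true">
    <p:pipe step="validate" port="report"/>
  </p:output>

  <!-- assert-valid="false": report failures instead of raising an error -->
  <p:validate-with-schematron name="validate" assert-valid="false">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="schema">
      <p:pipe step="main" port="schema"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:validate-with-schematron>

  <!-- discard the step's primary output, which is a copy of the source document -->
  <p:sink/>

</p:declare-step>
```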



14 Comments

Rick,

Thanks for this post! Schematron is a great way to express business rules using simple XPath expressions. A real treasure. It is so much better than just XML Schema validation since the validation rules can be much more complex. Unlike XML Schema validation, the rule designer can also create much better human-readable error messages that can be customized to the user's context.

My hope is that we also have an XQuery-based tool in the near future with a nice XForms front end for creating, maintaining and testing Schematron rules. Using the built-in indexes of a native XML database like eXist-db could create very large performance improvements for static XML files.

Do you see any hope of standardizing the ways that Schematron can be used in rule-engines in the future?

Thanks! - Dan

Dan: A process with two Schematron schemas, one to extract information from the input, the second to mark or grade that information, would be able to represent nicely many more complex rules declaratively. (And, indeed, if the schemas were parameterized with some kind of training mechanism, this would be a kind of neural net too!)

For example, the first Schematron generates SVRL, and the second Schematron generates SVRL augmented with properties or foreign elements giving the function invocations to process the results.

But when I look at most "rules" languages they have two characteristics: first, they tend to work on some domain-specific abstraction (whereas Schematron works on the raw XML but lets you declare abstractions to some extent), and second, they have some "if ... then ..." structure where actions are allowed (whereas Schematron has a very limited "then", i.e., messages, diagnostics, flags, and soon properties I hope, all of which are declarative).

For these kinds of rules, the W3C RIF is something that needs to be considered.

From the standardization POV, this is one of those fall-through-the-cracks issues, at the moment. We dare not standardize by making something up independent of real-world practice, but it may be one of those things where it cannot take off without leadership.

Hi Rick,

thanks for your ongoing work on Schematron. However we have a small problem with invoking the Schematron files.

In our schematron files, we sometimes have to refer to external documents, for example with codes or another kind of 'allowed values'. We want to make use of relative references to these files. For example:

document('./codelists/2009_codes.xml')

However, when we use Saxon as our XSLT processor it typically uses the working dir of the processor as reference point, instead of the location of the schematron file. So it starts looking for
JAVA_HOME/ext/lib/codelists/2009_codes.xml

You have already described this approach (and problem) in http://broadcast.oreilly.com/2008/11/validating-code-lists-with-sch.html

As we deploy our schematron files on various systems / platforms, we prefer the use of relative URIs.
A potential solution to avoid the use of absolute URIs might be to provide a parameter, which refers to the directory with the schematron files. This approach is described in: http://www.zvon.org/ZvonSW/saxonserver/Output/index.html

Then we should be able to use the parameter in our URI referring to the file with the codelist:

...
document('{$schematronDir}/codelists/2009_codes.xml')

However we are unable to use the xsl:param element ("Warning: unrecognized element xsl:param"), and the sch:param element seems to be used for other purposes (abstract patterns).

Can you advise us how to hand over an XSLT parameter to the schematron file?

With kind regards!


Holtkamp: Yes it is always a problem. I think XSLT2 provides some better facilities for rebasing, but it is still rather unsatisfactory.

You should be able to use a top-level <sch:let /> element.

sch:schema/sch:let are translated into xsl:param. Other sch:let are translated to xsl:variable.

You use the let to provide a default value, and a parameter to the final generated XSLT provides the value. I.e. the value is a validation parameter for each instance, not compiled into the generated stylesheet.

Rick, thanks for your swift response and solution.

Eventually the following approach worked:

<sch:schema ...>
<!-- location of the codelists, defaults to 'codelists', can be overruled by commandline parameter: 'schematronDir=codelists2' -->
<sch:let name="schematronDir" value="'codelists'"/>
...

<sch:rule ...>
<sch:assert test="document(concat($schematronDir,'/2009_codes.xml'))/xpathGoesHere" />
</sch:rule>
</sch:schema>
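With the Ant script from the post, the value could be supplied at the validation stage through the xslt task's nested param element. This is a sketch rather than a tested build: the codelists.dir property is hypothetical, and the rest mirrors the post's validate step.

```xml
<!-- validate, overriding the schema's default value for schematronDir -->
<xslt basedir="test/schematron"
      style="test.xsl" in="instance.xml" out="instance.svrl">
  <!-- codelists.dir is a hypothetical Ant property; Ant passes the expression as a string value -->
  <param name="schematronDir" expression="${codelists.dir}"/>
  <classpath>
    <pathelement location="${lib.dir}/saxon9.jar"/>
  </classpath>
</xslt>
```

Because the parameter is set only on the final step, the compiled test.xsl can be reused unchanged on systems that keep their code lists in different places.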


style="iso_dsdl_include.xsl" in="test/ccd.sch" out="test1.sch">
style="iso_abstract_expand.xsl" in="test1.sch" out="test2.sch">
style="iso_svrl_for_xslt2.xsl" in="test2.sch" out="test.xsl">
style="test.xsl" in="test/SampleCCDDocument.xml" out="instance.svrl">

I used the build above, but at the "compile it" stage I ended up with net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException: Content is not allowed in prolog. I noticed that the output file of the "expand abstract patterns" stage has content in the prolog without any root tag. What correction do I need to make?

Thanks,
Laxmikanth.S

Sorry, I cannot tell from your example.

The most likely reason for this is that your Schematron schema is not well-formed. Sometimes people take the fragments I post (e.g. patterns or rules) and want to run them directly. But a Schematron schema has to start with the schema element.

So make sure the schemas start with the schema element. Make sure all files are well-formed. If you still get the problem, email the schema to me and I will test it here.

(I cannot debug XProc issues however.)

I used the XSD to Schematron Ant task to create the Schematron file and validated the input XML file; now it works fine.

Thank you so much for the quick response.


Thanks,
Laxmikanth.S

Is the updated ant task for schematron available? I haven't been able to find it.
Thanks
Serena

Serena: Check out http://www.schematron.com/implementation.html for the latest Schematron for Ant task beta

There is a bug in the above examples.
They refer to "iso_expand_abstract.xsl" but the actual filename (in the xslt2 zip at least) is iso_abstract_expand.xsl

David: Thanks for reporting that. Fixed now.

Is there any option to get the line number in the XML file where an error occurs?

Anon: I think it should deliver the XPath as part of the SVRL.
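As a rough illustration, a failed assertion in SVRL carries the XPath in its location attribute. The element names and namespace below are SVRL's own, but the document, rule and message are invented for the example, and the exact form of the path varies with the implementation.

```xml
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern/>
  <svrl:fired-rule context="item"/>
  <!-- location holds an XPath to the node that failed the assertion -->
  <svrl:failed-assert test="@code" location="/order[1]/item[2]">
    <svrl:text>An item must have a code.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
```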

Line numbers are a little difficult to do: we did this previously in the Topologi editors etc but we had to make a customized version of XT to do so. We thought it was a great feature, but the market did not support it at that time.

IIRC, it involves extending the parser locators so that the line numbers are sent, then extending the DOM so that line numbers are kept, then extending XSLT to make them available. Unless things have changed, it is not so easy out of the box.

It is a real shame, because it certainly makes Schematron more difficult to integrate with 80s-generation CTAG systems like off-the-shelf text editors.

My current plan is to port the old schematron-report system to the ANT task. It generates an HTML version of the input document with IDs, and the HTML validation reports also use these IDs: so the result is a hyperlinked version. Actually, now I think about it, a separate utility that takes a set of simple XPaths (e.g. from an SVRL file) and reparses an input document with a simple specialty parser that tracks line numbers would be much more general purpose here. Hmmm...if anyone wants a night time project, that would be useful.
