Schematron 2009 released

By Rick Jelliffe
February 19, 2009 | Comments: 5

The latest and greatest release of my (our) open source ISO Schematron validator is out now, available at Schematron.com. Schematron is a validation language for making assertions about the presence or absence of patterns in XML documents; it is the most powerful of any standard schema language, and uses XPath.
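As a minimal sketch of what such assertions look like, the schema below checks a hypothetical invoice document. The invoice vocabulary (invoice, line, @total, @amount) and the messages are invented for illustration; only the sch: elements and the namespace come from ISO Schematron.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: the invoice vocabulary is hypothetical. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Invoice checks (example)</sch:title>
  <sch:pattern id="invoice-basics">
    <!-- The rule fires for every invoice element in the instance. -->
    <sch:rule context="invoice">
      <!-- assert: the message is reported when the test is false. -->
      <sch:assert test="count(line) &gt;= 1">
        An invoice must contain at least one line.
      </sch:assert>
      <!-- report: the message is reported when the test is true. -->
      <sch:report test="@total and number(@total) != sum(line/@amount)">
        The total attribute does not equal the sum of the line amounts.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The assert covers the "presence" case (something must be there) and the report covers the "absence" case (something must not happen); both tests are ordinary XPath expressions evaluated relative to the rule's context node.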

There are two distributions: one for XSLT1-hosted implementations, and another for XSLT2-hosted implementations (such as SAXON 9). The XSLT2-hosted implementations make it possible to use XPath2, which has much more grunt than XPath1. The distributions include versions of the popular iso_schematron_message and iso_svrl validators; in the next few weeks, the other common validators will also be ported and included. Having two distributions makes it more straightforward to validate.

Both distributions are believed to be complete and correct ISO Schematron implementations. Both are based on a four-stage XSLT pipeline: an optional macro-processor to handle inclusion expansion (it also handles the inclusions required by several other schema languages), an optional second macro-processor to handle expansion of abstract patterns, the schema compiler, and finally the validator.
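The four stages can be sketched with Ant's generic xslt task (this is not the Schematron Ant task, just a plain chain of transformations). The input names schema.sch and document.xml and the directories are hypothetical, and the stylesheet names iso_abstract_expand.xsl and iso_svrl_for_xslt1.xsl are assumptions about how the files in the XSLT1 distribution are named, so check them against the distribution you download.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: paths and some stylesheet names are assumptions. -->
<project name="schematron-pipeline-sketch" default="validate">
  <property name="sch.dir" value="schematron"/>
  <property name="work" value="build"/>

  <target name="validate">
    <mkdir dir="${work}"/>
    <!-- Stage 1 (optional): expand inclusions. -->
    <xslt in="schema.sch" out="${work}/stage1.sch"
          style="${sch.dir}/iso_dsdl_include.xsl"/>
    <!-- Stage 2 (optional): expand abstract patterns. -->
    <xslt in="${work}/stage1.sch" out="${work}/stage2.sch"
          style="${sch.dir}/iso_abstract_expand.xsl"/>
    <!-- Stage 3: compile the schema into a validating stylesheet. -->
    <xslt in="${work}/stage2.sch" out="${work}/validator.xsl"
          style="${sch.dir}/iso_svrl_for_xslt1.xsl"/>
    <!-- Stage 4: run the compiled validator over the instance. -->
    <xslt in="document.xml" out="${work}/report.svrl"
          style="${work}/validator.xsl"/>
  </target>
</project>
```

If the schema uses neither inclusions nor abstract patterns, the first two steps can be dropped and the schema fed straight to the compiler. For the XSLT2 distribution the same chain should apply, provided Ant is configured to use an XSLT2 processor such as SAXON.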

The XSLT2 distribution has several important experimental features, which may also make their way into the XSLT1 version. Some of them I am mooting for the upcoming update to ISO Schematron.

Localization
All error messages have been moved to an external file, allowing command-line selection of the language of error messages. Want a fun project for an hour or two that also gets you to know Schematron a little better? Contribute a version of the error messages file in your native language. Of course, the validation messages that users get for Schematron are already localizable (you write them as part of the schema), and most programming errors are XPath errors, where we are at the mercy of the XSLT engine.
Properties
Traditionally Schematron has identified the locations of errors using XPaths; in fact, the current implementation allows selection of three different XPath formats. I have made some versions for Topologi that generated vi/emacs-style line/column metrics, and the schematron-report implementation generates IDREFs and marks up the instance being validated as HTML with the corresponding IDs. SVRL has a location attribute for such XPaths. However, this is a little clunky in the particular case where you want your output to have some structured data extracted from the instance document: at the moment Schematron handles dynamically extracting values just fine, but not chunks. So the properties mechanism is a way to improve this: a rule has properties (available whenever it is fired) and assertions have properties (only available when the assertion fails).
Multi-document patterns
Schematron currently validates a single document, although it can use the document() function to access other XML documents for information that helps validation. The new implementation provides a pattern/@documents attribute which takes an XPath that produces a sequence of URLs; these are the documents that the pattern applies to. This does not allow an arbitrary reach, but it does allow validation a fixed number of steps from a central hub document, such as the case where the instance being validated has extended XPaths, or where an OOXML archive has multiple worksheets.
Options
I have also been working on an update to Christopher Lauret's Schematron-in-Ant task. The initial version used schematron-message, then the update used schematron-svrl, and so was incompatible. So I am aiming to allow both. It turns out that making a nice Ant task involves bringing out some hardcoded options as XSLT parameters, so the new versions have more command-line options.
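To make the multi-document patterns described above a little more concrete, here is a rough sketch of a schema that validates chapter files referenced from a hub document. The book/chapter-ref/@href vocabulary is hypothetical, and the exact evaluation rules for the experimental documents attribute (for example, how relative URLs are resolved) are assumptions rather than something stated here; only the attribute itself and the fact that it takes an XPath yielding a sequence of URLs come from the description above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the book vocabulary is hypothetical, and the
     documents attribute is experimental (XSLT2 distribution). -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <!-- The XPath is assumed to be evaluated against the hub document
       and to return one URL per chapter file. -->
  <sch:pattern id="chapter-checks" documents="/book/chapter-ref/@href">
    <!-- Rules in this pattern are then applied to each listed
         document rather than to the hub itself. -->
    <sch:rule context="/chapter">
      <sch:assert test="title">
        Each referenced chapter file must have a title.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The reach here is exactly one step from the hub: the pattern sees the documents the XPath names, and nothing those documents might link to in turn.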

I will be looking at licensing issues again for Schematron. I will probably be putting out GPL distributions as well as the current Artistic license, and it is highly likely that I will put out an Apache-licensed distribution as well. Some people think licensing is really important, rather than a formality that only comes into operation when money is at stake: they think the world operates by rules rather than negotiation. They seem to be in the ascendancy at the moment in many open source projects, so it seems reasonable to make sure that they can be satisfied.

This is the first version not to be marked beta. You may wonder why code that is mostly about eight years old was still marked beta. One reason is that schema language implementations need to be suitable for providing facts about contractual compliance; the ratty state of implementation of W3C XSD validators in their early history has probably stuffed things up so much that no-one would seriously consider using validation for contracts now. Schematron is an attempt to make it easier to express almost any kind of constraint that real-life documents throw up. I thought it was better to let reports of errors be a guide: I have not had a bug report in at least 4 months, and perhaps only one or two in the last year, so it seems pretty good.

So this version is marked Candidate Release as if it were a standard, with the intent that after a couple of months to check whether there are any newly-introduced issues, it would be a "final" release. It seems pretty solid.


5 Comments

I'm trying to process Schematron using saxon.jar and get a "variable is already declared" error when applying iso_dsdl_include.xsl (latest version of the candidate release).

Please tell me how can I correct this.

Deepali: Thanks for reporting this. I will look at it tomorrow. (Also, the link to the XSLT2 distribution is missing a /tmp/ in the URI. I'll fix it from work tomorrow on 25-02-09)

Good to read that you work on the Ant task. Where is this available?

I have created an Eclipse plug-in for the 2007 beta version of the Ant task. Send me a mail if you want to get it. I didn't release it yet because I didn't want to worry about licensing. If you would like, I can create an Eclipse plugin for your updated Ant task as well.

Hey,

really like what Schematron does, and multi-document validation would be an invaluable asset for me. I might work around it by concatenating them, but that's just not pretty.

I've just started using your ANT task (THANKS!) and I've discovered that it doesn't honour the failonerror attribute :(

Well done so far,

Gary

Gary: Thanks.

I will check the fail-on-error. (There are different features possible for the XSLT1 and the XSLT2 versions.)

The ANT task should handle validating filesets. This is where each document has the same kind of validation.

It has support for the experimental sch:pattern/@documents feature for XSLT2. This expects an XPath that returns a sequence/list of URIs which that pattern can then validate.

For example

...
This is for the case where there is a document set, and you sometimes want to validate one document using patterns tested in external documents.
