The latest and greatest release of my (our) open source ISO Schematron validator is out now, available at Schematron.com. Schematron is a validation language for making assertions about the presence or absence of patterns in XML documents; it is the most powerful of the standard schema languages, and uses XPath.
There are two distributions: one for XSLT1-hosted implementations, and another for XSLT2-hosted implementations (such as SAXON 9). The XSLT2-hosted implementations make it possible to use XPath2, which has much more grunt than XPath1. The distributions include versions of the popular iso_schematron_message and iso_svrl validators; in the next few weeks, the other common validators will also be ported and included. Having two distributions makes validation more straightforward.
Both distributions are believed to be complete and correct ISO Schematron implementations. Both are based on a four-stage XSLT pipeline: an optional macro-processor to handle inclusion expansion (it also handles the inclusions required by several other schema languages), an optional second macro-processor to handle expansion of abstract patterns, the schema compiler, and finally the validator.
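To make the four stages concrete, here is a minimal sketch in Python that only builds the command lines for such a pipeline; it does not run them. The stylesheet names, the intermediate file names and the Saxon 9 style options (-s:, -xsl:, -o:) are assumptions for illustration, not the actual file names in the distributions.

```python
from typing import List

# Hypothetical stylesheet names, one per stage (not the real distribution names).
INCLUDE_XSL = "include.xsl"           # stage 1: inclusion expansion (optional)
ABSTRACT_XSL = "abstract_expand.xsl"  # stage 2: abstract-pattern expansion (optional)
COMPILER_XSL = "compiler.xsl"         # stage 3: schema compiler


def xslt_command(source: str, stylesheet: str, output: str) -> List[str]:
    """Build one transform command, assuming Saxon 9 style options."""
    return ["java", "-jar", "saxon9.jar",
            f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]


def build_pipeline(schema: str, instance: str,
                   expand_includes: bool = True,
                   expand_abstract: bool = True) -> List[List[str]]:
    """Return the commands for the pipeline, skipping optional stages on request."""
    commands = []
    current = schema
    if expand_includes:
        commands.append(xslt_command(current, INCLUDE_XSL, "stage1.sch"))
        current = "stage1.sch"
    if expand_abstract:
        commands.append(xslt_command(current, ABSTRACT_XSL, "stage2.sch"))
        current = "stage2.sch"
    # Stage 3 compiles the (expanded) schema into a validator stylesheet.
    commands.append(xslt_command(current, COMPILER_XSL, "validator.xsl"))
    # Stage 4 runs that generated validator against the instance document.
    commands.append(xslt_command(instance, "validator.xsl", "report.xml"))
    return commands


if __name__ == "__main__":
    for command in build_pipeline("schema.sch", "document.xml"):
        print(" ".join(command))
```

The point of the sketch is that the output of each stage is the input of the next, and that a schema with no inclusions or abstract patterns can skip straight to the compiler.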
The XSLT2 distribution has several important experimental features, which may also make their way into the XSLT1 version. Some of them I am mooting for the upcoming update to ISO Schematron.
- All error messages have been moved to an external file, allowing command-line selection of the language of error messages. Want a fun project for an hour or two that helps you get to know Schematron a little better? Contribute a version of the error messages file in your native language. Of course, the validation messages that users get for Schematron are already localizable (you write them as part of the schema), and most programming errors are XPath errors, where we are at the mercy of the XSLT engine.
- Traditionally Schematron has identified the locations of errors using XPaths; in fact, the current implementation allows selection of three different XPath formats. I have made some versions for Topologi that generated vi/emacs-style line/column metrics, and the schematron-report implementation generated IDREFs and marked up the instance being validated as HTML with the corresponding IDs. SVRL has a location attribute for such XPaths. However, this is a little clunky in the particular case where you want your output to have some structured data extracted from the instance document: at the moment Schematron handles dynamically extracting values just fine, but not chunks. So the properties mechanism is a way to improve this: a rule has properties (available whenever it is fired) and assertions have properties (only available when the assertion fails).
- Multi-document patterns
- Schematron currently validates a single document; however, it can use the document() function to access other XML documents to get information to help validate. The new implementation provides a pattern/@documents attribute which takes an XPath that produces a sequence of URLs; these are the documents that the pattern applies to. This does not allow an arbitrary reach, but it does allow validation a fixed number of steps from a central hub document, such as the case where the instance being validated has extended XPaths, or where an OOXML archive has multiple worksheets.
- I have also been working on an update to Christopher Lauret's Schematron-in-Ant task. The initial version used schematron-message, then the update used schematron-svrl, and so was incompatible. So I am aiming to allow both. It turns out that making a nice Ant task involves bringing out some hardcoded options as XSLT parameters, so the new versions have more command-line options.
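The multi-document idea above can be simulated outside Schematron. The following is a rough sketch in Python, not the real implementation: a path evaluated against a hub document yields a list of URLs, and the same rule is then applied to each of those documents. The document contents, the href attribute, the helper names and the rule itself are all invented for illustration, and the documents are held in memory rather than fetched.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# In-memory stand-ins for files (hypothetical content); a real run would fetch each URL.
DOCUMENTS: Dict[str, str] = {
    "hub.xml": '<workbook><sheet href="sheet1.xml"/><sheet href="sheet2.xml"/></workbook>',
    "sheet1.xml": "<worksheet><cell>1</cell></worksheet>",
    "sheet2.xml": "<worksheet/>",
}


def load(url: str) -> ET.Element:
    """Parse the stand-in document registered under this URL."""
    return ET.fromstring(DOCUMENTS[url])


def documents_for_pattern(hub: ET.Element, path: str, attribute: str) -> List[str]:
    """Plays the role of the documents expression: a path that produces URLs."""
    return [el.get(attribute) for el in hub.findall(path) if el.get(attribute)]


def validate_pattern(hub_url: str, path: str, attribute: str, context: str,
                     test: Callable[[ET.Element], bool],
                     message: str) -> List[Tuple[str, str]]:
    """Apply one rule (context plus assertion) to every document the hub points at."""
    failures = []
    hub = load(hub_url)
    for url in documents_for_pattern(hub, path, attribute):
        root = load(url)
        for node in root.iter(context):  # iter() also visits the root element
            if not test(node):
                failures.append((url, message))
    return failures


if __name__ == "__main__":
    results = validate_pattern(
        "hub.xml", "./sheet", "href", "worksheet",
        lambda ws: ws.find("cell") is not None,
        "A worksheet should contain at least one cell.",
    )
    for url, text in results:
        print(f"{url}: {text}")
```

Note that the reach is exactly one step from the hub here, which matches the "fixed number of steps" limitation described above.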
I will be looking at licensing issues again for Schematron. I will probably be putting out GPL distributions as well as the current Artistic license, and it is highly likely that I will put out an Apache-licensed distribution as well. Some people think licensing is really important, rather than a formality that only comes into operation when money is at stake: they think the world operates by rules rather than negotiation; but they seem to be in the ascendancy at the moment in many open source projects, so it seems reasonable to make sure that they can be satisfied.
This is the first version not to be marked beta. You may wonder why code that is mostly about eight years old is still marked beta? One reason is that schema language implementations need to be suitable for providing facts about contractual compliance; the ratty state of implementation of W3C XSD validators in their early history has probably stuffed things up so much that no-one would seriously consider using validation for contracts now: Schematron is an attempt to make it easier to express almost any kind of constraint that real-life documents throw up. I thought it was better to let reports of errors be a guide: I have not had a bug report in at least 4 months, and perhaps only one or two in the last year, so it seems pretty good.
So this version is marked Candidate Release as if it were a standard, with the intent that after a couple of months to check whether there are any newly-introduced issues, it would be a "final" release. It seems pretty solid.