Public draft of next generation of ISO Schematron available for comment

ISO/IEC CD 19757

By Rick Jelliffe
April 14, 2010 | Comments: 3

The Committee Draft (CD) of the new version of ISO Schematron is now available at the ISO/IEC JTC1 SC34 website (PDF). In the JTC1 workflow, this is the version that National Bodies comment on over the next 3 months.

You can send comments in too, for example to me (as editor) or to National Body (NB) participants of WG1 and SC34 (such as Alex Brown, Francis Cave, Jiří Kosek, Murata Makoto and Mohamed Zergaoui), and the issue can then be raised in NB comments. I have already included features and changes based on comments here and from the Schematron mailing list.

After the ballot, we resolve any comments and put out a Final Draft International Standard (FDIS). (The FDIS is not open, so as to reduce drag from implementations of drafts that might work against late changes, and to reduce the chance of implementations claiming to implement a standard while actually implementing a draft.) Then this FDIS gets voted on again. So it now looks like 1Q2011 at the earliest before the new version gets on the books: five years between versions seems healthy.

All the features of the CD are already in the popular skeleton implementation at www.schematron.com (which I maintain), which has superseded my old Schematron 1.5 implementation at Academia Sinica. Please use ISO Schematron for new projects.

I have discussed many of the new features of the draft before, but I'll just summarize some again:


  • Modularity: The 2006 standard had an include element, but it was not very useful: it just stuck the external XML file and fragment inline. The 2010 CD enhances the extends element, which substitutes the contents of the located file and fragment; this simple macro mechanism allows containers without the complication of XSD-style components.

  • Properties: The 2006 standard was most interested in communication with humans, and was successful at that. It did provide several facilities for annotating this information, such as the role, see and flag attributes. However, I began to see many users wanting to incorporate structured information for automated processing in the assertion text, violating its purpose. For example, people would make an assertion X1233:c.34:2009:The widget should have a silly name, and then have some home-made parsing mechanism to extract the messages. This was particularly true when the structured information included some generated information. So the new properties element has been created to allow a more powerful approach with better separation of concerns. The CD has a new Annex L, which gives many examples of what properties can be used for. Properties also provide a way of adding non-Schematron constraints, such as CREPDL character repertoire typing.

  • XSLT2 and EXSLT: Schematron is not limited to using XSLT1; schemas can select which query language binding they use. The 2006 standard reserved several names for this purpose. The two most popular are XSLT2 and EXSLT, so the CD defines bindings for these. (There is also an informative example binding for STX, the streaming transformations, to encourage implementation.)

  • Structure Variables: The 2006 standard only allowed an XPath expression as the value of a let element. However, this meant that when you wanted a lookup table for information, for better modularity, you had to use a reference to an external document. So the CD follows XSLT and also allows the value of a variable to be given in its content, which could include any arbitrary element content.

  • Support for Document Collections: The advent of the XML-in-ZIP formats has brought to a head the trend in XML away from single large documents towards smaller linked documents. But this change has reduced the utility of validation: patterns may be distributed between documents! The 2006 standard already allowed the document() function, which let an assertion or variable access information in an external document; however, the rule contexts were always in the document being validated. In the CD, the pattern element may have a documents attribute that can have an XPath expression that evaluates to a list of URLs. The pattern is tested on each of these documents in turn. (I am really happy with this approach, because it strengthens the idea of a pattern as something that adheres to documents rather than merely being some kind of odd type mechanism that adheres to information items.) I suppose it might even allow patterns limited to particular branches, too; I have not pushed this as a justification, though it might have some efficiency and phase benefits.
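
As a rough illustration of the documents attribute described above, here is a minimal Python sketch of the idea, not of any real implementation. It assumes the collection lives in an in-memory dictionary (FILES) standing in for URL fetching, and it uses ElementTree paths in place of full XPath. Every name in it (validate_collection, the manifest layout, the para rule) is hypothetical, and the Schematron fragment in the comment is an assumed shape, not quoted from the CD.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the pattern being imitated (illustrative only):
#   <sch:pattern documents="/manifest/item/@href">
#     <sch:rule context="//para">
#       <sch:assert test="@id">Every para should have an id</sch:assert>
#     </sch:rule>
#   </sch:pattern>

# Hypothetical document collection: a manifest that links to two parts.
FILES = {
    "manifest.xml": '<manifest><item href="a.xml"/><item href="b.xml"/></manifest>',
    "a.xml": '<doc><para id="p1"/></doc>',
    "b.xml": '<doc><para/><para id="p2"/></doc>',
}


def validate_collection(files, entry, documents_path, context_path, test, message):
    """Evaluate the documents path on the entry document, then apply one
    rule (context path plus assertion test) to each listed document in turn."""
    primary = ET.fromstring(files[entry])
    urls = [item.get("href") for item in primary.iterfind(documents_path)]
    failures = []
    for url in urls:
        root = ET.fromstring(files[url])  # stand-in for fetching the URL
        for node in root.iterfind(context_path):
            if not test(node):
                failures.append((url, message))
    return failures


failures = validate_collection(
    FILES,
    entry="manifest.xml",
    documents_path="item",
    context_path=".//para",
    test=lambda para: para.get("id") is not None,
    message="Every para should have an id",
)
print(failures)
```

The point of the sketch is that the rule contexts are found in each listed document, rather than only in the document the validation started from.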

On this last feature (document collections), there is a suggestion that the mooted ISO ZIP standard should provide a simple solution to the multiple-document XML-in-ZIP validation problem: a kind of reverse of NVDL, which creates a temporary synthetic XML document containing the ZIP directory structure with any XML files placed inline. This would allow a single conventional schema, even a grammar-based one, to validate the whole XML-in-ZIP document, and it would interact well with NVDL. However, the sch:pattern/@documents attribute would be useful in any case.
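
The synthetic-document idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (archive, entry) and the function name synthesize_zip are invented here, members are treated as XML purely by their .xml suffix, and nothing below comes from any ZIP or DSDL specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def synthesize_zip(zip_bytes):
    """Return one XML tree for a ZIP: an <entry> per member, XML members inlined."""
    root = ET.Element("archive")  # hypothetical wrapper element
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            entry = ET.SubElement(root, "entry", {"name": name})
            if name.lower().endswith(".xml"):
                # Parse the member and place its root element inline.
                entry.append(ET.fromstring(zf.read(name)))
    return root


# Build a small package in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("content.xml", "<doc><p>hello</p></doc>")
    zf.writestr("media/image.png", b"not really a PNG")

synthetic = synthesize_zip(buffer.getvalue())
print(ET.tostring(synthetic, encoding="unicode"))
```

A single schema could then be run once over the synthetic tree instead of once per member; non-XML members appear only as empty entry elements.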

So please feel free to make comments on the draft, even here in the comments section. Especially if you have come to any brick walls where something that you were expecting to be simple was in fact not supported very well by the standard.

The new draft Schematron standard is now a massive 40 pages. My expectation is that the standard will continue to be available free from ISO. Pretty good bang per buck.

(Special thanks to WG1 convenor Alex Brown for the typesetting.)



3 Comments

When will the final draft of the International Standard be ready? I've seen that many people have added comments after viewing the PDF.
Lilia Gephardt @ domain search

I see many parts which say "applications are not required to make use of this element/attribute". But then these elements/attributes are futile. It would be wiser to collect the mandatory parts in "Schematron Core" and the rest in "Schematron Full", so you can say that an application supports one or the other.

If writers of Schematron implementations request it, we could certainly do that.

But you generally make a profile (a named subset) when there is some conformance reason to do so: e.g., because you want to deprecate bits that are outside the profile, or because you want some collection that all vendors have implemented (or that users can rely on being there). Do any of these kinds of cases apply to Schematron?

At the moment in ISO Schematron, the conformance profiles are really determined by the Query Language Binding: sch:schema/@queryBinding. These already allow us to remove language features that are not (readily) supported using a particular query language.
