Using Schematron to declare and report implementation limitations

By Rick Jelliffe
November 24, 2008 | Comments: 3

We conventionally think of schemas as being used as an exemplar pattern or contract between sender and receiver: standard schemas being perhaps the ultimate in this.

However, there are also various kinds of limiting schemas: profile schemas, where an industry or job group decides to subset the standard in some way; usage schemas, where the schema is derived from the document set and lets you check whether a new document has markup that has not been processed before; and implementation schemas, where the schema is used to test that documents contain only structures or values that a particular implementation of the standard schema can accept.

What kind of limitations are we talking about? There are necessary limitations, such as the precision used for a calculation, which typically will not be specified in the standard. Then there are conforming limitations, where the standard specifies a typical behaviour or a range of possible behaviours, which may include not accepting, stripping or ignoring some given element. Then there are non-conforming limitations, where an implementation does not accept some or all of the allowed values, or does the wrong thing with them. There are extension limitations, where the standard allows some extensions but the implementation does not. And finally there are partial implementation limitations, for example where some element is accepted or generated or even round-tripped, but is not actually used: graceful degradation behaviour may fall into this category.

Schematron is well-suited to expressing these kinds of limitations.

The simple method for doing this is to use Schematron's report element. For example:

<rule context="picture">
     <assert test="number(@width)" see="http://www.egstd.com/songXML.pdf;page=56">
          A picture should have the width specified (in attribute width) as a number.
     </assert>
     <report test="@width &gt; 100" role="limitation">
          The implementation does not support widths greater than 100.
     </report>
</rule>

In this case, the assert element states the general constraint, but the report element says whether the implementation limitation has been exceeded. The report is labelled with the role "limitation" so that the validation message can be logged, routed or formatted appropriately by the user agent.
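To make the behaviour concrete, here is a small sketch of how the rule above would treat a few instances. The wrapper element pictures and the sample values are assumptions made up for illustration; they are not part of any real song format.

```xml
<!-- Hypothetical instance fragment, checked against the picture rule above -->
<pictures>
     <!-- Width is a number and within the limit: assert passes, report stays silent -->
     <picture width="80"/>

     <!-- Width is a number but over 100: assert passes, report fires with role "limitation" -->
     <picture width="150"/>

     <!-- Width is not a number: number() yields NaN, so the assert fails;
          the comparison with 100 is false, so the report stays silent -->
     <picture width="wide"/>

     <!-- Width is missing: the assert fails, the report stays silent -->
     <picture/>
</pictures>
```

The point of the split is that only the second picture is a valid document that this particular implementation cannot handle; the last two are simply invalid against the general constraint.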

Schematron also has an attribute, see, which allows constraints to be linked back to the original document, as the assert above does with a page reference.

Another approach would be to gather all the limitations for a single product into a pattern. (Schematron's phase mechanism would then be used to select the right pattern to test against, or to test the document against the limitations of several products at the same time by selecting multiple patterns.)

<pattern id="BoaB1">
     <title>Limitations of the product Bird on a Brae v1.0</title>
     <rule context="/">
          <report test="true()" role="limitation" diagnostics="terminal"
                  flag="unrecoverableError">
               This file format is not supported on this old software.
          </report>
     </rule>
</pattern>

<pattern id="BoaB3.1">
     <title>Limitations of the product Bird on a Wire v3.1</title>
     <rule context="song">
          <report test="count(verse) &gt; 10" role="limitation" diagnostics="continue truncate">
               The Bird on a Wire 3.1 software does not support more than 10 verses.
          </report>
     </rule>
</pattern>

<diagnostics>
     <diagnostic id="terminal">Unable to continue</diagnostic>
     <diagnostic id="continue">Processing will continue as best as possible</diagnostic>
     <diagnostic id="truncate">The material that cannot be processed will be removed</diagnostic>
</diagnostics>

You can see in that example there are two versions of the software: one pattern for an obsolete version just says "forget it", and one pattern for another version states a limit on the number of elements it can handle. As before, the role attribute is used to label the reports as "limitations" (you can use any keyword you like, to suit yourself: Schematron does not define particular roles).
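The phase mechanism mentioned earlier could be wired up roughly as follows. This is a minimal sketch, assuming an ISO Schematron schema; the phase ids (check-v1, check-v3.1, check-all) and the shortened titles are hypothetical names chosen for illustration, while the pattern ids are the ones used above.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" defaultPhase="check-v3.1">
     <title>Implementation limitations, selectable per product version</title>

     <!-- Each phase activates the pattern or patterns for one validation scenario -->
     <phase id="check-v1">
          <active pattern="BoaB1"/>
     </phase>
     <phase id="check-v3.1">
          <active pattern="BoaB3.1"/>
     </phase>
     <!-- Test against the limitations of both versions at the same time -->
     <phase id="check-all">
          <active pattern="BoaB1"/>
          <active pattern="BoaB3.1"/>
     </phase>

     <pattern id="BoaB1">
          <title>Limitations of version 1.0</title>
          <rule context="/">
               <report test="true()" role="limitation" diagnostics="terminal"
                       flag="unrecoverableError">
                    This file format is not supported on this old software.
               </report>
          </rule>
     </pattern>

     <pattern id="BoaB3.1">
          <title>Limitations of version 3.1</title>
          <rule context="song">
               <report test="count(verse) &gt; 10" role="limitation"
                       diagnostics="continue truncate">
                    This version does not support more than 10 verses.
               </report>
          </rule>
     </pattern>

     <diagnostics>
          <diagnostic id="terminal">Unable to continue</diagnostic>
          <diagnostic id="continue">Processing will continue as best as possible</diagnostic>
          <diagnostic id="truncate">The material that cannot be processed will be removed</diagnostic>
     </diagnostics>
</schema>
```

How a phase other than the default is chosen depends on the validator being used; typically it is passed in as an option when validation is invoked.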

The diagnostics attributes are used to select various subsidiary messages which concern practical issues: again, the intent is to provide enough information declaratively for a back-end validation system to use. In this case I have also added a flag attribute, which sets a single label over the whole validation (i.e. in this case "unrecoverableError" is a label that is either present or absent as a validation outcome). This allows more specific control of the validation session, for example.
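As a rough illustration of what a back-end might receive, ISO Schematron describes an optional report vocabulary, SVRL. Assuming an implementation that emits SVRL, and a hypothetical song document with twelve verses, the second pattern above might produce output along these lines. The location value and the exact set of attributes are assumptions and vary between implementations.

```xml
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
     <svrl:active-pattern id="BoaB3.1"/>
     <svrl:fired-rule context="song"/>
     <!-- The role from the schema is carried through, so a back-end can route on it -->
     <svrl:successful-report test="count(verse) &gt; 10" role="limitation" location="/song">
          <!-- One reference per id listed in the report's diagnostics attribute -->
          <svrl:diagnostic-reference diagnostic="continue">Processing will continue as best as possible</svrl:diagnostic-reference>
          <svrl:diagnostic-reference diagnostic="truncate">The material that cannot be processed will be removed</svrl:diagnostic-reference>
          <svrl:text>The Bird on a Wire 3.1 software does not support more than 10 verses.</svrl:text>
     </svrl:successful-report>
</svrl:schematron-output>
```

A downstream tool could then, for instance, treat every successful-report with role "limitation" as a warning specific to one product rather than as a validity error.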

And the schema has the usual Schematron advantages: flat, declarative, powerful, open, human-friendly, and it fits into modern reporting systems (such as fatal/caution/warning/note systems).



3 Comments

How do you see Schematron evolving with XML Schema 1.1 assertions in the future?

John: I was happy to see the XSD 1.1 assertions added, and some of the other mechanisms that have been mooted in the area of alternatives are good too.

But they are classic XSD: the relentless search for the absolute minimum bang-per-buck. IIRC the drafts just use the small streaming subset: big deal.

Gloves off, I think XSD (and pretty much every other schema language) asks fundamentally the wrong question. Or, at least, they ask a question that is useful for DBMS vendors (i.e. a declarative language from which DBMS storage types can be derived, and which is couched in a way that OO programmers will find self-affirming) but presents insuperable difficulties for people with different needs: this is made worse because in claiming to be a universal schema language, it dismisses (and therefore insults) and marginalizes the use cases it is poor at supporting. It goes where the money is, not where the needs are.

Consequently, there are many of the most fundamental XML languages which don't have schemas: SVG, for example.

So what are the right questions for a schema language? First, how to capture constraints in ways that are both declarative and which allow capture and validation to be explained in terms of the user's experience and use case, not in terms of elements and attributes that may be hidden. The assertions in the XSD 1.1 draft do not have any concept of humans.

Second, that documents progress in many axes: through workflows and pipelines, over cut and paste, in and out of different implementations with different capabilities, in and out of companies with different business rules, in and out of independently created back-ends made with independent schemas, with potential values (eg codelists) that change independently of the basic constraints, split into subdocuments and linked information, and maintained through different versions of namespace and schemas, and so on.

There are literally scores of these problems, and cumulatively they make XSD into almost a niche technology: in particular because the type derivation mechanism (for complex types) is laughably impractical (not even just a low bang-per-buck, but an actual disabler of usefulness). I think XSD builds in many issues concerning type-checking and derivation that would be better as a different layer. RELAX NG gets it more right: I see the new ODFDOM was created by converting the RELAX NG schema to Java classes, so the basic schema language does not have to have a derivation mechanism to allow this kind of data binding.

Now, amidst all this, I don't want to suggest that XSD is not actually excellent for the things it is good at. When it is suitable it is very suitable, and it has allowed good innovation in a couple of important areas that are of no interest to me and my work. (I don't go as far as some to say, for example, that it is absolutely toxic, because it encourages early binding and therefore fragility, son-of-CORBA style, though that may be the effect sometimes.)

And I think the XSD WG is doing a sterling effort with the XSD 1.1 draft: this pig needs that lipstick! Unless Microsoft and other vendors (who have a bad track record in tracking changes to standards) support it, XSD 1.1 may unfortunately be irrelevant (or, at least, a way of locking systems in to non-MS tool chains by using a schema language dialect MS does not support).

But the XSD 1.1 draft assertions simply don't address the kinds of problems, in breadth or depth, that Schematron does. I don't want to claim of course that Schematron is better than XSD for the kinds of jobs that XSD is good at: that is not the purpose.

Instead, I would say that forming our ideas of what schema languages should do by merely looking at what XSD does is to sell ourselves short. I think XSD is fundamentally non-disruptive and even reactionary in terms of its effect, while Schematron allows more of the radical uses of XML.

XSD was developed with imperatives that have been largely superseded by XML's success: to be a grammar to replace DTDs, to have derivation to fit into objecty mindsets, to have storage types to fit into DBMS, to have something that large vendors could build large systems for large customers with. That fruit did not fall far from the tree. The world is bigger than those imperatives.

I think the main advantage of schematron is human friendly custom diagnostics messages. So XSD and schematron will co-exist peacefully in my applications.

On the schematron.com web site, I read: "The next release will not be marked beta!". When is this scheduled for? And thanks for all the good work.
