Fake real-time blog from Balisage 2009 #1

2 papers on Schematron

By Rick Jelliffe
August 12, 2009 | Comments: 11

Some of the presentations are up at the preliminary proceedings of the Balisage 2009 Conference, and of its side meeting, the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. It looks like it was another really stimulating conference, and I envy people for being close to all that mental action.

Looking over the papers, a few stand out:

Josh Lubell from the US NIST is a guy who really seems to see what I see in Schematron, or at least, he has to deal with a set of problems for which Schematron is pretty good. I heartily recommend his paper, Documenting and Implementing Guidelines with Schematron, which is probably the best presentation on Schematron I have read. Lubell positions Schematron more in the literate programming tradition and values its cross-cutting features.

He points out that there is a class of guidelines which are really important in modern schemas, and that there are both guidelines that apply to documents (business rules) and guidelines that apply to schemas (Naming and Design Rules): Schematron can be used for both.

The next paper of interest is Jacques Durand's Test Assertions on steroids for XML artifacts, and frankly it made me pretty angry in parts, though I thought it was a good paper overall. The paper is about the OASIS Test Assertion Guidelines effort (TAG), which is very interesting and powerful.

But he has a little section contrasting TAG with Schematron, and he says some things there that are just complete rubbish. For example:

variables are missing in Schematron.

Then it turns out he is using Schematron 1.5. But variables were added in Schematron 1.6 in 2002, and are certainly part of ISO Schematron. Only seven years out of date. I guess the problem really is that there are developers of Schematron systems who never bothered to upgrade. But giving a paper to say that you discovered that something is missing a feature it in fact has...it is not the path to credibility.

That aside, Durand makes a lot of good points. Schematron indeed does not have a system for chaining rules: in the ISO DSDL schema of things, we have been waiting for the arrival and evaluation of W3C XProc to provide chaining of tests. Of course, Schematron's phase mechanism and parameterization allow the same schema to be used for a succession of patterns. And, furthermore, since the SVRL output of a Schematron schema can itself be tested with a Schematron schema (using, for example, the flag and role attributes), the issue becomes more one of architecture. Indeed, it is possible to take the view that the trick is trying to arrange schemas that don't require higher order logic, to be practicable.

It was interesting to read of the TAG test model, and its requirements for a Test Assertion. I would argue that actually a Schematron assertion can meet the criteria:
it can have an ID, it has a source (the assertion text), it has a target (the @context, well the @subject strictly), it has a prerequisite (the exclusion of all previous content paths within the same pattern), it has a predicate (the assert/@test test), and it can have a prescription level (using the @role on the assert or on an associated diagnostic.)

It would certainly be possible (but inefficient) to make a Schematron validator that returned the Reporting structures with the notQualified result available too. Indeed, it may be a good thing to double check, since I am just working on the text for the next version of the ISO Schematron standard.

The frustrating thing is that it would be great if potential users of standards such as Schematron would bother to give feedback and figure out how to enhance each other's work rather than rushing off into NIH. Ho hum. Nevertheless, I think OASIS CAM and OASIS TAG are two really good initiatives which Schematron and DSDL can learn a lot from, and I certainly wish them well, regardless of any minor grumpiness on this or that.

More later



11 Comments

Very valuable to have your take on the case for an XPath profile for Test Assertion Markup Language. There has been some debate going on about the extent to which it might merely duplicate what can be done with Schematron. Interesting that others too have been picking up on chaining of test assertions using prerequisites as being a key feature which goes beyond Schematron in isolated use (a shell, XProc or the like being needed in addition to Schematron for such chaining). It is clear that being able to exclude a sequence of reports from the results, when there is no reason to even perform a particular test (because a failed prerequisite means the target does not even qualify for it), simplifies reports. Another factor that occurs to me is that the Test Assertion Markup Language is designed with all kinds of spec requirements in mind, not just ones involving XML: the profiling of an XPath binding is in addition to the base language and its usage. This means that test assertions for any non-XML aspects of a profile can be mixed with XML-related assertions all in the same markup. The TAG TC efforts are aimed at use cases like Java specs as well as XML documents and SOA messaging like ebXML and Basic Profile. This adds hope that these various domains can be covered by a joined-up approach, and Schematron fits nicely into this too. Hopefully the TAG TC model does add weight to the Schematron model for assertions and the excellent succinctness it embodies.

Stephen: Schematron is not a language for expressing rules. It is a language for expressing patterns. I think this is a central distinction that people miss: fans of type systems look for types and find Schematron deficient, while fans of rule systems look for rules and find Schematron deficient.

It does not surprise or particularly concern me: Schematron does not have to be the best type system or the best rules system, but it does have to be the best system for asserting and reporting the presence or absence of patterns in documents. The pattern is the first-class construct in Schematron, and rules are just mechanism with little analytical significance (a rule does not equate to a type, for example). And as the patterns used in documents develop (such as the idiom of XML-in-ZIP documents, and the idiom of Relationships in OOXML), so Schematron needs to keep abreast of them.

A pattern is a generalized collection of constraints that corresponds to some useful analytical grouping. A table. Datatypes. Structure. Date reasonableness. Required elements. A head/body pattern. A partial order. An RDF island. Mixed content. key/keyref. ID uniqueness. A link from a certain element in one document to a certain element in a different document. The required use of glossary or external vocabulary data items. A dictionary entry in pre-1901 conventions. A dictionary entry in post-1901 conventions. A DNA fragment. All these things are patterns.

The new day's idiomatic patterns are much more difficult to see and figure out than yesterday's glib generalities: what I am interested in is evidence about why pattern specification in the Schematron sense would be enhanced by rule-chaining (in particular, being able to use the results of one rule as data in another rule context or assertion, which is the real functional distinction, since other kinds of typical rule capabilities can largely be done by pipelines of multiple Schematrons).

Thanks Rick for your interest in TAG work - and a well-deserved slap on my hand for getting it wrong with Schematron variables. A little more homework on my side would have cleared up my confusion as to which version ISO Schematron was based on - 1.6, not 1.5. A major objective behind TAG work is to change the way people think of conformance testing (for software apps, middleware, etc.). Clearly stated test assertions - as processable entities - allow for separating the analysis phase of the testing (conformance report) from the prior operation phase (running test scenarios). Doing so allows for the analysis phase to rely entirely on XML technology, and be done in a standard way regardless of the system under test and its platform / environment. So it is not surprising that there is *some* functional overlap with Schematron... That could be worth exploring down the road. Got your point on "patterns" and "rules", although all these concepts are not quite clear-cut from a definition viewpoint and could mean many different things to different people.

Jacques: There certainly is a lot of overlap between schema/validation languages and testing, though of course they take different angles: a schema is intended to be definitional while a test is intended to be explorational, for one.

There may be fuzziness on rules and patterns, but they are still fairly disjoint in my usage at least: a pattern is an abstraction of a partial document structure (or other constraints) while a rule is a mechanism.

It is easy to see Schematron's patterns as being just a different name for a case statement (and schemas can indeed be written without thought for using the pattern element to express patterns); but statements about the mechanism do not negate statements about the intent of the element!

Patterns (with phases, and the primacy of the assertion text, and the split between assertion and diagnostics) are the distinctive features of Schematron. Many other languages that take the same XPath route do not bother with any explicit grouping mechanism like patterns (or phases, etc); but the grouping is not just for convenience, it is to allow topical coupling.

Thanks for the compliments! I think people tend to emphasize Schematron's enforcement of rules at the expense of its strength as a representation language in its own right. Your blog post about expressing untested and untestable constraints opened my eyes to the possibilities for Schematron outside the rule engine realm. I had an "aha" moment where I realized that Schematron is useful even if constraints are enforced by other means. I also saw that the Schematron language is a reasonable language for representing guidelines documentation.

Is it fair then to say that Schematron is not just a pattern matching schema language but also a test assertion markup language? If test assertions which cannot be executed with XSLT can nonetheless be defined using the role="UnImplemented" attribute, then maybe test assertions for spec statements like 'the XML invoice file MUST NOT be larger than 500Mb' can be written as a test assertion using Schematron. Is this an anticipated and supported use case for Schematron? I guess the in-progress Test Assertion Guidelines and Test Assertion Markup Language that we are progressing in the OASIS TAG TC are better optimised for test assertions in general, or are they? Can all of the OASIS TAG TC test assertion model (apart from the prerequisites and chaining) be represented using Schematron? I guess the 'prescription level' (mandatory, preferred, permitted) might be missing too (to cover 'MUST', 'SHOULD', 'MAY').

Regarding the benefit of rule chaining - using the results of one rule in the context, etc of another - the key factor with regard to test assertions is that if one rule is found to have been broken there are going to be other rules which it would not make sense to even test. So being able to include a rule outcome in, say, a prerequisite to running the testing of another rule means that we do not test the second rule if its prerequisite is not satisfied. If the second rule is itself a prerequisite to others then there will be a whole sequence of tests which need not and maybe SHOULD NOT be attempted (or the results might be misleading or at least hard to analyse). I can imagine other benefits too of course but this one seems to me to be the most crucial.

Stephen: In Schematron, there is a mechanism called phases which group patterns. A set of documents can be validated in one phase, then the shell (XProc or whatever) can decide based on the result whether to run validation with another phase.

So Schematron has organized and named groupings to allow that kind of chaining, but it defers all interpretation aspects to the caller.
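To make the division of labour concrete, here is a minimal sketch of a caller that runs phases in order and stops at the first one that reports failures. It is an assumption-laden illustration: the phase names, the TA ids and the fake_validate stand-in are all hypothetical, and a real setup would call an actual Schematron processor (from a shell script, XProc, or similar) where the stand-in is used.

```python
def run_phases(document, phases, validate):
    """Validate phase by phase; skip all later phases once one reports failures."""
    results = {}
    for phase in phases:
        failures = validate(document, phase)
        results[phase] = failures
        if failures:
            break  # the caller, not Schematron, decides that later phases are moot
    return results

def fake_validate(document, phase):
    """Stand-in for a real Schematron run: returns canned failed-assertion ids."""
    canned = {
        "structure": [],
        "business-rules": ["TA-7"],
        "cross-document": ["TA-9"],
    }
    return canned[phase]

outcome = run_phases(
    "invoice.xml",
    ["structure", "business-rules", "cross-document"],
    fake_validate,
)
print(outcome)  # {'structure': [], 'business-rules': ['TA-7']}
```

The cross-document phase is never run because the business-rules phase failed, which is the coarse-grained kind of chaining that phases plus a caller can give.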

I also note that in practice what people do is merely filter out results they are not interested in from the SVRL output (a standard XML format for the results), based on IDs or roles.

So this is not rule-chaining in the sense of executing tests (lazy rule chaining?), but the output could be identical.
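A sketch of that after-the-fact filtering, under stated assumptions: the SVRL report below is hand-written, the TA ids are invented, and the PREREQ table that says which assertion depends on which is hypothetical (SVRL itself carries no such information). For brevity the sketch also ignores the location of each failure, which a real filter would probably need to take into account.

```python
import xml.etree.ElementTree as ET

SVRL = "{http://purl.oclc.org/dsdl/svrl}"  # SVRL namespace

# Hypothetical prerequisite table: assertion id -> id of the assertion it depends on.
PREREQ = {"TA-2": "TA-1", "TA-3": "TA-2"}

# Hand-written sample report; every assertion shown here failed.
REPORT = """
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:failed-assert id="TA-1" location="/invoice" test="@currency">
    <svrl:text>An invoice must declare a currency.</svrl:text>
  </svrl:failed-assert>
  <svrl:failed-assert id="TA-2" location="/invoice" test="total/@currency = @currency">
    <svrl:text>The total must use the invoice currency.</svrl:text>
  </svrl:failed-assert>
  <svrl:failed-assert id="TA-3" location="/invoice" test="tax/@currency = total/@currency">
    <svrl:text>The tax must use the same currency as the total.</svrl:text>
  </svrl:failed-assert>
  <svrl:failed-assert id="TA-4" location="/invoice" test="@date">
    <svrl:text>An invoice must be dated.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""

def qualified_failures(svrl_text, prereq):
    """Return ids of failed assertions whose prerequisite chain did not fail."""
    root = ET.fromstring(svrl_text)
    failed = [fa.get("id") for fa in root.iter(SVRL + "failed-assert")]
    failed_set = set(failed)

    def not_qualified(ta_id):
        # Walk up the prerequisite chain; any failed ancestor disqualifies the test.
        seen = set()
        parent = prereq.get(ta_id)
        while parent is not None and parent not in seen:
            if parent in failed_set:
                return True
            seen.add(parent)
            parent = prereq.get(parent)
        return False

    return [ta_id for ta_id in failed if not not_qualified(ta_id)]

print(qualified_failures(REPORT, PREREQ))  # ['TA-1', 'TA-4']
```

Every test was still executed, but the report that reaches the reader is the one a chaining engine would have produced: TA-2 and TA-3 drop out because TA-1 failed.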

That's very interesting. I had assumed it would be possible to use Schematron like this and have tried similar things in the past using shell scripts combined with Schematron. It would seem to be good practice to first define test assertions using something like the OASIS TAG TC test assertion model (and/or maybe Test Assertion Markup Language to mark it up for consistency across conformance profiles and to foster tool support, etc) then to hand it to a developer to write the Schematron and shell just like handing TAs to a test suite developer/engineer. Is there an example anywhere of using XProc in this way with Schematron? Then the developer need only be skilled in the XML technologies. Only! :-) Not that there aren't a lot of shell script writers out there, many of whom could readily pick up the Schematron skills. However, no problem seeing, I think, how having all this be achievable in just one kind of markup is desirable so there seems to be ample room for both Schematron (+ Shell/XProc) and the Test Assertion Markup Language, XPath profile (TAML + XPath, alias 'Test assertions on steroids', 'Taos').

- About rule chaining: there are indeed diverse ways to look at rule chaining (which we should call "test chaining" to avoid semantic overload). I can see the need for a "coarse-grained" test (or pattern) chaining such as a combination of XProc + Schematron would offer. And this may be all that many document users need - I defer to document experts to decide. The X-Taos implementation of TAG is aiming at becoming a component in broader test procedures that target "processors" of various kinds, of which X-Taos's ambition is to cover the "analysis" part, consuming the (XML) output of a previous phase that consists of driving or monitoring the processor under test. In this context, fine-grained test chaining becomes valuable for reducing the cost of debugging and driving/monitoring processors under test (phase 1 is costly), as well as for simplifying otherwise complex assertions. Ordering and conditioning individual tests by the outcome of other individual test(s) helps do this, and leads to a rule chaining style of test execution. So again, the application scope is the determining factor here for how best to implement a "test chaining" feature.

- About Schematron implementing TAG design: it would be great indeed to see something like a "Schematron best practice" for applying TAG methodology. Rick J. started to sketch something like this at the beginning of this blog. The Schematron rule design is close enough to "test assertions" according to TAG, but the tricky part is to ensure an intuitive execution: when designing TAs for a particular target type, one expects ALL TAs related to this target type to be exercised for each target instance (whether fail or pass). To get this behavior in Schematron, one would probably need to group all TAs for a particular target match inside the same Schematron rule, i.e. a TA (according to TAG) would actually map to a Schematron assertion inside a rule, not to an individual rule. More precisely, to a pair (assertion / report). So this kind of "best practice" would need to be spelled out to get the expected execution behavior for such TAs.
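The grouping point can be shown with a toy model of how rules fire inside one pattern. This is a simulation in plain Python, not a Schematron processor: contexts and tests are ordinary functions rather than XPath, and the node, the TA ids and the helper names are hypothetical. What it does model is the Schematron behaviour that, within a pattern, a node is checked only by the first rule whose context matches it.

```python
def fire(pattern, node):
    """Evaluate a node against one pattern: only the first matching rule fires."""
    for rule in pattern:
        if rule["context"](node):
            # Every assertion inside the matching rule is evaluated.
            return [(ta_id, test(node)) for ta_id, test in rule["asserts"]]
    return []  # no rule in this pattern matched the node

# Hypothetical context and assertion tests, written as plain predicates.
def is_invoice(node):
    return node["name"] == "invoice"

def has_total(node):
    return "total" in node

def has_currency(node):
    return "currency" in node

# One TA per rule: the second rule is shadowed by the first for every invoice.
split = [
    {"context": is_invoice, "asserts": [("TA-1", has_total)]},
    {"context": is_invoice, "asserts": [("TA-2", has_currency)]},
]

# Both TAs inside the same rule: both are exercised for every invoice.
grouped = [
    {"context": is_invoice, "asserts": [("TA-1", has_total), ("TA-2", has_currency)]},
]

invoice = {"name": "invoice", "total": "10.00"}
print(fire(split, invoice))    # [('TA-1', True)]
print(fire(grouped, invoice))  # [('TA-1', True), ('TA-2', False)]
```

With one TA per rule, TA-2 is silently never tested; with the TAs grouped as assertions in one rule, its failure is reported. Since rules in different patterns do not shadow each other, putting each TA in its own pattern would be another way to get every TA exercised.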
