Can Schematron use grammars to test assertions?

By Rick Jelliffe
April 19, 2010 | Comments: 2

Not every schema language needs to handle every kind of constraint equally well. But in the case of Schematron, I sometimes read the comment that grammars are much easier for some kinds of constraints, and that grammars are more declarative and so help in building forms and syntax-directed editors.

However, it is possible to do grammars in Schematron. I have circled around this several times before (for example when discussing converting XML Schemas to Schematron) because the trouble with grammars is that they often don't give very satisfactory user messages. But after seeing a spate of papers claiming that Schematron was not as good as other schema languages because it could not represent grammars, I think it might be a good idea to put the code up, in case it is not so clear.

So here is a little Schematron schema that implements the regular grammar corresponding to the DTD declaration


<!ELEMENT x ( a, b, c?) >

 <pattern>
   <rule context="x">
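     <!-- c is optional, as in the DTD; ^ and $ anchor the expression
          because matches() otherwise succeeds on any matching substring -->
     <let name="grammar" value=" '^a b( c)?$' " />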
     <let name="contents"
        value="string-join(for $e in * return  local-name ( $e ), ' ') " />
     <assert test="matches( $contents, $grammar )"
     >The contents [<value-of select="$contents"/>] 
     should match grammar [<value-of select="$grammar"/>] </assert>
   </rule>
 </pattern> 

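The variable grammar holds a string that is a regular expression. The variable contents is a string made from the names of all the child elements of the context, separated by spaces. Validation is by simple string matching. Because the for expression and matches() come from XPath 2.0, the schema needs the XSLT2 query binding (queryBinding="xslt2").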
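
To make the string matching concrete, here is a sketch of a few instance fragments, with comments showing the value $contents would take for each x. The wrapper element examples is hypothetical and is only there to keep the fragment well-formed.

```xml
<examples>
  <!-- $contents = 'a b'   : matches the grammar, so the assertion passes -->
  <x><a/><b/></x>

  <!-- $contents = 'a b c' : matches, so the assertion passes -->
  <x><a/><b/><c/></x>

  <!-- $contents = 'b a'   : no match, so the assertion fails and the
       message reports both the contents and the grammar -->
  <x><b/><a/></x>
</examples>
```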

The regular expression can be as complicated as desired: indeed, the regular expression library in XSLT provides quite a lot more sophistication for various kinds of matches (especially with wildcards) compared to other grammar-based schema languages. Of course, these regular expressions are not tokenized like RELAX NG Compact or DTDs: the space is significant, so a b( c)? is different from a b c?. (If it were desired to have a syntax more like RELAX NG Compact, you could define an XSLT2 function to rewrite that into this regex form. The assertion would merely become something like matches( $contents, my:rewrite( $grammar )).)
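
A minimal sketch of such a rewrite function might look like the following. It is not the form used above but a variation: every name is followed by a space (including the last one), which lets occurrence indicators and choices be rewritten uniformly. The my namespace URI and the function body are assumptions for illustration, and the sketch assumes a Schematron implementation that accepts xsl:function as a foreign element under the xslt2 binding.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:my="http://example.org/my-functions"
        queryBinding="xslt2">

  <!-- Hypothetical namespace for the helper function -->
  <ns prefix="my" uri="http://example.org/my-functions"/>

  <!-- Rewrites a tokenized grammar such as 'a, b, c?' into the regex
       '^((a\s)(b\s)(c\s)?)$'. Assumes the grammar contains only names,
       commas, whitespace, | ( ) and the ? * + occurrence indicators. -->
  <xsl:function name="my:rewrite" as="xs:string">
    <xsl:param name="grammar" as="xs:string"/>
    <!-- Wrap every name in a group that also consumes one trailing space -->
    <xsl:variable name="particles"
        select="replace($grammar, '\i\c*', '($0\\s)')"/>
    <!-- Drop the whitespace and commas that only separated tokens -->
    <xsl:sequence
        select="concat('^(', replace($particles, '[\s,]+', ''), ')$')"/>
  </xsl:function>

  <pattern>
    <rule context="x">
      <let name="grammar" value="'a, b, c?'"/>
      <!-- Each child name is followed by a space, including the last -->
      <let name="contents"
           value="string-join(for $e in * return concat(local-name($e), ' '), '')"/>
      <assert test="matches($contents, my:rewrite($grammar))"
      >The contents [<value-of select="$contents"/>]
      should match grammar [<value-of select="$grammar"/>]</assert>
    </rule>
  </pattern>
</schema>
```

Because each name carries its own trailing space, a grammar such as (a | b)*, c rewrites to ^(((a\s)|(b\s))*(c\s))$ without any special handling of where the separators fall.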

The other wrinkle to handle would be namespaces. (The schema above sidesteps prefixes by using local-name(), at the cost of ignoring namespaces altogether.) Not knowing which prefix had been used for an element name makes this a little more complex. A few solutions suggest themselves: wildcarding the names like (\S+:)?a (\S+:)?b( (\S+:)?c)?, or rewriting the $contents and the grammar to use James Clark's notation, {namespace-uri}local-name.
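
The second option could be sketched as below. The namespace URI and the ex prefix are hypothetical, and the braces and dots have to be escaped in the grammar because they are regex metacharacters.

```xml
<pattern>
  <!-- Assumes the schema declares
       <ns prefix="ex" uri="http://example.org/ns"/>;
       the namespace URI is hypothetical -->
  <rule context="ex:x">
    <!-- The namespace part of each name, with regex metacharacters escaped -->
    <let name="ns" value="'\{http://example\.org/ns\}'"/>
    <let name="grammar"
         value="concat('^', $ns, 'a ', $ns, 'b( ', $ns, 'c)?$')"/>
    <!-- Each child becomes {namespace-uri}local-name, whatever its prefix -->
    <let name="contents"
         value="string-join(for $e in *
                  return concat('{', namespace-uri($e), '}', local-name($e)), ' ')"/>
    <assert test="matches($contents, $grammar)"
    >The contents [<value-of select="$contents"/>]
    should match grammar [<value-of select="$grammar"/>]</assert>
  </rule>
</pattern>
```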

This kind of grammar is just a regular expression, and unlike most XML grammars (tree-regular grammars) a particular name in the grammar is not a reference to, or a declaration of, any subgrammar for its contents. The only way an assertion can be tested is if its rule is fired, which happens when the rule's context matches something in the document under consideration.

So validating elements with regular expressions in Schematron is almost trivially easy; the penalty is ungainly syntax for complex regular expressions rather than a lack of power. The contents of the $grammar variable could be used to drive forms or GUI construction systems or for data-binding if required. And a complex regular expression may be an indication that a grammar is the wrong tool for the job, in which case the other capabilities of XPath are available.

[Update: Note that because W3C XML Schema has ancestor-based typing (see page 66) it is possible to use XPaths for both the type assignment (saying which regular expression an element's contents should conform to) and the validation (using the technique above) of an element.]
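
As a rough sketch of that idea, the rule context does the type assignment and the regular expression does the validation. The element names (book, person, title, main, sub, honorific) are invented for illustration.

```xml
<pattern>
  <!-- A title inside a book: a main title with an optional subtitle -->
  <rule context="book/title">
    <let name="grammar" value="'^main( sub)?$'"/>
    <let name="contents"
         value="string-join(for $e in * return local-name($e), ' ')"/>
    <assert test="matches($contents, $grammar)"
    >The contents [<value-of select="$contents"/>]
    should match grammar [<value-of select="$grammar"/>]</assert>
  </rule>

  <!-- A title inside a person: exactly one honorific -->
  <rule context="person/title">
    <let name="grammar" value="'^honorific$'"/>
    <let name="contents"
         value="string-join(for $e in * return local-name($e), ' ')"/>
    <assert test="matches($contents, $grammar)"
    >The contents [<value-of select="$contents"/>]
    should match grammar [<value-of select="$grammar"/>]</assert>
  </rule>
</pattern>
```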


2 Comments

Hi Rick,

The problem for a syntax-directed editor is that it cannot easily support arbitrary Schematron. It can support a subset of Schematron: the schemas that use a specific pattern, as in this case, where rules match on an element and define a grammar variable with a specific content. But it cannot support just any Schematron schema. If that becomes the way many people write their Schematron schemas, then it will be worth the effort to support that subset, but I believe it is hard to find a group of users using the same design pattern for their Schematron schemas.

Best Regards,
George
--
oXygen XML editor
http://www.oxygenxml.com

George: (1) In the abstract sense, since we can express grammars as shown, there is no intrinsic difference between what an XSD and a Schematron schema could do as far as supporting syntax-directed editors (for elements.) The blog is about validation rather than incremental editing.

(2) But you are quite right that in the practical sense Schematron does not have standard metadata to say "This variable is actually a grammar" that could then be re-used for a syntax-directed editor. In fact, <sch:let> does not allow @id or @role or any annotation attribute, so we cannot even hack anything together: the name is the only thing there is. (I have already reported this as an error to a national body, who I hope will bring it up at the next ballot.)

(3) But certainly you can make Schematron schemas that support syntax-directed editing: for example, a pattern that checks that the immediately following elements and first elements are what is expected. This gives the same kind of result as checking with a grammar for finding where the content model is not followed. See http://www.oreillynet.com/xml/blog/2007/11/converting_xml_schemas_to_sche_7.html and http://www.oreillynet.com/xml/blog/2008/01/converting_xml_schemas_to_sche_9.html or http://www.oreillynet.com/xml/blog/2008/01/converting_xml_schemas_to_sche_10.html for the type of thing.

(4) But we don't have any conventions for telling an application which patterns are like that. Do we need it, if we have phases? Do you want to make a convention up? All variables called sch:grammar contain grammars? It is probably not too late to get that variable name reserved or defined in the new ISO schematron standard, as part of the XSLT2 binding.


(5) The new thing that may give new opportunities for better integration of Schematron and applications, I think, is the new Schematron <property> element. This is like <diagnostic> but for computer-related information. It can be used, for example, to put in user-specific repair code. Or non-XPath validation information.

See the draft http://www.itscj.ipsj.or.jp/sc34/open/1419.pdf at Annex L page 38 for some examples. (Review comments welcome, especially from George!) Properties are a very general feature, and (if the draft is accepted) they are the way I would hope to introduce CRDL, DTLL and other validation. (I have a CRDL validator in development as an extra stage in the pipeline, btw. It is a CRDL to Schematron converter. )
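
For point (3), a sketch of the sibling-checking style for the (a, b, c?) content model from the post could look like this. It is one possible arrangement of the rules, not the output of the conversions linked above, and it needs nothing beyond XPath 1.0.

```xml
<pattern>
  <rule context="x">
    <assert test="*[1][self::a]"
    >An x should start with an a element.</assert>
  </rule>
  <rule context="x/a">
    <assert test="following-sibling::*[1][self::b]"
    >An a should be immediately followed by a b.</assert>
  </rule>
  <rule context="x/b">
    <assert test="not(following-sibling::*) or following-sibling::*[1][self::c]"
    >A b should be the last child, or be immediately followed by a c.</assert>
  </rule>
  <rule context="x/c">
    <assert test="not(following-sibling::*)"
    >A c should be the last child of x.</assert>
  </rule>
  <!-- Any other child falls through to this rule and is always reported -->
  <rule context="x/*">
    <assert test="false()"
    >The element <name/> is not allowed inside x.</assert>
  </rule>
</pattern>
```

Each failed assertion points at the element where the content model breaks, which is the kind of localized message an editor can act on.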
