Parameterized conditionals and versions within a single schema document

In praise of plain old macro-processing...

By Rick Jelliffe
January 20, 2010

Murata Makoto has an interesting XSD system up and running, as part of the SC34 WG4 Open XML maintenance work. The OOXML schemas are maintained in XSD for historical reasons, but are also delivered in RELAX NG, using a converter I started and Murata-sensei finished. But the issue being solved applies as much to RELAX NG as to XSD.

His problem is keeping track of the past and present evolution of the OOXML schemas. ISO OOXML 2008 has two schemas: a Transitional schema which is a kitchen sink, and a Strict one which is supposed to be a subset.

Now in theory you could model this using XSD type derivation, where you have the Transitional schema as a loose base schema, and the Strict schema as a schema derived by restriction. ECMA OOXML (ed 1) would be another schema derived by restriction, as would the notional real XSD that each version of Office actually implements. The reality is less good than the theory, in part because XSD 1.0 is not powerful enough: some of the changes go beyond XSD's capability to say that things are the same type.

The wrinkle is that now the relationships of "Transitional" and "Strict" are changing in the fog of reality, for ISO OOXML:2010, and probably need some other name. (My impression is that this in part reflects a desire by some people—not necessarily the usual suspects!—that it would be better to have schemas more like "Now" and "Next" and that the 2008 OOXML schemas represented "TooMuch" and "Salami" if not baloney.)

SGML and XML DTDs allow parameterized schemas: you can set the value of one parameter entity and this includes or ignores various marked sections in the DTD. The effect is that you can have multiple versions of the schema, sometimes greatly diverging in one place, without requiring complex management tools.

Schematron has something similar: the sch:phase element enables or disables groups of patterns. More importantly, these are parameterized so that the phase to be used for validation can be selected by the invoker without changing the schema.

RELAX NG has an element rng:notAllowed which can be used to block off paths in the grammar: it has a lot of potential.

XSD 1.1 provides a thing called conditional type assignment. (Sandy Gao has a good introduction here, situating it with Schematron, Services Markup Language and localization.) This allows a simple XPath to determine which type an element is. I don't think it allows parameterization, so the document has to have extra information in it. I'd reckon it to be a weak form of Schematron's sch:rule/@context rather than anything usable for parameterization though. There is an open-source XSLT script from Michael Sperberg-McQueen available: VC-Filter.

Murata-sensei's current system parameterizes the XSD schemas, so that there are structures like this (I don't know if the URL for this is public: anyway, it is here or interested people could ask Murata-sensei, I am sure):

<xsd:simpleType name="ST_TextScale">
  <strictOnly xmlns="http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking">
    <xsd:union memberTypes="ST_TextScalePercent" />
  </strictOnly>
  <transitionalOnly xmlns="http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking">
    <xsd:union memberTypes="ST_TextScalePercent ST_TextScaleDecimal" />
  </transitionalOnly>
</xsd:simpleType>

<xsd:simpleType name="ST_TextScalePercent">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="0*(600|([0-5]?[0-9]?[0-9]))%" />
  </xsd:restriction>
</xsd:simpleType>

<transitionalOnly xmlns="http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking">
  <xsd:simpleType name="ST_TextScaleDecimal">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0" />
      <xsd:maxInclusive value="600" />
    </xsd:restriction>
  </xsd:simpleType>
</transitionalOnly>

What we can see there are three separate requirements:

  • Conditional type assignment: it can change depending on the version
  • Parameterized type assignment: it changes based on out-of-band information, not data in the schema or instance
  • Declarations that are not needed are marked, so that it is possible to generate each clean version simply; useful for other utilities too

As far as I can tell, XSD conditionals only allow the first of these, and its conditionals are evaluated at run time, not at compile time.

Now, of course, what Murata-sensei has implemented is really just a simple macro-facility, no more complex in concept than the inclusion or abstract pattern pre-processors of my ISO Schematron implementation, or Michael's VC-Filter. But having the variant productions in the same file and syntax makes it much easier to do a variety of checks. And getting the various schemas into this form also allows the kinds of consistency checks promised by the type-derivation snake-oil salesmen.
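To make the "simple macro-facility" concrete, here is a minimal sketch in Python of what such a pre-processor might look like. It is not Murata-sensei's implementation, which I have not seen: the function names (expand, build, type_names, members), the variant names "strict" and "transitional", and the cut-down sample schema are all assumptions made for illustration. Only the strictOnly and transitionalOnly wrapper elements and their namespace come from the fragment above.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
HACK_NS = "http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking"
# Assumed mapping from an out-of-band variant name to the wrapper it keeps.
WRAPPERS = {"strict": "strictOnly", "transitional": "transitionalOnly"}

# Cut-down sample in the style of the fragment above (illustrative only).
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="ST_TextScale">
    <strictOnly xmlns="http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking">
      <xsd:union memberTypes="ST_TextScalePercent"/>
    </strictOnly>
    <transitionalOnly xmlns="http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking">
      <xsd:union memberTypes="ST_TextScalePercent ST_TextScaleDecimal"/>
    </transitionalOnly>
  </xsd:simpleType>
  <xsd:simpleType name="ST_TextScalePercent">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
  <transitionalOnly xmlns="http://www.itscj.ipsj.or.jp/sc34/wg4/schemaHacking">
    <xsd:simpleType name="ST_TextScaleDecimal">
      <xsd:restriction base="xsd:integer"/>
    </xsd:simpleType>
  </transitionalOnly>
</xsd:schema>
""".strip()


def expand(parent, variant):
    """Unwrap the wrappers for `variant` and drop all other wrappers, in place."""
    keep = "{%s}%s" % (HACK_NS, WRAPPERS[variant])
    new_children = []
    for child in list(parent):
        if child.tag == keep:
            expand(child, variant)            # wrappers may nest
            new_children.extend(list(child))  # splice the contents in
        elif child.tag.startswith("{%s}" % HACK_NS):
            continue                          # wrapper for another variant
        else:
            expand(child, variant)
            new_children.append(child)
    parent[:] = new_children


def build(variant):
    """Parse the parameterized schema and return one clean variant."""
    root = ET.fromstring(SCHEMA)
    expand(root, variant)
    return root


def type_names(root):
    """Sorted names of all simple types declared in a generated schema."""
    return sorted(t.get("name") for t in root.iter("{%s}simpleType" % XSD_NS))


def members(root, type_name):
    """Member types of the union that defines `type_name`, as a set."""
    for t in root.iter("{%s}simpleType" % XSD_NS):
        if t.get("name") == type_name:
            union = t.find("{%s}union" % XSD_NS)
            return set(union.get("memberTypes").split())
    return set()


if __name__ == "__main__":
    strict, transitional = build("strict"), build("transitional")
    print(type_names(strict))
    print(type_names(transitional))
    # A crude "Strict is a subset of Transitional" check on one union type.
    print(members(strict, "ST_TextScale") <= members(transitional, "ST_TextScale"))
```

The variant is chosen by the caller, not by anything in the schema or the instance, and because both variants live in one tree, a subset check like the last line becomes a few lines of ordinary code rather than a question about derivation rules.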

XSD's design approach was to "reconstruct" the functionality of parameter entities in SGML/XML DTDs and give them all separate names. This was the premise of the early object-oriented languages, notably C++, which aimed at removing the need for a macro pre-processor (e.g. cpp's #ifdef, or m4). XSD wants to turn schemas into components that are processed with an API rather than elements which are processed by XML systems (i.e. composite lexical objects). The desire to utilize as little of XML's possibilities as possible may seem a rather contradictory approach for designers of an XML schema language, I imagine horrible weird cynics might say. Horrid people.

What XSD 1.1 shows is that even after 10 years, there still are uses of parameter entities that were not "reconstructed". What is one mechanism in DTDs (parameter entities) is in XSD a dozen or more mechanisms: complex types, simple types, groups, attribute groups, conditionals, type derivation of multiple kinds, includes, redefines, imports, and so on. (I do not know that SGML's parameter entities were what William of Ockham meant when he spoke of entities! In fact, I suspect that Ockham's razor is a balance to the XSD agenda of "reconstruction".)

And even after all this, XSD still doesn't have parameterization: parameter entities, geddit? A mechanism to use parameters of the URL in the schema would be a good webby start.
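As a rough sketch of what "parameters of the URL" could mean in practice, a schema loader might read the variant from the query string of the schema's URL. Nothing like this exists in XSD: the query parameter name variant, the variant names, the default, and the example.org URLs below are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical variant names; no schema language defines these.
KNOWN_VARIANTS = {"strict", "transitional"}


def variant_from_url(schema_url, default="transitional"):
    """Pick the schema variant from a hypothetical ?variant=... parameter."""
    query = parse_qs(urlsplit(schema_url).query)
    # parse_qs maps each name to a list of values; take the first one.
    variant = query.get("variant", [default])[0]
    if variant not in KNOWN_VARIANTS:
        raise ValueError("unknown schema variant: %r" % variant)
    return variant


if __name__ == "__main__":
    print(variant_from_url("http://example.org/schemas/wml.xsd?variant=strict"))
    print(variant_from_url("http://example.org/schemas/wml.xsd"))
```

The value obtained this way would then be handed to whatever pre-processing stage selects the marked declarations, so the same schema document serves every variant.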

I think it would be quite useful for ISO DSDL or its parts to get a generic parameterized inclusion mechanism, even if it is just something that we can supersede with smarter elements later down the track. (I suspect that language designers are better off only getting rid of a macro stage when they have exhausted the possibilities of macros, not ahead of time.) We should look at whether we can re-use XSD's conditional mechanism I suppose, or whether OOXML's MCE could be upgraded with parameter information and used in schemas (it is based on switching between namespaces, which is not what we want).

