Validation when technocrats aren't in control

What is missing from the other schema languages that would really bring them into the same orbit as Schematron?

By Rick Jelliffe
November 29, 2009

Some computer processes have technocrats in control: we tell the user what datatypes and structures they need, and everyone else basks in our munificence. All is well with the world, everything is in order, Santa Claus is coming to town.

However, sometimes things go wrong, and systems are made without the technocrats. And these are frightfully messy, suboptimal, mad, bad, dangerous, woolly, amateurish, and don't train users in adequate deference to our technologies. And the other stakeholders seem strangely oblivious that they have created difficulty for us in implementing what makes life easier for them. Carts are before horses, babies are with bathwaters, lunatics have taken over the asylum, and other colorful phrases.

What is Rick on about now? This week I have been working on a Schematron validator for addresses for an organization. Look at the requirements and consider how you would handle them with XSD:

  1. The street name is in an element Street. The street type is in an attribute Street/@Type.
  2. The type is one or two tokens, not case-sensitive.
  3. Any of the street types can have an optional second token: E, W, N, S, NE, NW, SE or SW.
  4. The types are updated regularly, at different times to the main schema and by different people.
  5. The street type is supposed to be one of the fifty or so abbreviations allowed by Australia Post, e.g. RD.
  6. The data may have the full value, e.g. ROAD, instead. The validator should issue a warning, but it is not an error, and it should generate the appropriate short value.
  7. If the abbreviation was used, the validator output should include the mapping to the full value, but it is not an error or a warning.
  8. Some streets, such as Kingsway, have no street type. Otherwise, it is an error if no type is used, or if the type is not one of the abbreviated or full names.
  9. There are a number of exceptions, such as Canal, which are not allowed by the Post Office but are needed. The validator should generate a warning about these, but they are not errors.
The further you go down the list, the more you will struggle if you are using XSD 1.0. The separation of concerns, where the code list needs to be maintained externally, is an issue that has come up before, but it is not what I want to write about here.
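As a rough illustration of why these requirements sit awkwardly in a grammar, here is a minimal Python sketch of the street-type check. The function name `check_street_type` and the tiny code lists are assumptions for illustration only; they are not the real Australia Post list, which (per requirement 4) would be maintained externally. The point is that each finding carries a role of error, warning or info, rather than the whole address getting a single valid/invalid flag.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, illustrative subset of the code list; the real list is
# maintained externally and contains around fifty abbreviations.
ABBREV_TO_FULL = {"RD": "ROAD", "ST": "STREET", "CRES": "CRESCENT"}
FULL_TO_ABBREV = {full: abbr for abbr, full in ABBREV_TO_FULL.items()}
EXCEPTIONS = {"CANAL"}          # needed locally, not allowed by the Post Office
NO_TYPE_STREETS = {"KINGSWAY"}  # streets that legitimately have no type
DIRECTIONS = {"E", "W", "N", "S", "NE", "NW", "SE", "SW"}


@dataclass
class Message:
    role: str  # "error", "warning" or "info": graded, not boolean
    text: str


def check_street_type(street: str, street_type: Optional[str]) -> List[Message]:
    """Return graded findings for Street/@Type (hypothetical helper)."""
    msgs: List[Message] = []
    # Requirement 8: a missing type is only acceptable for listed streets.
    if street_type is None or not street_type.strip():
        if street.strip().upper() not in NO_TYPE_STREETS:
            msgs.append(Message("error", f"Street '{street}' has no street type."))
        return msgs
    # Requirement 2: one or two tokens, compared case-insensitively.
    tokens = street_type.upper().split()
    if len(tokens) > 2:
        msgs.append(Message("error", f"Street type '{street_type}' has too many tokens."))
        return msgs
    base = tokens[0]
    # Requirement 3: the optional second token must be a direction.
    if len(tokens) == 2 and tokens[1] not in DIRECTIONS:
        msgs.append(Message("error", f"'{tokens[1]}' is not a direction suffix."))
    if base in ABBREV_TO_FULL:
        # Requirement 7: report the mapping, but as information only.
        msgs.append(Message("info", f"{base} means {ABBREV_TO_FULL[base]}."))
    elif base in FULL_TO_ABBREV:
        # Requirement 6: full name is a warning, and we supply the short value.
        msgs.append(Message("warning", f"Full name {base} used; the abbreviation is {FULL_TO_ABBREV[base]}."))
    elif base in EXCEPTIONS:
        # Requirement 9: tolerated exception, flagged as a warning.
        msgs.append(Message("warning", f"{base} is not allowed by Australia Post but is accepted here."))
    else:
        msgs.append(Message("error", f"'{base}' is not a known street type."))
    return msgs
```

For example, `check_street_type("Smith", "road")` yields a single warning that also suggests RD, while `check_street_type("Kingsway", None)` yields nothing. A grammar can only accept or reject these cases; it has nowhere to put the warning, the info message or the generated short value.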

When 'validity' isn't a boolean function

XSD 1.1 would obviously be much better at supporting this list of requirements, except for all the requirements where validation needs an answer other than "valid" or "invalid": where you want a warning, caution or note rather than an error flag. Our technocratic solution is to have one system that generates error messages and then some parallel system, home-made and non-standard, that must generate the warnings, cautions and notes.

XSD 1.1 has tacked on simple XPaths, and even called them assertions. It also allows a kind of type selection (conditional type assignment). I have criticized the technical details of these elsewhere. But my more fundamental criticism is that XSD 1.1 ignores two issues: the centrality of communicating with humans, and the fact that validity is graded rather than binary. As a result, all we get is a very complex, very powerful solution to a niche problem. XSD then becomes the authority on what a schema language should support: circular thinking that marginalizes people with requirements that cannot be shoehorned into XSD.

Look at the kinds of messages a compiler gives: all sorts of warnings, classified by severity. XSD doesn't do this, not because of any theoretical requirement, but simply because DTDs didn't and it isn't the way people think about grammars.

And RELAX NG, DSRL, CDRL and the other DSDL languages have the same kind of problems. I often contrast grammars (boo!) with path-based approaches (hurrah!), but in theory paths and grammars are not completely different beasts.

What is missing?

What RELAX NG and the others lack, just like XSD, are two conventions:

  • How do you attach text to a particle, group or pattern in a content model, with the unambiguous semantics that it is intended to explain the content model to users in natural language? <documentation> clearly does not cut it. What is needed is the equivalent of Schematron assertion texts and diagnostics.

  • How do you attach a property or attribute to a group, particle or pattern, with the unambiguous semantics that it states the role of that thing: is it obsolete? is it deprecated? should its absence merely generate a warning? and so on. Schematron has a role attribute for this kind of thing, which is passed through to the SVRL output and the XSLT API.

In fact, because of UPA (the Unique Particle Attribution constraint), it might even be marginally easier to fit this kind of information on top of XSD than on top of RELAX NG!
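To make the two conventions a little more concrete, here is a hypothetical Python sketch in which each particle of a content model carries a user-facing message (the analogue of a Schematron assertion text) and a role. The `Particle` class, the role values and the `Postcode` and `Fax` elements are assumptions for illustration, not features of any existing schema language. The checker only tests presence or absence, not order or cardinality, so it is not a real content-model validator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Particle:
    name: str
    message: str          # natural-language explanation intended for users
    role: str = "error"   # "error" or "warning": severity when absent;
                          # "deprecated": report when present instead


def check(children: List[str], model: List[Particle]) -> List[Tuple[str, str]]:
    """Return (role, message) pairs; presence/absence only, order ignored."""
    present = set(children)
    report: List[Tuple[str, str]] = []
    for p in model:
        if p.role == "deprecated":
            if p.name in present:
                report.append(("deprecated", p.message))
        elif p.name not in present:
            report.append((p.role, p.message))
    return report


# Hypothetical annotated model for an address element.
ADDRESS_MODEL = [
    Particle("Street", "An address needs a Street element.", "error"),
    Particle("Postcode", "A postcode is expected; without it delivery may be slower.", "warning"),
    Particle("Fax", "Fax numbers are deprecated in addresses.", "deprecated"),
]
```

With this model, `check(["Street", "Fax"], ADDRESS_MODEL)` returns a warning about the missing postcode and a deprecation note for the fax number, and no errors. Because the message and role hang off the particle itself, the report explains the content model to the user in their own terms, which is what <documentation> does not promise to do.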

This week, SC34 WG1 is meeting in Paris. Paris: poor things! I believe their agenda includes discussions about new features for RELAX NG. What I would find really useful is for all of DSDL to align itself with Schematron in these human-centred, traceability-centred capabilities.
