Fake non-realtime non-twitter non-video blog from XML Prague #1

By Rick Jelliffe
March 31, 2009 | Comments: 5

I wasn't there, but the XML Prague presentations are online now. Here are my thoughts from rummaging through some of them. There was a strong emphasis on XSLT and XPath-based systems: I think this reflects a technical opportunity that has been difficult for the big boys to take advantage of, since it does not fit into their product lines or marketing stories well.

First, this is the first conference where every presentation is online as video. Unfortunately, Australia is 38th in the world for average download speeds, my company has a strict policy on such things, and our connection is about a third of the average European speed in any case, so I have not dared to view any. But it seems a great idea if you are on that side of the video divide. So I'll limit myself for now to the presentations with text collateral.

Michael Kay has a presentation, "XML Schema moves forward". Michael has implemented large chunks of XML Schema in his SAXON XSLT2 processor. He also has excellent access to the XML Schema WG, both as an editor of XSLT2 and XPath2 and as a member (Invited Expert) of the W3C XML Schema WG itself. XML Schema 1.1 is currently a 'Working Draft in Last Call' at W3C.

I was particularly taken with this from Michael's last slide:

The XML Schema WG needs help
- new members
- reviews and comments

My long-suffering readers will know why I am taken with this: active engagement with and recruitment from the larger community is a sign of a healthy standards group.

Michael has some interesting comments on the proposed assertions mechanism for XSD 1.1:


• Restriction by grammar
  - requires repeating the content model
  - maintenance nightmare
• Restriction by assertion
  - just say what's different
  - can do "deep restriction"
    test="empty(.//@currency[. ne 'USD'])"

Maintenance nightmare? That is quite strong (and obviously not always the case), and Michael has quite a few brutally honest comments about XSD 1.1. I don't think the XML Schema WG has much alternative, now that there is so much information around on the mismatch between what XML Schemas is good at (and there is no need to pretend that XML Schemas is bad at everything) and what people need.
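To make the "deep restriction" on Michael's slide concrete, here is a minimal Python sketch of what that assertion checks. It is not how an XSD 1.1 processor evaluates assertions. It just applies the same rule with the standard xml.etree.ElementTree module, and the order/line/price vocabulary is hypothetical. The point is that the constraint reaches any depth without restating the content model.

```python
import xml.etree.ElementTree as ET


def only_usd(element: ET.Element) -> bool:
    """Approximates the XSD 1.1 assertion
    test="empty(.//@currency[. ne 'USD'])":
    true when no currency attribute, on this element or on any
    descendant, has a value other than 'USD'."""
    # Element.iter() yields the element itself first, then every
    # descendant, matching the reach of .//@currency.
    return all(el.get("currency") in (None, "USD") for el in element.iter())


# Hypothetical order vocabulary; the price can sit at any depth.
ok = ET.fromstring('<order><line><price currency="USD">10</price></line></order>')
bad = ET.fromstring('<order><line><item><price currency="EUR">10</price></item></line></order>')
print(only_usd(ok))   # True
print(only_usd(bad))  # False
```

Restricting by grammar would mean redeclaring order, line, item and price just to narrow one attribute. The assertion states only the difference.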

I quite like the Conditional Type Assignment (CTA) mechanism, in theory: if you have to have types, then the more flexibly they can be selected the better. I would have preferred an alignment with RELAX NG, however, to allow attributes in content models. That is neater, but I think the two may solve slightly different problems: if CTA were also applicable to, and within, xs:group and xs:attributeGroup, it would get closer to RELAX NG's power. XSD's distinction between complex types and groups is terribly incoherent, it seems to me.

But at least the CTA concept shows a glimmer that the XML Schema WG is realizing that its earlier idea, that attributes are funny kinds of elements, is wrong for important classes of documents, where element names are more like a funny kind of attribute value. This conception is one of the fundamental differences between databases and documents as idioms, I would say.
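A rough Python sketch of the selection logic behind conditional type assignment may help. The type an element is validated against is chosen by tests on the element itself (typically its attributes), and the first alternative that matches wins. This models only the selection step. The kind attribute and the type names are hypothetical, and the lambdas stand in for the XPath tests an XSD 1.1 processor would evaluate. It also shows the sense in which an attribute value, rather than the element name, ends up deciding what the element is.

```python
import xml.etree.ElementTree as ET

# Hypothetical alternatives, in document order, mirroring XSD 1.1 markup like
#   <xs:alternative test="@kind = 'email'"  type="EmailAddress"/>
#   <xs:alternative test="@kind = 'postal'" type="PostalAddress"/>
# A test of None plays the part of an xs:alternative with no test attribute.
ALTERNATIVES = [
    (lambda e: e.get("kind") == "email", "EmailAddress"),
    (lambda e: e.get("kind") == "postal", "PostalAddress"),
]


def select_type(element, alternatives, declared_type):
    """Return the type of the first alternative whose test holds for
    element, or the declared type if none does."""
    for test, type_name in alternatives:
        if test is None or test(element):
            return type_name
    return declared_type


for xml_text in ('<address kind="email">someone@example.com</address>',
                 '<address kind="postal">1 Main St</address>',
                 '<address>somewhere</address>'):
    print(select_type(ET.fromstring(xml_text), ALTERNATIVES, "AddressType"))
```

All three elements share the same name. Only the kind attribute decides which type applies.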

Michael makes the interesting observation that the addition of assertions makes XSD compete with Schematron. I would perhaps say, though, that people who can switch over from Schematron to XSD 1.1 assertions probably were not using Schematron very well! I am not being dismissive here: in fact, the ISO Schematron standard even has an annex on how to take subsets of Schematron and borrow them in other languages. But XSD 1.1 assertions do not allow assertion texts (Michael calls them "error messages", which gets it utterly wrong from the Schematron perspective). Nor do they allow access to external documents, phases (progressive validation), or dynamic diagnostics (which really are "error messages"), and they have scoping restrictions. Still, they are way better than nothing.
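The Schematron features listed above are easier to see side by side. What follows is a rough Python model, not ISO Schematron and not any real validator's API. All class and function names are hypothetical, ElementTree paths stand in for XPath rule contexts, and each pattern is simplified to a single rule. It shows three things that, as noted above, XSD 1.1 assertions lack: a positive assertion text, a diagnostic computed from the instance, and phases that select which patterns run.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Assertion:
    test: Callable[[ET.Element], bool]
    text: str                                # states what SHOULD be true
    diagnostic: Callable[[ET.Element], str]  # built from the offending node


@dataclass
class Pattern:
    id: str
    context: str  # ElementTree path standing in for an XPath rule context
    assertions: List[Assertion]


def validate(doc, patterns, phases, phase):
    """Run only the patterns active in the given phase; return one report
    line per failed assertion: assertion text plus dynamic diagnostic."""
    active = phases[phase]
    report = []
    for pattern in patterns:
        if pattern.id not in active:
            continue
        for node in doc.iterfind(pattern.context):
            for a in pattern.assertions:
                if not a.test(node):
                    report.append(f"{a.text} ({a.diagnostic(node)})")
    return report


# Hypothetical patterns and phases for a hypothetical order vocabulary.
currency = Pattern("currency", ".//price", [Assertion(
    test=lambda p: p.get("currency") == "USD",
    text="Prices are in US dollars.",
    diagnostic=lambda p: f"found currency {p.get('currency')!r}",
)])
amount = Pattern("amount", ".//price", [Assertion(
    test=lambda p: (p.text or "").strip().isdigit(),
    text="A price is a whole number.",
    diagnostic=lambda p: f"found {p.text!r}",
)])
phases = {"draft": {"currency"}, "final": {"currency", "amount"}}

doc = ET.fromstring('<order><price currency="EUR">ten</price></order>')
print(validate(doc, [currency, amount], phases, "draft"))  # one failure
print(validate(doc, [currency, amount], phases, "final"))  # two failures
```

The assertion text says what ought to be true in the document. The diagnostic reports what was actually found, and only the diagnostic is really an "error message".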

I guess my problem with XSD 1.1 is that it solves the problems that were pressing (from my POV) in 1999, while in 2009 we have a whole different set of schema-related problems. These include external code lists, data split across files, the rise of inline notations that are not usefully validated by simple regexes (yes, I said notations!), and many challenges with schema evolution and schema variants (evolution is not linear). Schematron is still, I think, the only standard schema language with any reasonable story on these (except notations), though the various improvements to RELAX NG/NVDL on the cards will certainly move things forward. So while I am delighted that XSD looks set to get assertions, they are 10 years too late.

It looks like a stimulating talk I would have enjoyed. I use SAXON XSLT on almost every project, and most programmers I know use it by default. If you are using Java, it is certainly worth looking at seriously. (It has a .NET version too, which should be just as good.)

Tony Graham has a good general talk on Testing XSLT. He quite likes unit tests in moderation but is not keen on metrics in general, I think. The end is the most interesting part, where he emphasizes that human eyes are always needed for testing. His numbers are based on a book by Microsoft's Steve McConnell, and I am deeply suspicious of how far such numbers generalize. Companies, processes, programming cultures and characteristic flaws are so specific that sectoral data may not be very reliable. NASA's study suggests as much: metrics that accurately reflected their past bugs turned out to be poor at finding or predicting new bugs.

The talk has a good list of current test tools.

Jeni Tennison follows this up with a specific look at XSpec, a unit testing system for XSLT that looks reasonable. A scenario (equivalent to a pattern in Schematron) has a context in the input document (equivalent to a rule context in Schematron). It then has expectations describing what should appear in the output document (equivalent to an assertion test on an external document in Schematron).

XSpec has some shorthand and grouping mechanisms (like and context/@mode). It would be much terser than Schematron for this kind of use, because rather than expressing the context and test as XPaths, it expresses them as exemplars. Schematron has never taken off for functional testing, even though it is perfectly capable of it. I think XSpec shows that, at least for simple constraints suitable for exemplars, the XSpec kind of arrangement is less work.
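XSpec's arrangement of a scenario with an input exemplar and an expected-output exemplar can be modelled in a few lines. This is a hypothetical Python sketch, not XSpec itself, which is written in XSLT and runs real stylesheets. The Scenario class, the run_scenarios function and the para_to_p transform are invented for illustration. Outputs are compared after Canonical XML normalization with xml.etree.ElementTree.canonicalize (Python 3.8+), which stands in for XSpec's own comparison rules.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Scenario:
    label: str
    context: str  # input exemplar, in the role of XSpec's x:context
    expect: str   # expected output exemplar, in the role of XSpec's x:expect


def canonical(xml_text: str) -> str:
    # Normalize attribute order, empty-element syntax and surrounding
    # whitespace so that equivalent XML compares equal as strings.
    return ET.canonicalize(xml_text, strip_text=True)


def run_scenarios(scenarios, transform):
    """Apply transform to each context exemplar and report, per label,
    whether the result matches the expected exemplar."""
    results = {}
    for s in scenarios:
        actual = ET.tostring(transform(ET.fromstring(s.context)), encoding="unicode")
        results[s.label] = canonical(actual) == canonical(s.expect)
    return results


def para_to_p(elem: ET.Element) -> ET.Element:
    # Hypothetical code under test, standing in for an XSLT template:
    # rename <para> to <p>, keeping its text.
    out = ET.Element("p")
    out.text = elem.text
    return out


scenarios = [
    Scenario("para becomes p", "<para>Hello</para>", "<p>Hello</p>"),
    Scenario("deliberately wrong expectation", "<para>Hi</para>", "<p>Bye</p>"),
]
print(run_scenarios(scenarios, para_to_p))
```

Writing the expected result as a literal fragment, rather than as an XPath test over the output, is what makes this style terser for simple cases.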

Ken Holman gave a talk, "Introduction to Code-Lists in XML". I think this should be required reading for any professional involved in schema creation.

(But you know those tiny bibles engraved on a grain of rice?... Ken must have had a sore throat that day.)


5 Comments

the title reflects this article.

shame that you couldn't see the videos (or read the twitter stream) from the conference ... that would have helped you capture some of the 'soul' of the conference; no offense meant, but couldn't XML.com find anyone who actually went to the conference to write up a review? Your article seems either to contrast XML Prague talks with schematron or to state what one could get by watching the video or reading the presentations or proceedings by themselves.

For example a good quote from Michael Kay, said during his talk:

"XML Schema is the most impenetrable specification I have seen since Algol 68." -- Michael Kay

no offense meant, appreciated that XML.com took the time to do a piece on XML Prague but I think it could have been better.

XML Prague is planning for March 2010 next year so Rick please consider coming this time around (on XML.com's dime though! we will take care of the registration fee).

cheers, Jim Fuller

Jim: Actually, XML.COM does not write my blog, I do. And I am interested in Schematron, as are my poor readers, so that is what I often focus on. (And XML.COM certainly don't pay me anything, let alone airfares; it only exists now as one channel for O'Reilly bloggers.)

I very often do reviews of conferences I don't attend. See Fake realtime blog from XML 2006, Fake realtime blog from SC34, Fake realtime blog from JTC1, Fake realtime blog from Document Integrity Initiative, fake realtime blog from XTech 2007, Fake non-live coverage of XML 2005, and even Fake fake realtime blog from Open Publish 2007, Sydney.

I am also interested in schematron and your own personal point of view, my mistake ... It's not very clear what the difference is between a general article and a blog entry on XML.com.

thx for taking the time to review XML Prague.

It was a terrific event, a lot like Extreme Markup Languages but on a slightly smaller scale. I'd strongly encourage any markup geek to attend next year. I will definitely return there if at all possible.

Best,

Ari Nordström

"I quite like the Conditional Types Assignment mechanism, in theory: if you have to have types then the more flexibly they can be selected the better."

That's true, and I think the same. Very good idea.
