I am looking forward to seeing the report from the ODF Plugfest 2009. The Dutch government is doing everyone a great service in organizing this.
Actually, I am even more looking forward to seeing the results of next year's plugfest, when we should see whether competition is actually increasing implementation quality. We should get some good bug reports this year: good fodder for marketing insinuendo, I know, but necessary for the market/bazaar to operate.
Looking over the site, two things stuck out. The first was their working definition of interoperability for the purposes of the workshop being:
- Content equivalence - no text or data is lost
- Structural equivalence - headers, footers, tables, are preserved as headers, footers, etc.
- Dynamic equivalence - style names are preserved, live field names remain live
- Presentation equivalence - page size, margins, font sizes and styles, etc., preserved
This is quite similar to my Classes of fidelity for documents (raw, exchange, industrial, facsimile) except for the final class: my "facsimile" class would have word, line and page breaks preserved, while engine-unspecific and media-dependent aspects such as paper size belong to "industrial". But in a sense, I only suggest the "facsimile" class in order to dismiss it as not being relevant to ODF/OOXML documents. So the ODF Plugfest's functional groupings make sense.
The other thing that stuck out was that Alex Brown has released an open source Schematron validation library, which I was not expecting. The library is called Probatron and, looking at the website, Alex is planning a .NET version and a high-performance version to go along with the initial Java version. (Michael Kay, of course, has led the way with the dual-platform availability of SAXON.)
He has a blog entry, ODF Forensics, on his Office-o-tron ODF validator, which uses Norm Walsh's open source XProc processor. I see he writes that it uses Jing (RELAX NG) to validate rather than Schematron.
The latest release of the Schematron skeleton for XSLT2 code (see Schematron 2009) has an experimental feature, multi-document patterns, which I added to support XML-in-ZIP formats better.
While online validation using Schematron certainly works for small documents and for high-value documents (indeed, my company Topologi has a servlet product for this), there seem to have been scaling issues: large documents take too much memory to move and load at the server, or the wait while the document loads into a DOM or whatever means that validation messages come late. I have written on a few possibilities for optimizing Schematron for faster response over the last few years, so it will be interesting to see what Alex has in mind for his Probatron-HP.
(Actually, recently I have been working through various issues on the question: to what extent can Schematron be parallelized? There is a very simple high-level parallelization available: because patterns don't interact, they can be performed on separate threads/machines or interleaved. Indeed, if you had a map-reduce multi-node system and were interested in that kind of eager parallelism, each rule (and even each different assertion) could be farmed out to a separate processor, with Schematron's lexical priority for rules within a pattern applied after validation (or after context matching) to exclude spurious rules. The ISO Schematron standard explicitly avoids dependencies on the order in which nodes are visited or the order in which the document is validated, precisely to make parallelization easy.)
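The simplest of those strategies, pattern-level parallelism, can be sketched with a thread pool: each pattern is run independently against the shared tree and the messages merged afterwards. A minimal sketch, with patterns expressed as hypothetical (context, test, message) triples rather than real Schematron, and omitting the within-pattern rule priority mentioned above:

```python
import concurrent.futures
import xml.etree.ElementTree as ET

def run_pattern(root, name, rules):
    """Run one pattern: a list of (context-tag, test, message) triples.
    Patterns don't interact, so each call only reads the shared tree."""
    msgs = []
    for context, test, message in rules:
        for elem in root.iter(context):
            if not test(elem):
                msgs.append((name, message))
    return msgs

def validate_parallel(root, patterns):
    # Farm each pattern out to its own worker; result order between
    # patterns is immaterial because ISO Schematron imposes none.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_pattern, root, name, rules)
                   for name, rules in patterns.items()]
        return [m for f in futures for m in f.result()]
```

A real implementation would also need the post-pass described above to apply lexical rule priority within each pattern; the sketch just shows that the per-pattern work units are independent.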