Supporting degradation: towards a workable Open Packaging standard

How Namespace Relations could allow better fallback and graceful degradation

By Rick Jelliffe
May 31, 2009 | Comments: 5

One of the most interesting areas to open up in markup in the last few years has been the increasing adoption of XML-in-ZIP (XIZ) file formats. It is interesting in particular because it opens many doors for addressing versioning and fallback issues.

We need a standard Open Packaging technology. Packaging, writ large, is an area where there is wide agreement on a base and a need for consolidation and layering above it. That seems exactly the fertile ground for a standard: enabling technologies without much of the competitive angle that so pollutes the waters for application standards.

I think that for many people involved in particular XIZ standards, the packaging is boring and distracting. Readers are invited to laugh at the dismissal of it as an intellectual tidiness fetish, which demonstrates the problem: a packaging format coming out of ODF (e.g. the stagnated ODF 1.2 part 2?) will be limited to solving ODF problems, just as the much more full-featured OPC (IS29500 Part 2 Open Packaging Conventions) was formulated to solve Microsoft's particular issues with Office and XPS.

For a survey of the current technology, from the beginning of this year (2009), see Packaging formats of famous application/*+zip. It notes that there is quite a lot of agreement on the basics, but quite a plurality after that.

There are a lot of different and common issues floating around which really should belong in a stack or layered module attached to the standard packaging format: encryption, versioning, referencing, extensibility, fallback, and so on all fit into these.

IS29500's OPC (Open Packaging Conventions) is of course the current leading example in this, and it would be the obvious contender for sourcing technology for such a standard.

Recently thinking through some issues with ODF and OOXML maintenance (and some other standards), I think the list of technologies that are needed in order to provide an adequate base can be filled out a little better.

  1. Storage and addressing: ZIP, deflate, MIME type application/*+zip, part: URL scheme, UTF-8, XML, content-types
  2. Metadata: Dublin Core, RDFa
  3. Signatures: basic - W3C DSIG, + special purpose: ??
  4. Encryption: basic - W3C ENC, + special purpose: ??
  5. Part referencing: basic - URL, + advanced: OPC relationships (linkbase of URLs) (IS 29500 Part 2) + OASIS XML Catalogs
  6. Markup Compatibility and Extensibility: MCE (IS29500 Part 3)
  7. Document localization, profile and dialect support: ISO/IEC DSRL
  8. Multiple document support: ???
  9. Namespace fallback: see below

The last item is something new. The cost of buying into standards is that you lose agility: it is very difficult to make standards that allow a plurality of different functional features and graceful degradation, which is the hallmark of office software (especially word processors), but that don't thereby become so vague and mushy as to be useless. In terms of my Classes of Fidelity for Document Applications, a standard written for the exchange level of fidelity may easily be unsuitable for those requiring the industrial level. I recall being very impressed by MathML's David Carlisle, who once wrote to me that in his mind MathML was intended specifically to be an exchange format, with applications free to develop their own native formats for all the bells, whistles and idiosyncrasies that the particular application supported.

While I am a fan of MCE (see Safe Plurality: can it be done using OOXML's Markup Compatibility and Extensions mechanism?), which allows alternative content selection like SMIL's switch and builds on SOAP's mustUnderstand approach, I think it still leaves a gap that needs to be filled in order to meet long-term versioning and short-term deployment requirements.

The problem I want to solve is this: fallback and graceful degradation. What happens when an application is presented with a file that is much older or newer than itself and that has XML using an unknown namespace, but where the application does know a similar namespace?

Consider the case of ODF's SVG dialect. It uses SVG names and ideas, but because it has a few additions and subtractions, it uses its own namespace. For better document exchange, I would prefer my SVG system to be able to make a stab at rendering the ODF graphic, even if it gave me a warning "This may be crap or it may be perfect".

Graceful degradation is a form of interoperability. It would be paradoxical if our quest for guaranteed high-fidelity interoperability made us forget about this other form: the HTML kind of interoperability rather than the SGML/XML kind.

The current MCE mechanism addresses this problem by allowing alternatives. The file is larger, and the application selects the best fit. But it does not address what happens when the consuming application does not know the namespace, which is likely to happen over time, or when one group needs to bend a standard in some way to suit their requirements.
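
To make the selection step concrete, here is a minimal Python sketch of how a consumer might pick between MCE alternatives. The sample document, the `v2` namespace and the function names are illustrative assumptions, and the sketch treats prefix bindings as document-wide, which real MCE processing does not.

```python
import io
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical document: "v2" stands in for any newer vocabulary.
SAMPLE = """<doc xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:v2="http://example.com/ns/v2">
  <mc:AlternateContent>
    <mc:Choice Requires="v2"><v2:fancy/></mc:Choice>
    <mc:Fallback><plain/></mc:Fallback>
  </mc:AlternateContent>
</doc>"""


def parse_with_prefixes(xml_text):
    """Parse XML and collect the prefix -> namespace bindings it declares."""
    prefixes, root = {}, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri  # simplification: ignores prefix scoping
        elif root is None:
            root = payload  # first "start" event is the document element
    return root, prefixes


def select_alternative(alternate_content, prefixes, understood):
    """Return the content of the first Choice whose required namespaces
    are all understood, otherwise the content of Fallback (if any)."""
    for choice in alternate_content.findall(f"{{{MC}}}Choice"):
        required = choice.get("Requires", "").split()
        if required and all(prefixes.get(p) in understood for p in required):
            return list(choice)
    fallback = alternate_content.find(f"{{{MC}}}Fallback")
    return list(fallback) if fallback is not None else []


root, prefixes = parse_with_prefixes(SAMPLE)
block = root.find(f"{{{MC}}}AlternateContent")
new_app = select_alternative(block, prefixes, {"http://example.com/ns/v2"})
old_app = select_alternative(block, prefixes, set())
```

An application that understands the newer namespace gets the `v2:fancy` content, and one that does not gets the fallback. The consumer has no way to ask whether an unknown namespace is close to one it already handles, which is the gap discussed here.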

ISO/IEC DSDL's Document Schema Renaming Language (DSRL), Martin Bryan's excellent work now taken over by Murata Makoto, looks very applicable here. It lets you say: map namespace X1 to namespace X; map element fred to element FRED; map attribute value "pts" to "pt"; and so on. But how does the application know which DSRL mapping to apply?
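
As a rough illustration of that kind of renaming, the following Python sketch applies the three example rules to a parsed document. It does not use real DSRL syntax: the rules are plain dictionaries, and the `urn:example:` namespaces and the `rename` function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical renaming rules, written as dicts rather than as a DSRL document.
NAMESPACE_MAP = {"urn:example:X1": "urn:example:X"}
ELEMENT_MAP = {"fred": "FRED"}
ATTR_VALUE_MAP = {"pts": "pt"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def rename(root):
    """Rewrite namespaces, element names and attribute values in place."""
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        namespace, local = split_tag(element.tag)
        namespace = NAMESPACE_MAP.get(namespace, namespace)
        local = ELEMENT_MAP.get(local, local)
        element.tag = f"{{{namespace}}}{local}" if namespace else local
        for name, value in list(element.attrib.items()):
            element.set(name, ATTR_VALUE_MAP.get(value, value))
    return root


doc = ET.fromstring('<fred xmlns="urn:example:X1" unit="pts"><other/></fred>')
rename(doc)
```

After the call, the root element is `FRED` in namespace `urn:example:X`, its `unit` attribute is `pt`, and the child `other` has moved to the new namespace with its name unchanged. The sketch takes the mapping as given. Choosing which mapping to apply is the open question.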

Namespace Relations

I think we are missing, or have now arrived at the stage where we need, a way to declare relationships between different namespaces. I suggest that all we need is a little language like the following:


element namespaces {
  element namespace {
    attribute iri { xsd:anyURI },
    (
      element subset { attribute iri { xsd:anyURI }} |
      element superset { attribute iri { xsd:anyURI }} |
      element dialect { attribute iri { xsd:anyURI }}
    )+
  }+
}

For example


<namespace iri="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0">
  <dialect iri="http://www.w3.org/2000/svg" />
</namespace>

This information, provided as part of a document or application or even downloadable from a repository, would give the consuming application enough information to figure out whether to rename the data (using DSRL or just a namespace change), whether to attempt a lax load, and what kind of user warning to give.
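
A consuming application might use such declarations along the following lines. This is only a sketch under stated assumptions: it reads `<subset>`, `<superset>` and `<dialect>` as describing how the enclosing namespace relates to the referenced one, and the warning texts, the `plan_fallback` function and the policy of taking the first known target are all invented for illustration.

```python
import xml.etree.ElementTree as ET

ODF_SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
SVG = "http://www.w3.org/2000/svg"

RELATIONS_XML = f"""
<namespaces>
  <namespace iri="{ODF_SVG}">
    <dialect iri="{SVG}"/>
  </namespace>
</namespaces>
"""

# Assumed policy: how loudly to warn for each kind of relation.
WARNINGS = {
    "subset": "Loaded as a known superset; should be safe.",
    "superset": "Loaded laxly; unknown features may be dropped.",
    "dialect": "This may be crap or it may be perfect.",
}


def load_relations(xml_text):
    """Return {namespace IRI: [(relation kind, related IRI), ...]}."""
    relations = {}
    for namespace in ET.fromstring(xml_text).findall("namespace"):
        relations[namespace.get("iri")] = [
            (relation.tag, relation.get("iri")) for relation in namespace
        ]
    return relations


def plan_fallback(unknown_ns, known, relations):
    """Pick a known namespace to remap to, with a warning, or None."""
    for kind, target in relations.get(unknown_ns, []):
        if target in known:
            return target, WARNINGS.get(kind, "Unrecognised relation.")
    return None


relations = load_relations(RELATIONS_XML)
plan = plan_fallback(ODF_SVG, {SVG}, relations)
no_plan = plan_fallback("urn:example:unrelated", {SVG}, relations)
```

Here an SVG-only application presented with the ODF namespace would learn that it can remap to the SVG namespace and attempt a lax load with the dialect warning, while a namespace with no declared relation yields no plan and is handled as before.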

And it would support a more flexible approach to namespaces. At the moment, namespaces have the typical XML disease of utterly no support for versioning. It is not just the name, it is also that our XML software cannot handle remapping namespaces on the fly. It would give standards-makers a powerful new tool, as we move from the early period of innovation to one of consolidation and maintenance.

Early on in XML, I was on the side that fought and won the discussions on whether the namespace was a schema or whether it related to more general semantics. Just as HTML can have multiple DTDs, a namespace could be used in multiple schemas, both profiles and language updates. XSLT's update to XSLT 2 is a case in point: it keeps the old namespace with the same general semantics.

So you use a namespace change where there is a significant change in semantics or syntax. But this gives a dilemma: if we change namespaces readily, our old applications cannot read the new data; if we change namespaces only as a last resort, more of our applications will fail due to receiving unexpected values.

In some cases, of course, you want Draconian handling. But just as one cannot say that everyone needs graceful degradation, one cannot say that no-one needs it. It is a missing piece of the puzzle.



5 Comments

Consider the case of ODFs SVG dialect. It uses ODF names and ideas,

I think you meant SVG names and ideas?

Doh! Thanks. (I hope I got what you said right David...please correct if my memory is wrong.)

Well, the actual namespace in ODF is
urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0

Although it is given prefix and binding "svg:" in the exposition and in the schema, it is a separate namespace and therefore any similarity between its local names and those of SVG, living or dead, is entirely coincidental. Except to the degree that the corresponding details are defined.

This creates some duty to explain what elements are in the ODF svg-compatible:1.0 namespace and what their definitions are, at least rigorously delimiting the degree of correspondence to the SVG thingy with the same local name, if that is what is intended.

For me, the use of an svg: prefix bound to a non-SVG namespace is unfortunate misdirection. There's no problem technically, but in terms of the exposition readers may be led to understand more than what is technically the case.

Others have commented on whether or not svg-compatible is actually SVG compatible. I have no assessment to make about that. To the extent that there are unfortunate deviations, I trust that remedies will be identified in time for ODF 1.2.

orcmid: It is not on the books for ODF 1.2 AFAIK? It would cause a delay and I was under the impression that Sun engineers were on the record (at least a couple of years ago) as opposing harmonization due to their priorities.

I raised the SVG dialect issue for the ODF NG requirements. A contact of the W3C SVG WG then also made a post saying that it could be a two-way street: that if the ODF dialect had features that SVG 1.2 did not support, they would be willing to consider enhancing SVG (1.3?) to cope.

But I mentioned it here merely as an example of a larger issue, which is how to support graceful degradation of supersets, dialects and other versions.

I could understand that you could get into a mess if the subset uses a different namespace to the superset. What would seem ideal to me is if the schema languages had something like an 'intent' element where, in effect, you could specify that "there may be more to the document than is in this schema because this schema defines a subset and for the superset (with the same namespace || with a different namespace, ie 'xyz') look here". Likewise the superset schema could be either left as is or have a corresponding note to say, in effect, either (to close the schema to any extension) "this is a superset schema and an instance with this namespace cannot include anything not defined in this schema but subset schemas may exist with (the same namespace || these other namespaces)" or (allowing further minor release extensions) "this is intended as a superset schema but extensions are allowed with the same namespace (different versions) and subsets may be defined with (the same || other namespaces)". Doing this outside of the schema seems almost futile since there could be any number of such external specifications (effectively doing no better than a prose conformance profile with or without any corresponding [test] assertions like Schematron or Test Assertion Markup Language). What seems ideal would be to have the mechanism included in the schema languages themselves so that the 'intent' of the schema can be specified with the provenance of the schema authors.
