Fake real-time blog from Document Interoperability Initiative 2 at Redmond

By Rick Jelliffe
October 29, 2008 | Comments: 2

I wasn't at DII1 or DII2. But I always like to flip through the presentation slides of conferences to get a feel for their content, even though I be China watching.

Application Architectures

A few years ago Microsoft had a very interesting paper on their site which gave a rationale for the feature strategies for their product lines. I have not been able to find this paper in the last year or so.

It said, basically, that MS needed to rationalize its products along certain lines based on the kinds of document being produced. These application archetypes were then aligned with the product line, each with a master product or suite: Visual Studio for programs, Office for office applications, Windows Media for media, IE for web browsing.

In practice, it meant that unless (the champions for) an application like Visio could justify that it represented some kind of top-level document type different from existing suites, it would be destined in the medium term to be folded into a suite. You can see that it is now branded Microsoft Office Visio 2007, even though it is not actually bundled in Office! The development of DrawingML in Office might be an indication of Office's diagramming infrastructure being ramped up for such a move, for example.

The dropping of FrontPage as a product fits into this top-level consolidation. In a previous blog, I asked why there should be any difference between a file and a website, for an editor application, now that we use the XML-in-ZIP formats? (So where does something like RSS Bandit fit?)

Document Architectures

Anyway, this is all old news. What was interesting in the DII slides was that for each of Word, PowerPoint and Excel there is a discussion of document archetypes. This seems to be the same approach as the application archetypes, applied inside the existing applications.

Most of the archetypes are fairly uninteresting. For Word, for example, they are:

  • Articles
  • Essays
  • Books
  • Fliers
  • Labels
  • Legal Documents
  • Letters
  • Meeting Notes
  • Memos
  • Newsletters/Notes
  • Outlines
  • Proposals
  • Research Papers/Reports
  • Résumés
  • Structured Document/Form

Now it may be that these are just use case scenarios, or classifications for their templates. It certainly is a good indication of the limits of the scope of Word's design. But it will also be interesting to see whether, ultimately, products get developed with GUIs adjusted for each of the document types. Already Word has a stripped-down blogging mode, for example...but I don't see blogs on that list.

Some of the other archetypes are surprising, unless you view them in the light of the application archetype. The presentation archetypes are particularly interesting in this regard: they include note-taking, menu items, and a diploma, a million miles away from slide shows! It seems that PowerPoint is destined to take on more traditional DTP functionality. And presumably it would also be the logical home for WYSIWYG animation creation, as a poor man's Flash, ramping up the basic transition and animation capabilities.

XML and data exchange angle

What has this got to do with XML?

The demand for a particular kind of standardized interoperability, where large-suite formats (namely ODF, OOXML, PDF) are kitted out in their Sunday best and stamped with some ISO or Consortium number, is based on what I suggest is a logically flawed view of standards. In effect, it says that the document format should follow the requirements of a particular application (or class of applications, or application archetype, it is all the same here): it is a kind of feature freeze.

However, applications are continually shifting ground and changing, perhaps slowly, like crocodiles occupying each other's territories. But the change is real. It will be a challenge for standards to cope with it.

For example, Sun's StarOffice has a PDF Import feature. It opens a PDF file inside the StarOffice drawing package, with each line as a separate text box. (I have been beta testing StarOffice 9, but there is a media embargo for another three weeks or so.) This would have been an absolute life-saver many times in the past. PDF Import stretches the idea of what a drawing application handles. And as PDF moves to tagged PDF, it will be interesting to see whether the drawing packages get some ability to connect lines better, or even do structure editing...it is not an unthinkable direction.

The other reason these document archetypes are interesting, from the XML POV, is the question of whether they themselves would be useful to standardize.

The traditional SGML approach to documents was to make up your own DTD for your document type. Re-use some common components, for sure, but basically there was little cooperation. Everyone needs something slightly different. Model and capture your data first, then figure out how to present it.

The other extreme was the DTP and WP oriented approach: you are given a minimal canned set of structures, and you just use those. This approach had a revolution early on, as the need for styles asserted itself (particularly in relation to large-scale document sets such as websites), and more recently with a second layer as themes (and skins) have come into the mainstream.

And now we have these document archetypes, which sit plumb in the middle. It is not hard to imagine procurement standards in terms of archetypes and themes and semantic tags, for example: the software must support legal documents and books (as defined by standard XXXX), readily switchable to different color-blindness arrangements (as defined by YYYY), and must allow arbitrary labels on paragraphs (from the list provided by ZZZZ).

This is in the realm of interoperability by linking or mapping (or round-tripping) rather than by adopting a format intended to be universal which, because it is therefore limited, is doomed to fail at being universal. (Old SGML hands: I suppose there is some connection between document archetypes and document architectures.)

For example, rather than having an email schema and XML emails, or having emails in vanilla ODF or Open XML, we have emails in any format which have the items typical of that archetype labeled with agreed labels, perhaps each with different syntax to suit the format. (Rather like how Dublin Core can use elements or attributes or be inline: the syntax differences become an inconvenience, not a roadblock.) And, of course, you can then use Schematron to validate these (e.g. think of the way you could use Schematron to validate an email which is marked up in HTML, with predefined class attributes for To: and From: etc. that are checked by the Schematron schema).
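
As a rough illustration of that last idea, here is a minimal sketch in Python rather than Schematron itself. It stands in for the kind of assertions a Schematron schema might make about an email marked up in XHTML. The label names (email-to, email-from, email-subject) and the function names are hypothetical, invented for this example; they do not come from any agreed vocabulary or standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed labels for the "email" archetype (an assumption,
# not taken from any real standard).
REQUIRED_LABELS = {"email-to", "email-from", "email-subject"}

# A sample email marked up in XHTML, with the archetype labels carried
# in class attributes. The subject is deliberately left out.
XHTML_EMAIL = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>To: <span class="email-to">alice@example.org</span></p>
    <p>From: <span class="email-from">bob@example.org</span></p>
    <p class="email-body">See you at the workshop.</p>
  </body>
</html>"""


def labels_in(document):
    """Map each class token to the text of every element that carries it."""
    found = {}
    for element in ET.fromstring(document).iter():
        for token in element.get("class", "").split():
            text = "".join(element.itertext()).strip()
            found.setdefault(token, []).append(text)
    return found


def validate_email(document):
    """Return a list of failure messages, one per broken rule."""
    found = labels_in(document)
    failures = []
    for label in sorted(REQUIRED_LABELS):
        values = found.get(label, [])
        if not values:
            failures.append(f"missing element labeled '{label}'")
        elif any(not value for value in values):
            failures.append(f"element labeled '{label}' is empty")
    return failures


print(validate_email(XHTML_EMAIL))
# prints ["missing element labeled 'email-subject'"]
```

The checks only look at the labels, not at which elements carry them, so the same rules would apply whether the label sits on a span, a paragraph or a heading. That is the point of the approach: the host format can vary as long as the agreed labels are present.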

In the cases of ODF, OOXML and PDF, the formats all can (or can be extended to) support container elements (or class attributes) that could be added by user interaction as part of editing that document in an archetype-adjusted editing program. For example, a numbered section mechanism for an article/report editor mode with the appropriate elements, which was not available in other archetype editing modes. Or a multi-chapter GUI control (and links) for a book mode.

Could this piggybacking of simple structures on top of templates be a new sweet spot for structured markup?

One indication that document archetypes might play a role in more of our work lives can be seen in a slide that says the DII wants to "facilitate creation of a library of templates optimized for translation between popular formats (OXML, ODF, UOF, etc.)". Now this optimization may just be the kinds of issues I raised in a blog last year, "Castoff hints: rethinking interoperability and fidelity", which are to do with using design to help cope with the differences in fonts, features and algorithms in different applications, to help make pages look the same when sent to different systems. But it does open the door to a fresh way of looking at how to get workable interoperability without dumbing down or freezing formats, and with the chance of higher-order markup.


Hi Rick,

Good food for thought. Regarding how we developed the archetypes covered at the workshop, they're essentially use-case scenarios based on the types of documents that we've seen from customers. I'd guess that the lack of a blogging category is simply because that use case doesn't typically manifest itself as "documents" on disk.

Hadn't heard that crocodile-territory metaphor before. Consider it stolen, or at least borrowed.

