In my blog post last year, "Is ODF the new RTF or the new .DOC? Can it be both? Do we need either?", I raised the question of whether ODF would replace RTF or DOC. I think this issue has come back with a bang with the release of Office 2007 SP2, and I'd like to give another pointer to it for readers who missed it the first time around.
Yesterday I wrote a little exploration of SP2, and framed it with a model of four different classes of fidelity that an application might provide when opening an office document. I'd like to generalize this model to make it more useful for evaluating other document applications.
|Fidelity||Characteristics||Formats (rough)||Work involved in getting to next class|
|Raw||Deep information in file used, not presentation information. No media dependencies. Draft-quality display easy.||text/plain; text/xml with minimal assembly of content||Need to format data|
|Exchange||Generic capabilities: rich text, formulas. No media dependencies. No decorations.||text/html with CSS2||Need to adjust for media, page, decoration and autogenerated media, stylesheets|
|Industrial||All media dependencies honoured. All decorations and auto-generated text present. Styles and structures retained.||ODF, ISO/IEC OOXML||Need to adjust all styles, sizes, breaks etc. in stylesheet and actual data|
|Facsimile||Looks and behaves identically, though user agent differences may exist.||ISO/IEC PDF|
ODF (and OOXML) support will never be in that facsimile class, except when round-tripping to the same application, if that application uses ODF as its native format (or, at least, if the feature set of the application matches that of the version of ODF).
And belonging to a particular class is not necessarily a bad thing. For example, SGML was specifically designed to cope with documents that may be required in fifty years' time and for re-targeting. The idea was that 'Raw' fidelity in the sense above is all that is useful for a document like that, because in reality the document would need to be reprocessed for the technology of the day.
I don't want to imply that the classes form a simple superset chain, with each class building on the one before. A spreadsheet with initial data and formulas but no cached calculated values would be fine for 'exchange' (i.e. to other spreadsheet applications) and 'industrial' (i.e. to substitute spreadsheet applications with the same bells and whistles), but would not be suitable for an application that needed the 'raw' data (because it was not a spreadsheet application and could not interpret the formulas), nor for 'facsimile' applications (for the same reason). Similarly, a PDF file without structure tags would be suitable for 'facsimile' but not for the other classes. If we don't recognize these differences, we won't get the kind of interoperability we thought we were getting from the standard formats.
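One way to picture this is to treat the four classes as independent flags rather than as points on a single scale. The following is only a minimal sketch: the `Fidelity` enum, the `SUITABLE_FOR` table and the `supports` helper are hypothetical names invented for illustration, and the two entries simply restate the spreadsheet and PDF examples above.

```python
from enum import Flag, auto


class Fidelity(Flag):
    """The four fidelity classes, modelled as independent flags."""
    RAW = auto()
    EXCHANGE = auto()
    INDUSTRIAL = auto()
    FACSIMILE = auto()


# Hypothetical document kinds, restating the two examples in the text.
SUITABLE_FOR = {
    # Formulas but no cached values: fine for spreadsheet applications only.
    "spreadsheet_without_cached_values": Fidelity.EXCHANGE | Fidelity.INDUSTRIAL,
    # Appearance only, no structure to extract.
    "pdf_without_structure_tags": Fidelity.FACSIMILE,
}


def supports(document_kind: str, required: Fidelity) -> bool:
    """Return True if the document kind offers every required class."""
    offered = SUITABLE_FOR[document_kind]
    return (offered & required) == required
```

With this model, `supports("spreadsheet_without_cached_values", Fidelity.RAW)` is false even though the same document supports `Fidelity.INDUSTRIAL`, which is exactly the case an ordered scale would get wrong.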
The errors I found yesterday in SP2 seemed to belong to the 'industrial' class (I called it 'publishing' in that blog, but I want to generalize). SP2 would meet the 'exchange' class, but not everything in the 'industrial' class (whether this defect is important for a particular use case is of course a different issue).
(In this light, we could recast the big blue marketing machine's comments on SP2's formula support as saying that SP2 spreadsheets provide only 'raw' fidelity, where they could have reached the 'exchange' level by loading and generating the draft Open Formula that its own product supports. That level would be suitable for SOHO users, whose spreadsheets are small enough that they can spot a problem. The opposing view would be that only the adoption of real Open Formula will allow exchange quality, so until then we have only 'raw' in practice. My middle view invokes Postel's principle and lets the user decide what is appropriate.)
The class model also lets us classify the problems we may find in a standard. A poor or missing description of a 'raw' level issue is a major problem. A missing description of an 'exchange' feature means that only 'raw' (draft-quality) applications may work as desired. Similarly, for implementations, the 'raw', 'exchange', 'industrial' and 'facsimile' classes indicate the relative priorities for fixing a problem.
For procurement, I would say that a prudent policy would be to specify the class of fidelity required for a document application. When receiving a document, how much work should the recipient have to do to get the necessary fidelity? (For example, someone receiving spreadsheet data which they are going to use for their own ends requires only 'raw' fidelity; someone who needs to reproduce legislation probably needs 'facsimile' fidelity.)
Indeed, where 'industrial' class fidelity is ultimately required in a heterogeneous environment, one procurement tactic might be to start off requiring 'raw' support of ODF now, 'exchange' support next year, and 'industrial' support the year after. That will encourage developers without penalizing new players with too high a bar.
The OASIS ODF TC has some kind of conformance and testing wing at work, but it is not at all clear that it will deliver anything in this area. Without targeting these classes, ODF's breezy conformance requirements mean that ODF-conformant software can deliver vastly different kinds of fidelity, yet still accord with the letter of the law (and, indeed, with the spirit of the ODF spec, which allows so many holes), which will cause frustration all around.
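A phased requirement like this is easy to write down mechanically. The sketch below is purely illustrative: the `PHASE_IN` schedule, the function names and the assumption that requirements accumulate from year to year (rather than each year replacing the last) are mine, not part of any actual procurement policy.

```python
from typing import AbstractSet, FrozenSet

# Hypothetical schedule: years since the policy started -> class newly required.
PHASE_IN = {0: "raw", 1: "exchange", 2: "industrial"}


def required_classes(years_since_start: int) -> FrozenSet[str]:
    """Return every class required so far, assuming requirements accumulate."""
    if years_since_start < 0:
        raise ValueError("the policy has not started yet")
    return frozenset(
        cls for year, cls in PHASE_IN.items() if year <= years_since_start
    )


def meets_policy(claimed: AbstractSet[str], years_since_start: int) -> bool:
    """Check a product's claimed classes against the schedule for that year."""
    return required_classes(years_since_start) <= claimed
```

A new entrant claiming only `{"raw"}` would pass in the first year and fail in the second, which gives it a year to reach 'exchange' support rather than facing the full 'industrial' bar on day one.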