Classes of Fidelity for Document Applications

By Rick Jelliffe
May 27, 2009 | Comments: 2

In my blog last year, "Is ODF the new RTF or the new .DOC? Can it be both? Do we need either?", I raised the question of whether ODF would replace RTF or DOC. I think this issue has come back with a bang with the release of Office 2007 SP2, and I'd like to give another pointer to it for readers who missed it the first time around.

Yesterday I wrote a little exploration of SP2, and framed it with a model of four different classes of fidelity that an application might provide when opening an office document. I'd like to generalize this model to be more useful for evaluating other document applications.

Raw
    Characteristics: Deep information in the file is used, not presentation information. No media dependencies. Draft-quality display is easy.
    Formats (rough): text/plain; CSV; text/xml with minimal assembly of content.
    Work involved getting to next class: need to format the data.

Exchange
    Characteristics: Generic capabilities: rich text, formulas. No media dependencies. No decorations.
    Formats (rough): text/html with CSS2; RTF.
    Work involved getting to next class: need to adjust for media, page, decorations and auto-generated content; stylesheets.

Industrial
    Characteristics: All media dependencies honoured. All decorations and auto-generated text present. Styles and structures retained.
    Formats (rough): ODF; ISO/IEC OOXML.
    Work involved getting to next class: need to adjust all styles, sizes, breaks etc. in the stylesheet and the actual data.

Facsimile
    Characteristics: Looks and behaves identically, though user-agent differences may exist.
    Formats (rough): ISO/IEC PDF.
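
To make the classes a little more concrete, here is a toy example of the same price data at the two lowest levels. The data and class names on the markup are invented for illustration, not taken from any standard.

'Raw' (CSV):

    item,price
    Widget,9.95
    Gadget,14.50

'Exchange' (text/html with a CSS2 stylesheet):

    <table class="prices">
      <tr><th>item</th><th>price</th></tr>
      <tr><td>Widget</td><td>9.95</td></tr>
      <tr><td>Gadget</td><td>14.50</td></tr>
    </table>

    /* generic presentation only: no page sizes, breaks
       or media dependencies are expressed */
    table.prices td { text-align: right }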

ODF (and OOXML) support will never be in that facsimile class, except when round-tripping to the same application, if that application uses ODF as its native format (or, at least, if the feature set of the application matches that of the version of ODF).

And belonging to a lower class is not necessarily a bad thing. For example, SGML was specifically designed to cope with documents that may be required in fifty years' time and for re-targeting: the idea was that 'raw' fidelity, in the sense above, is all that is useful for such a document, because in reality it would need to be reprocessed for the technology of the day.

I don't want to imply that there is a simple supersetting from one class to the next, with each class building on the one below. A spreadsheet with initial data and formulas but no caching of calculated values would be fine for 'exchange' (i.e. to other spreadsheet applications) and 'industrial' (i.e. for substituting spreadsheet applications with the same bells and whistles), but would not be suitable for an application that needed the 'raw' data (because, not being a spreadsheet application, it could not interpret the formulas), nor for 'facsimile' applications (for the same reason). Similarly, a PDF file without structured tags would be suitable for 'facsimile' but not for the other classes. If we don't recognize these differences, we won't get the kind of interoperability we thought we were getting from the standard formats.
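
(To illustrate the spreadsheet point in ODF markup, roughly as I recall it: a cell can carry both a formula and a cached result. A sketch, with invented cell values:

    <!-- formula plus cached value: a non-spreadsheet 'raw'
         consumer can still extract 30 without evaluating
         the formula -->
    <table:table-cell table:formula="of:=SUM([.A1:.A2])"
        office:value-type="float" office:value="30"/>

    <!-- formula only: fine for 'exchange' to another
         spreadsheet application, but useless to a 'raw'
         consumer -->
    <table:table-cell table:formula="of:=SUM([.A1:.A2])"/>

The attribute names are from memory and may not be exact.)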

The errors I found yesterday in SP2 seemed to belong to the 'industrial' class (I called it 'publishing' in that blog, but I want to generalize). SP2 would meet the 'exchange' class, but not everything in the 'industrial' class (whether this defect is important for a particular use case is, of course, a different issue).

(In this light, we could re-cast the big blue marketing machine's comments on SP2's formula support: SP2 spreadsheets only provide the 'raw' level of fidelity, where it could have reached the 'exchange' level (by loading and generating the draft OpenFormula that its product supports), suitable for SOHO users, who have small enough spreadsheets that they can spot a problem. The opposing view would be that only adoption of the real OpenFormula will allow 'exchange' quality, so until then we only have 'raw' in practice. And my middle view invokes Postel's principle and lets the user decide what is appropriate.)

The class model also lets us classify the problems we may find in a standard. A poor or missing description of a 'raw'-level issue is a major problem. A missing description of an 'exchange' feature means that only 'raw'-class applications may work as desired. Similarly, for implementations, the 'raw', 'exchange', 'industrial' and 'facsimile' classes indicate the relative priorities for fixing a problem.

For procurement, I would say that a prudent policy would be to specify the class of fidelity required for a document application. When receiving a document, how much work should the recipient have to do to get the necessary fidelity? (For example, someone receiving spreadsheet data which they are going to use for their own ends requires only 'raw' fidelity; someone who needs to reproduce legislation probably needs 'facsimile' fidelity.)

Indeed, where 'industrial'-class fidelity is ultimately required in a heterogeneous environment, one procurement tactic might be to start off requiring 'raw' support of ODF now, 'exchange' support next year, and 'industrial' the year after. That would encourage developers without penalizing new players with too high a bar.

The OASIS ODF TC has some kind of conformance and testing wing at work, but it is not at all clear that it will deliver anything in this area. Without targeting these classes, ODF's breezy conformance requirements mean that ODF-conformant software can deliver vastly different kinds of fidelity, yet still accord with the letter of the law (and, indeed, with the spirit of the ODF spec, which allows so many holes), which will cause frustration all around.



2 Comments

Your suggestion for procurement entities to require phased achievement of fidelity classes parallels some of my own thinking on the subject, although I think one-year increments are overly ambitious.

One problem I see with the concept is that the ODF specification itself does not specify the conformity requirements essential to achieve interoperability, let alone classes within the interoperability fidelity spectrum. So the phased approach would need to encompass similar phases for the ODF TC to get the spec aligned with the fidelity class market requirements, with the implementation deadlines trailing the deadlines for the TC.

Another problem I see might be encapsulated by the question: "What if government procurement officials require ODF document exchange formats and the implementing ODF vendors refuse to supply what government adopts as the procurement requirements?"

This has already happened. IDABC and 27 E.U. Member States laid down their interoperability requirements in early 2007 at the Open Document Exchange Formats Workshop.

But the ODF TC has been utterly non-responsive, and rather than delivering on the ODEF Workshop insistence that "interoperability come[] from data not from applications," Big Blue continues to push for application-level interoperability work rather than "specifying the conformity requirements essential to achieve interoperability" in the ODF specification.

And it's the big vendor boycott of such clear procurement signals that has reluctantly brought me to conclude that procurement requirements alone are insufficient incentives for the big vendors. I think that only the competition regulators have the needed power to establish a phased schedule for repairing the ODF specification and for ensuring conformant, interoperable implementation by the big vendors.

To deal with such issues, I have proposed that the Obama Administration establish an Interagency Task Force for Information Technology Standards Development Reform, using existing regulatory authority. I'd be interested in your reaction to that proposal.

Marbux: The Workshop link is a really good one, thanks for that.

I certainly agree that one important aspect of the job of standards bodies is to make standards in such a form that they can serve government procurement needs well.

In fact, this has long been a major concern of ISO/IEC JTC1 SC34 (Document Description and Processing Languages), which has concentrated on developing "enabling" technologies (rather than particular exchange formats) to provide a better vocabulary of technologies for procurement people to use.

You see, part of the problem is that we are unable to specify everything we need concisely and unambiguously: we cannot even start until we have the appropriate tools for the job, and just what those tools are emerges at a slower rate than we would prefer. This is why in SC34 WG1, for example, we have been concentrating on producing new schema languages that can capture constraints and requirements in much more objective and testable forms. My own work on Schematron is starting to pay off: HTML5 is being formulated in a quite Schematron-friendly fashion, for example.
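
To give the flavour of Schematron: a requirement like the spreadsheet-caching issue above can be expressed as a testable assertion. This is only a sketch; the rule, its context and its test are invented for illustration, not taken from any published profile:

    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
      <sch:ns prefix="table"
          uri="urn:oasis:names:tc:opendocument:xmlns:table:1.0"/>
      <sch:ns prefix="office"
          uri="urn:oasis:names:tc:opendocument:xmlns:office:1.0"/>
      <sch:pattern>
        <sch:rule context="table:table-cell[@table:formula]">
          <!-- deliberately simplified: real string cells cache
               their value differently -->
          <sch:assert test="@office:value">
            A formula cell should also carry a cached value, so
            that 'raw'-class consumers can use the data.
          </sch:assert>
        </sch:rule>
      </sch:pattern>
    </sch:schema>

Constraints in this form can be tested mechanically, which is just what procurement-grade conformance requirements need.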

Even if we had these tools, it would still be a big slow job.

The more vague and loopholy a standard is, the less effectively it will meet the needs of people who want 100% interoperability and application substitutability. And even if it were perfectly specified, there is still all the marketing and development money that Microsoft can throw at its products: it has the deep pockets of someone who sees its office suites as a profit center, while some of its FOSS competitors see their office suites as loss leaders to sell hardware or services: they will have shallower pockets.

So Microsoft can certainly sit back and wait to outstare or outspend its rivals, whether or not ODF is mandated.

I think what is extremely necessary is for governments, when they mandate open document formats such as PDF, ODF and OOXML, to mandate the latest (i.e. most recent, mature and tightly specified) versions of the standards, and the smallest practical subsets (profiles) of those. For 2010, this means ODF 1.2 and OOXML Strict in particular. Procurers should pre-announce that this will be their intention, and they should provide the necessary escape clauses, plan Bs and so on.

But governments need to be serious. A government that does not have a representative on the ODF and/or OOXML working groups is simply not serious about data exchange.
