Trying to figure out where Open Formula fits in

By Rick Jelliffe
October 24, 2008 | Comments: 4

I don't claim to be any kind of spreadsheet expert. Documents, publishing, formatting, text processing, schemas, validation, metrics, internationalization: I'm your man. Spreadsheets: see someone else. But I've been trying to figure out OpenFormula recently, so please see this blog as a request for confirmation rather than any kind of authoritative tutorial!

During the OOXML standardization process, particularly preparations for the BRM, I was impressed by how mission-critical spreadsheets are to industry, but also how users of spreadsheet applications have been like lambs to the slaughter: issues that programmers get taught in their first months, such as accuracy, precision, the nature of floating-point arithmetic, are swept under the carpet by spreadsheet developers. Now it may well be that their users are too unsophisticated to understand these kinds of things, but if the user is making mission-critical calculations where these issues are relevant, that user deserves to be educated. And education relies on the information being public.

OpenFormula is an new formula language for spreadsheets, that has come out of the ODF effort at OASIS Open. It has quite few advantages over the Excel spreadsheet language, for example a proper library mechanism. It is supposed to be part of ODF 1.2, but is still at late draft stage.

However, it is not syntactically compatible with the Excel formula syntax of IS29500. (In the notes to the most recent draft, it has a note (my emphasis)

This specification is derived from actual practice in industry. It was especially influenced by the OpenOffice.org exchange syntax and by the semantics of Microsoft Excel, but many other spreadsheet implementations were considered including (alphabetically) Corel's WordPerfect suite, Document To Go's Sheet to Go, GNOME Gnumeric, IBM/Lotus 1-2-3, KOffice's Kspread, and WikiCalc.

The first thing to realize is that OpenFormula actually defines an exchange formula language which has explicit delimiters, but also allows (and partly defines) application-specific user interface languages, which allows spaces and other delimiters. This is the way of working around the problem that existing applications have different features and slightly different syntaxes. The separation between formal common interchange notations and friendly front-ends is not new: the browser address bar is a good example, but it is also how localization works (in Excel, the browser in a Spanish locale might use the Spanish for SUM while the file is saved using the standard English name.)

The current OpenFormula draft s5.8 says References always begin with '['. For example in OpenFormula: =[.B4]+[.B5]

In IS29500 (OOXML) it seems that there are two different notations for references floating about: A1 and R1C1. An A1 reference would be =$B$4+$B$5

An R1C1 reference would be =R[B]C[4]+R[B]C[5]

So I guess that decoupling the user-interface from the exchange format will mean that the user interface-formats of the various systems never need to converge on using a standard syntax. They presumably can just tack on the extra bits (in as conforming a way as practical) to support the OpenFormula feature set, without disrupting legacy documents by changing syntax. But most spreadsheets, as I understand it, accept something more like the Lotus123 syntax as modified by Excel.

However, it is interesting to consider what will happen if there are OpenDocument 1.0 or 1.1 documents that do not use the exchange syntax defined by ODF. These will obviously break, or require that the importing system have some good error-correction to handle the legacy case.

If, for example, the upcoming service pack of Office 2008 exports spreadsheets using the Excel syntax not the ODF exchange syntax (which OpenFormula is based on) it would clearly cause trouble. The formula syntax in IS26300 is normative, and an ODF spreadsheet that used IS29500 syntax when saved, even if it didn't follow full Open Formula, would not be conforming, as far as I can see.

This is another case where I think allowing plurality is the best policy. However, I acknowledge the problem is that large worksheets may have large numbers of formulae, and duplicating them would add to valuable load time and file size, let alone round-tripping formulae you don't understand.

What is required? All spreadsheet implementations need to provide the feature set that supports Open Formula and OOXML: all those functions and library referencing. All spreadsheet implementations that generate ODF need to generate OpenFormula and nothing else inside the ODF. All spreadsheet implementations that generate OOXML need to generate IS29500 syntax and nothing else inside the OOXML. Pollution of ODF or OOXML documents with non-conforming formula should not be tolerated in the name of supporting legacy: these are only a couple of years or so old, for goodness sake. The SC34 group on reconciling ODF and OOXML need to liaise with OASIS and ECMA to try to reduce gratuitous differences. (And, this should include allowing plural syntaxes in OOXML and ODF files.)

By the way: I see that the OpenFormula specification is (if you remove the frontmatter and the references) about 405 pages (perhaps a little less if you remove the editorial notes). The corresponding section in IS29500 is 406 pages.


You might also be interested in:

4 Comments

Hi, Rick,

You've got a mistaken premise on one point, that use of Excel formulas in Microsoft's implementation of ODF 1.1 would be non-conforming with ISO/IEC:26300 ODF.

The ODF conformance section (all adopted versions) states in relevant part:

"There are no rules regarding the elements and attributes that actually have to be supported by conforming applications, except that applications *should not* use foreign elements and attributes for features defined in the OpenDocument schema."

"Should not" is only a recommendation under the ISO/IEC Guidelines definitions of requirement keywords incorporated by reference in ODF section 1.2. An implementation that does not implement a recommendation does not thereby become non-conformant.

So long as Microsoft follows the ODF packaging conventions and includes an OASIS office XML namespace header line, the rest of the document could be marked up in OOXML and still be conformant ODF.

(This is not to argue that OOXML is any better in such aspects. E.g., the conformance section of the OOXML draft currently being circulated among NBs is largely a preposterous essay on why conformance requirements should not be defined. Neither ODF nor OOXML comply with the JTC 1 Directives requirement that international standards "specify clearly and unambiguously the conformity requirements that are essential to achieve the interoperability." http://isotc.iso.org/livelink/livelink/fetch/2000/2489/186491/186605/AnnexI.html)

I agree with you about the practical interoperability difficulties posed if Microsoft does use Excel markup rather than OpenFormula markup in its ODF implementation. But it would be a largely avoidable interoperability barrier, not an issue of conformance.

By the way, IBM's Bob Sutor has also discussed this issue. http://www.sutor.com/newsite/blog-open/?p=2671

Best regards

I would worry more about Openformula defining certains formula's slightly different from the way that have been implemented in tons of applications at the moment. Some implementations may still use their current Excel like function for mapping functions that are actually not fully compatible anymore (other ranges, other limits, other base values). this could easily lead to diffenent interpretations of the same formula's.

hAl: This is why there needs to be really clear labelling. When an Excell-style formula is imported into OpenDocument, as well as changing the notation to OpenForumula, the function names need to be in a proper namespace or library for the OOXML fomulas.

Similarly, OOXML needs a proper library/namespace mechanism with prefixes so that all the OpenFormula functions are used.

I completely staggers me that there are people goind around saying these things make no difference: that all we need to do is agree on something and not worry about how we get there or how to get there or how to cope with mixed implementations. Explicit labelling is almost always the answer: formulas should state which standard syntax they use, functions should state which standard or extension library they belong to.

Marbux: Yes, but I don't see anywhere that is says that implementations can make up their own notations, datatypes or values of ODF attributes and elements.

On my reading, I don't see where an implementatin is free to ignore the syntax rules for the formulae and substititute something else.

That an implementation does not have to support any of the defined elements and attributes is not at all the same thing as saying they can ignore the spec, and use the same element name with non-standard contents.

News Topics

Recommended for You

Got a Question?