I don't claim to be any kind of spreadsheet expert. Documents, publishing, formatting, text processing, schemas, validation, metrics, internationalization: I'm your man. Spreadsheets: see someone else. But I've been trying to figure out OpenFormula recently, so please see this blog as a request for confirmation rather than any kind of authoritative tutorial!
During the OOXML standardization process, particularly preparations for the BRM, I was impressed by how mission-critical spreadsheets are to industry, but also how users of spreadsheet applications have been like lambs to the slaughter: issues that programmers get taught in their first months, such as accuracy, precision, the nature of floating-point arithmetic, are swept under the carpet by spreadsheet developers. Now it may well be that their users are too unsophisticated to understand these kinds of things, but if the user is making mission-critical calculations where these issues are relevant, that user deserves to be educated. And education relies on the information being public.
OpenFormula is an new formula language for spreadsheets, that has come out of the ODF effort at OASIS Open. It has quite few advantages over the Excel spreadsheet language, for example a proper library mechanism. It is supposed to be part of ODF 1.2, but is still at late draft stage.
However, it is not syntactically compatible with the Excel formula syntax of IS29500. (In the notes to the most recent draft, it has a note (my emphasis)
This specification is derived from actual practice in industry. It was especially influenced by the OpenOffice.org exchange syntax and by the semantics of Microsoft Excel, but many other spreadsheet implementations were considered including (alphabetically) Corel's WordPerfect suite, Document To Go's Sheet to Go, GNOME Gnumeric, IBM/Lotus 1-2-3, KOffice's Kspread, and WikiCalc.
The first thing to realize is that OpenFormula actually defines an exchange formula language which has explicit delimiters, but also allows (and partly defines) application-specific user interface languages, which allows spaces and other delimiters. This is the way of working around the problem that existing applications have different features and slightly different syntaxes. The separation between formal common interchange notations and friendly front-ends is not new: the browser address bar is a good example, but it is also how localization works (in Excel, the browser in a Spanish locale might use the Spanish for SUM while the file is saved using the standard English name.)
The current OpenFormula draft s5.8 says References always begin with '['. For example in OpenFormula:
In IS29500 (OOXML) it seems that there are two different notations for references floating about: A1 and R1C1. An A1 reference would be
An R1C1 reference would be
So I guess that decoupling the user-interface from the exchange format will mean that the user interface-formats of the various systems never need to converge on using a standard syntax. They presumably can just tack on the extra bits (in as conforming a way as practical) to support the OpenFormula feature set, without disrupting legacy documents by changing syntax. But most spreadsheets, as I understand it, accept something more like the Lotus123 syntax as modified by Excel.
However, it is interesting to consider what will happen if there are OpenDocument 1.0 or 1.1 documents that do not use the exchange syntax defined by ODF. These will obviously break, or require that the importing system have some good error-correction to handle the legacy case.
If, for example, the upcoming service pack of Office 2008 exports spreadsheets using the Excel syntax not the ODF exchange syntax (which OpenFormula is based on) it would clearly cause trouble. The formula syntax in IS26300 is normative, and an ODF spreadsheet that used IS29500 syntax when saved, even if it didn't follow full Open Formula, would not be conforming, as far as I can see.
This is another case where I think allowing plurality is the best policy. However, I acknowledge the problem is that large worksheets may have large numbers of formulae, and duplicating them would add to valuable load time and file size, let alone round-tripping formulae you don't understand.
What is required? All spreadsheet implementations need to provide the feature set that supports Open Formula and OOXML: all those functions and library referencing. All spreadsheet implementations that generate ODF need to generate OpenFormula and nothing else inside the ODF. All spreadsheet implementations that generate OOXML need to generate IS29500 syntax and nothing else inside the OOXML. Pollution of ODF or OOXML documents with non-conforming formula should not be tolerated in the name of supporting legacy: these are only a couple of years or so old, for goodness sake. The SC34 group on reconciling ODF and OOXML need to liaise with OASIS and ECMA to try to reduce gratuitous differences. (And, this should include allowing plural syntaxes in OOXML and ODF files.)
By the way: I see that the OpenFormula specification is (if you remove the frontmatter and the references) about 405 pages (perhaps a little less if you remove the editorial notes). The corresponding section in IS29500 is 406 pages.