Open Formula zero-ing in

Oh baby its a wild world...

By Rick Jelliffe
July 20, 2009 | Comments: 2

I was browsing the most recent 2009-05-08 draft of Open Formula (the spreadsheet formula language slated to accompany ODF 1.2).

What does it say about the kind of interoperability it is aiming to provide? Here are some of the relevant extracts, for a taster:

In s1.1 Introduction

The purpose of this specification is ensure that people can own their data, instead of having their data controlled by a vendor who controls the data format. Using OpenFormula allows document creators to change the office application they use, exchange formulas with others (who may use a different application), and access formulas far in the future, with confidence that the recalculated formulas in their documents will produce equivalent results if given equivalent inputs.

but then

1.Implementation-defined areas were specified based on an analysis of the differences between real applications, instead of over-constraining the specification (which could imply a requirement to use only one specific product). ... Different applications produce different error values in the same circumstance, and this specification explicitly permits certain kinds of variance (while making clear to document developers that there is such variance, and how to deal with it).

In S2 Conformance

Applications may implement subsets or supersets of this OpenFormula specification. An application shall only claim to conform to a given function, operator, or group if the application completely meets all of its requirements as defined in this specification. Applications may (and typically do) implement additional functions beyond those defined in this specification. Applications may support additional formula syntax, additional operations, additional optional parameters for functions, or make certain function parameters optional when they are required by this specification. Applications should clearly document their extensions in their user documentation, both online and paper, in a manner so users would be likely to be aware when they are using a non-standard extension.

but then

This specification's text is written as a description of the requirements of an implementing application. However, documents (data files) containing formulas can also comply or fail to comply with this specification. Documents with OpenFormula formulas may use subsets or supersets of OpenFormula. A document may reference a nonstandard function by name, or depend on implementation-defined behavior, or on semantics not guaranteed by this specification. Thus, this specification discusses what is required for a document to assert that it is a portable document. A portable document shall only depend on the capabilities defined in this specification, and shall not depend on undefined or implementation-defined behavior. A portable document shall only claim to conform to a given group if the document only depends on the capabilities of the given group.

From s2.1 Predefined Groups

To encourage interoperability, groups of functionality have been predefined. These groups make it easier to determine if a given application will be able to correctly process a given document. Acquirers can select applications depending on whether or not the application conforms to the group(s) necessary for the acquirer's circumstances. Document developers can choose to only use capabilities defined in certain groups when creating their documents.

There are also several predefined groups that include a number of common functions, where each group is a strict superset of previous groups:
  • Small: This group includes approximately 100 functions, and includes the basic functions that are very widely implemented in spreadsheet applications, even in resource-constrained environments. It does not require support for inline arrays, complex numbers, nor the reference union operator.
  • Medium: This group provides approximately 200 functions; many current desktop implementations meet or nearly meet the requirements of this group. It does not require support for inline arrays or complex numbers, but it does add the reference union operator.
  • Large: This group provides over 300 functions, and requires support for inline arrays, complex numbers, and the reference union operator.
  • Note that some applications may be able to read and write a document that states that it requires a larger group, because applications may partially implement a larger group.
From s2.2 Variances (Implementation-defined, Unspecified, and Behavioral Changes),
This specification is based on actual spreadsheet applications, and while there are many points of agreement, there are some variances between applications. In some cases this specification specifically does not specify a particular result, leaving some issues implementation-defined or undefined; developers of portable documents shall avoid implementation-defined and undefined semantics. Applications should document all implementation-defined decisions in a manner that the application users can obtain the information (e.g., in the application help for the relevant function)

I thought this was very good:

There need not be a distinguishable Logical type. Applications may have a logical type distinct from Number (see Distinct Logical), but logical values may also be represented by the Number type using the values 1 (True) and 0 (False). This means that functions that take number sequences (such as SUM) may or may not include true and false values in the sequence. Creators of portable documents shall not create documents that presume that logical values are or are not Numbers.

From s2.3 Test Cases:

To reduce the risk of misunderstanding a requirement, and to increase the likelihood of correct implementation, this specification includes a large number of test cases that are normative (that is, they are part of the specification). In particular, every function and operator has at least one test case, and typically many test cases. An implementation shall pass all of the test cases for a given function or operator (unless otherwise specifically noted in the text) to be able to claim conformance with that function or operator. No set of test cases can be complete, so to claim conformance for a given function or operator, implementations shall meet all the requirements of this specification of it, even if there is no specific test case for some aspect of it.

From s4 Types:

In OpenFormula a value may have one of the following basic types: Text (the string type), Number, Logical (the boolean type), Error, Reference, or Array. The Logical type may be implemented using the Number type, as described below, instead of being a distinct type. An application may choose to not implement complex numbers; if it implements complex numbers, they may be implemented using a distinct complex number type, or it may implement complex numbers using the Text or Number type. An implementation may provide other basic types, and it may have many specialized subtypes of these types beyond those mentioned in this specification. A portable document shall only depend on constructs defined in this specification (including only these types) to conform to it.

And s4.2 Number:

Note that many formula creators are not sophisticated in their understanding of how computers determine their results. Many users, for example, do not understand computer floating point arithmetic models, and have no idea that many implementations use a base other than 10 (or what that would mean). In particular, many implementations use base 2 representations, with the result that value 0.1 can only be represented imprecisely (just as 1/3 can only imprecisely represented in a base 10 decimal representation). This problem applies to all uses of typical computing equipment, including nearly all programming languages, and this specification does not attempt to fully resolve the problem of unsophisticated users.

From s4.2.2 Date and Date Time

Many applications cannot handle Date values before January 1, 1900. Some applications can handle dates for the years 1900 and on, but include a known defect: they incorrectly presume that 1900 was a leap year (1900 was not a leap year). Applications may reproduce the 1900-as-leap-year bug for compatibility purposes, but should not. Portable documents shall not include date calculations that require the incorrect assumption that 1900 was a leap year. Portable documents shall not assume that negative date values are impossible (many implementations use negative dates to represent dates before the epoch). Portable documents should use the epoch date 1899-12-30 to compensate for serial numbers originating from applications that include a 1900-02-29 leap day in their calculations.

A rather strange one in s4.3 Complex Numbers:

Applications may support complex numbers, though this support may require user installation of optional plug-ins.

but
Portable documents shall use complex numbers only in functions that are explicitly noted as accepting or producing complex numbers (as appropriate).

In s5.7 Non-standard Function Names we find

Thus, when writing a document using function(s) not defined in this specification, an application should include a prefix in such function names to identify the original definer of the function's semantics (which at first will be application's developer). ... Applications may implement functions originally defined by another application, and thus may read and/or write function names that use another application's prefix. Examples of such names include COM.MICROSOFT.CUBEMEMBER, ORG.OPENOFFICE.STYLE, ORG.GNUMERIC.RANDRAYLEIGH, and COM.LOTUS.V98.FOO.

and
Applications that do not support a function should compute its result as some Error value other than NA().

In s 5.8 References

Applications may omit support for subtables (and typically do), unless they claim conformance to another part of this specification that requires such support.

In 5.11 Named Expressions

Portable documents should limit the names of their identifiers to only (Unicode) letters, underscores, and digits, not including patterns that look like cell references or the words True or False. Portable documents may use international characters, but note that some of today's applications do not yet handle such identifiers:
PortableIdentifier ::= ( LetterXML
(LetterXML | DigitXML | '_' | CombiningCharXML)* )

So what expectations about interoperability can we draw from these? First, interoperability is not guaranteed: it requires


  • The user or generating system creates a portable document

  • The user avoids relying on any semantics that the application has but which are not defined in Open Formula (including precision issues)

  • The user knows enough computer science to avoid floating point issues

  • The consuming application uses a predefined group that has functionality that matches the document

  • However, when applications do implement a standard functionality, it must be complete against the test cases.

These seem quite modest and realistic caveats, unless you want guaranteed interoperability even for laymen, no matter what they do. The Introduction's assertion that document creators can have confidence that the recalculated formulas in their documents will produce equivalent results if given equivalent inputs seems a little glib: it assumes to portable documents using the small group of about 100 predefined functions, I guess. Or perhaps it means "it will either run correctly or not at all." But it is a definite step forwards, so I think it is worth supporting.


You might also be interested in:

2 Comments

Rick, You are quoting from an working draft. This specification has several stages of editing, review and approval before it goes out for public review. The conformance language, in particular, will be given a brisk scrubbing. The structure of conformance, in terms of targets, classes and levels, is likely to substantially change.

Of course, we would welcome any constructive comments you might have on the ODF TC's comment list, e.g., comments that go beyond the trivial observation that an unpublished, unapproved working draft has the kinds of consistency issues that a unapproved, unpublished working draft typically has.

Great! I look forward to seeing the out-of-synch statement in the Introduction 'briskly scrubbed.'

(By the way, I see that the text about being 'confident' was added in the same days as the 'portable document' idea was added, and after the different conformance levels were added. So this does not look like a case of text being accidentally out of synch, but being intentionally over-promising: perhaps the introduction text states goals rather than providing a summary? The earlier introductions seem more realistic.)

News Topics

Recommended for You

Got a Question?