I finally downloaded Office 2007 SP2, the upgrade to Office 2007 to give first class ODF support, and decided to try a simple experiment. When I load the ODF 1.1 standard (the .ODT version from Ecma), what does it look like?
I don't want to give oxygen to fires lit by agitating spin doctors, but it is appropriate to check SP2's implementation. So I'll preface this by imagining four classes of fidelity that a document from one word processor might have in another, since not all errors are equally important:
- Draft. All significant text is present, in the right order, with no omissions or additions. Autonumbering and internal cross-references should work.
- Rich text. All text is present, with rich text and graphics and tables and headings. Like HTML. Relative style relationships should roughly hold. (Old-timers can think of galley proofs.)
- Publishing quality. Styles should completely follow the stylesheet. Page-formatting features and auto-generated text should work. Arcane typesetting features should work. The page count should be within +/- 10% of the original. Features for which there is no direct equivalent should be simulated as best as possible.
- Facsimile. The document opens with, to all intents and purposes, the same formatting with the line and page breaking and page count.
When looking at the ODF loading, it looked like the Draft and Rich Text levels of fidelity had been met. I could not find any examples of missing text, the numbering appeared correct, the basic rich text seemed right, headings were appropriately marked, tables and grey backgrounds looked OK, and so on.
There is of course no chance of a Facsimile level of fidelity, but I do not think it unreasonable to expect a Publishing quality level. SP2 does not quite deliver it.
Just the briefest scan showed two errors of this kind (I commend this as a good test case if anyone else cares to follow up with other issues):
- Inherited page breaks wrong
- No line numbering on quoted schemas
I decided to trace through the page break issue, since this is one that would definitely come up in conversion jobs.
I first opened ODF 1.1 in OpenOffice 2.4 and 3.1. Both opened with 720 pages.
I then opened ODF 1.1 in Office 2007 with SP2. It registered 1,982 pages. WTF?
Looking at it, it was clear that every heading was being preceded by a page break.
So time to look inside the ODF file.
Looking at the first occurrences, I decide to pick the page break at the 1.1 Notation heading.
Tracing through the styles in Office, it is a Heading 2, which is based on Heading 1. And Heading 1 has the property Page break before set.
Now I look at the ODT file. I open it up as a ZIP archive, and see that the ODF is:
format makes use of a package concept. These packages are described in chapter
<text:reference-ref text:reference-format="chapter" text:ref-name="Package">17</text:reference-ref>
<text:list text:style-name="Outline" text:continue-numbering="true">
<text:h text:style-name="P5" text:outline-level="2">Notation
First of all, I quickly rule out that the break is caused by the preceding paragraph: the style is just a vanilla paragraph.
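The same inspection can be scripted rather than done by hand in a ZIP tool. The following is only a minimal sketch: the helper names read_odf_part and build_demo_package are made up for illustration, and the demo package is a tiny stand-in built in memory, not the real ODF 1.1 file.

```python
import io
import zipfile


def read_odf_part(odt_bytes: bytes, part: str = "content.xml") -> str:
    """Return one XML part of an ODF package as text."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        return package.read(part).decode("utf-8")


def build_demo_package() -> bytes:
    """Build a tiny stand-in package (hypothetical content, not the real spec)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<demo>document body goes here</demo>")
        package.writestr("styles.xml", "<demo>named styles go here</demo>")
    return buffer.getvalue()


demo = build_demo_package()
with zipfile.ZipFile(io.BytesIO(demo)) as package:
    print(package.namelist())  # the parts inside the package
print(read_odf_part(demo, "styles.xml"))
```

For a real file, reading its bytes with open(path, "rb").read() and passing them to read_odf_part should give the same XML that the ZIP tool shows.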
We find style P5 defined in the same file:
<style:style style:name="P5" style:family="paragraph" style:parent-style-name="Heading_20_2"> <style:text-properties fo:language="en" fo:country="US" /> </style:style>
We look in styles.xml for those styles.
<style:style style:name="Heading_20_1" style:display-name="Heading 1" style:family="paragraph" style:parent-style-name="Standard" style:next-style-name="Standard" style:list-style-name="Outline" style:class="text" style:master-page-name="" style:default-outline-level="1"> <style:paragraph-properties fo:margin-left="0cm" fo:margin-right="0cm" fo:margin-top="0.847cm" fo:margin-bottom="0.212cm" fo:text-indent="0cm" style:auto-text-indent="false" style:page-number="auto" fo:break-before="page" fo:padding-left="0cm" fo:padding-right="0cm" fo:padding-top="0.212cm" fo:padding-bottom="0cm" fo:border-left="none" fo:border-right="none" fo:border-top="0.002cm solid #000000" fo:border-bottom="none" fo:keep-with-next="always" /> <style:text-properties fo:color="#000099" fo:font-size="18pt" fo:font-weight="bold" style:letter-kerning="true" style:font-size-asian="18pt" style:font-weight-asian="bold" style:font-name-complex="Arial" style:font-size-complex="16pt" style:font-weight-complex="bold" /> </style:style>
<style:tab-stop style:position="0.635cm" />
So the issue is that the grandparent style Heading_20_1 sets fo:break-before="page" and this is overridden by the parent style Heading_20_2, which sets fo:break-before="auto".
Which is right?
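Before turning to the spec, the style chain itself can be sketched in a few lines. This is an illustration under assumptions, not how any real implementation works: the XML below is a cut-down stand-in for the three styles discussed above (every other attribute is dropped), the helper names are invented, and the rule applied is the usual one for style hierarchies, that the nearest style in the parent chain that sets a property wins.

```python
import xml.etree.ElementTree as ET

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Cut-down stand-in for the styles discussed above (assumption: all other
# attributes and properties are omitted for brevity).
STYLES_XML = """
<styles xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
        xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="Heading_20_1" style:parent-style-name="Standard">
    <style:paragraph-properties fo:break-before="page"/>
  </style:style>
  <style:style style:name="Heading_20_2" style:parent-style-name="Heading_20_1">
    <style:paragraph-properties fo:break-before="auto"/>
  </style:style>
  <style:style style:name="P5" style:parent-style-name="Heading_20_2"/>
</styles>
"""


def q(prefix, local):
    """Build the {namespace}local name that ElementTree uses."""
    return "{%s}%s" % (NS[prefix], local)


def load_styles(xml_text):
    """Map style name -> (parent style name, own fo:break-before or None)."""
    table = {}
    for style in ET.fromstring(xml_text.strip()).iter(q("style", "style")):
        props = style.find("style:paragraph-properties", NS)
        own = None if props is None else props.get(q("fo", "break-before"))
        table[style.get(q("style", "name"))] = (
            style.get(q("style", "parent-style-name")),
            own,
        )
    return table


def effective_break_before(table, name):
    """Walk up the parent chain; the nearest style that sets the value wins."""
    while name in table:
        parent, value = table[name]
        if value is not None:
            return value, name
        name = parent
    return "auto", None  # nothing set anywhere: fall back to the XSL initial value


styles = load_styles(STYLES_XML)
print(effective_break_before(styles, "P5"))  # ('auto', 'Heading_20_2')
```

On this reading, the "page" value on Heading_20_1 never reaches P5, because Heading_20_2 sits in between and sets its own value.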
So what does fo:break-before="auto" mean? In the ODF 1.1 spec:
15.5.22 Break Before and Break After
Use the fo:break-before and fo:break-after properties to insert a page or column break before or after a paragraph. See §7.19.1 and §7.19.2 of [XSL] for details. The values odd-page and even-page are not supported.
OK, so let's go look at XSL. The reference is
[XSL]W3C, Extensible Stylesheet Language (XSL),
http://www.w3.org/TR/2001/REC-xsl-20011015/, W3C, 2001.
In the XSL spec, s7.19.2 says:
Value: auto | column | page | even-page | odd-page | inherit
Applies to: block-level formatting objects, fo:list-item, and fo:table-row.
Values have the following meanings:
auto: No break shall be forced. Page breaks may occur as determined by the formatter's processing as affected by the "widow", "orphan", "keep-with-next", "keep-with-previous", and "keep-together" properties.
That seems rather clear. There should not be a page break.
So the next step is to look at Microsoft's Implementer's Notes. These were something that I really welcomed, and I think they show a sign of Microsoft's increasing maturity: decades ago I was really impressed that engineering-cultured companies like Hewlett-Packard actually printed books of the bugs in their current UNIX offerings. They should be really helpful.
Navigating through the notes, we see that the note on s15.5.22 says
The standard defines the property "auto", contained within the attribute fo:break-before, contained within the element <style:paragraph-properties>. This property is supported in core Word 2007.
So according to the Implementer's Notes, auto should be supported. But what does Word think "auto" means? Let's look at the OOXML standard to see the equivalent. OOXML does not have a single equivalent; it just has the pageBreakBefore empty element.
What seems to have happened is that the implementer has assumed that "auto" meant "inherit" when in fact it resets page breaking to its normal behaviour. It looks like a bug to me.
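To make the suspected mix-up concrete, here is a small sketch of the two readings. It is a guess at the logic, not Word's actual code: the function name and the auto_means_inherit switch are invented for illustration, and the chain lists the break-before values from the paragraph's own style up through its ancestors, with None for "not set".

```python
def page_break_before(chain, auto_means_inherit=False):
    """Decide whether an on/off page-break-before flag should be set.

    chain: fo:break-before values from the style itself up to its ancestors,
           None where a style does not set the property.
    auto_means_inherit: hypothetical switch that reproduces the suspected
           misreading, where "auto" is skipped as if it were "inherit".
    """
    for value in chain:
        if value is None:
            continue  # not set here: keep looking up the chain
        if value == "auto" and auto_means_inherit:
            continue  # suspected bug: "auto" treated as "ask the parent"
        return value == "page"
    return False  # nothing set anywhere: no forced break


# P5 sets nothing, Heading_20_2 says "auto", Heading_20_1 says "page".
notation_heading = [None, "auto", "page"]
print(page_break_before(notation_heading))                           # False
print(page_break_before(notation_heading, auto_means_inherit=True))  # True
```

With "auto" read as a real value, the 1.1 Notation heading gets no forced break; with "auto" skipped, the grandparent's "page" leaks through, which would match the break before every heading seen in SP2.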
How could this mistake have occurred? It suggests that there is a deadline issue at Microsoft that is running directly counter to their needs for quality in delivery of standards.
It would be highly ironic if the Implementer's Notes system has actually been their undoing. Normally it would be beyond credibility that no-one would have opened up the ODF 1.1 specification when implementing ODF 1.1 and therefore noticed the problem. But I wonder whether they sliced it into pieces, as HTML or whatever, as part of their implementation tracking system, and always referred to that? Speculation, but stranger things have happened. ODF 1.1 needs to be part of their regression tests.
(I didn't trace through the reason for the lack of line numbers on schema fragments. The Implementer's Notes for ODF mention that Office supports this feature, so it looks like a bug or incompleteness.)
How to fix
Actually the fix is trivial.
In the Home tab of the Ribbon, click on the little box at the bottom of the Styles chunk. This will open the styles list on the left side of the document.
Click your mouse in the offending heading at 1.1 Notation to move the cursor there. The Heading 2 style will be highlighted in the Styles list.
Right-click on Heading 2 and select Modify. A box will come up to say what the style is. From the Format button at the bottom, select Paragraph, then the Line and Page Breaks tab. Deselect Page Break Before and save your way out.
This will not only make Heading 2 correct, but fix all the other headings derived from it.
The page count now? A credible 681, only 39 different from the OpenOffice count.
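Measured against the +/- 10% page-count yardstick from the Publishing quality level above, the arithmetic works out as follows. This is just a worked check of the numbers already quoted; the function names are made up for illustration.

```python
def relative_page_difference(pages, reference):
    """Signed difference from the reference page count, as a fraction."""
    return (pages - reference) / reference


def meets_page_count_criterion(pages, reference, tolerance=0.10):
    """True if the page count is within +/- tolerance of the reference."""
    return abs(relative_page_difference(pages, reference)) <= tolerance


OPENOFFICE_PAGES = 720  # page count reported by OpenOffice 2.4 and 3.1

for label, pages in [("SP2 as loaded", 1982), ("SP2 after the style fix", 681)]:
    diff = relative_page_difference(pages, OPENOFFICE_PAGES)
    ok = meets_page_count_criterion(pages, OPENOFFICE_PAGES)
    print(f"{label}: {diff:+.1%}, within 10%: {ok}")
# SP2 as loaded: +175.3%, within 10%: False
# SP2 after the style fix: -5.4%, within 10%: True
```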
ODF as a Get-Out-Of-Jail-Free Card
While it takes a few steps, it looks like the standard is clear here. Obviously the SP2 behaviour is different from OpenOffice, and doesn't follow the ODF standard for what the markup says.
And now comes the ODF killer. Just when I thought everything was simple, the ODF 1.1 standard's shoddy (in patches) drafting and poor review kicks in. I left out the second paragraph of ODF 1.1 s 15.5.22 on break-after. It says:
These two properties are mutually exclusive. If they are used simultaneously, the result is undefined.
Now, I bet that this was supposed to mean that if the previous paragraph had a break-after and the current one has a break-before, then it is application-defined what happens. (This alone creates enough pagination problems to fail my Publishing quality criteria above, even if we have conforming implementations. But it does reflect the reality that different systems have different resolution mechanisms that are sometimes difficult to override.) But that is not what it says.
And, sure enough, when we look again at style Heading_20_2 it does indeed have settings for both break-before and break-after. This is a get-out-of-gaol-free (jail) card for implementers, in this case Microsoft, but it can be someone else next time.
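Anyone wanting to see how widespread this is in their own documents could scan styles.xml for styles that set both properties at once. What follows is a rough sketch only: the function name is invented, and the sample XML is a cut-down stand-in in which the break-after value on Heading_20_2 is a placeholder, since only the presence of the attribute is noted above.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Cut-down stand-in; the fo:break-after value is a placeholder (assumption).
SAMPLE_XML = """
<styles xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
        xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="Heading_20_1">
    <style:paragraph-properties fo:break-before="page"/>
  </style:style>
  <style:style style:name="Heading_20_2">
    <style:paragraph-properties fo:break-before="auto" fo:break-after="auto"/>
  </style:style>
</styles>
"""


def styles_with_both_breaks(xml_text):
    """Names of styles whose paragraph properties set both break attributes."""
    names = []
    for style in ET.fromstring(xml_text.strip()).iter(f"{{{STYLE_NS}}}style"):
        props = style.find(f"{{{STYLE_NS}}}paragraph-properties")
        if props is None:
            continue
        has_before = props.get(f"{{{FO_NS}}}break-before") is not None
        has_after = props.get(f"{{{FO_NS}}}break-after") is not None
        if has_before and has_after:
            names.append(style.get(f"{{{STYLE_NS}}}name"))
    return names


print(styles_with_both_breaks(SAMPLE_XML))  # ['Heading_20_2']
```

Every style this reports falls under the "result is undefined" sentence if that sentence is read literally.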
Standards are difficult. They require review and maintenance, not the blind pressing ahead with new features. A new major implementation of the standard often reveals unsatisfactory parts of the standard. I expect ODF will be improved as more problems or surprises are revealed in Microsoft's implementation and traced to their causes.
But Microsoft should fix
(I welcome corrections and other technical interpretations of what has gone on here w.r.t. interpretation of ODF 1.1, especially ones that are even vaguely plausible. Is there something I have overlooked?)