Every document system for printing provides a mechanism for specifying page sizes, margins, page templates and so on. Any graphics-based system also has a coordinate system, which specifies the origin and units used for points, lines and frames relative to the page.
This blog entry looks at the page models and geometries of two current XML-in-ZIP publishing formats for text documents: ODF and IDML. The ODF style should be pretty familiar, but the IDML style might be a little surprising.
When thinking about page models, there are four main design choices, arising from two questions: whether the major text areas need to be explicitly declared (or whether there is a default text area within the margins of the page), and whether text blocks can be connected automatically to each other (so that overflow text goes into another box, and an overflow can even trigger the generation of a new page).
|No connect||Simple slides||Text boxes|
Nowadays many systems support several of these: HTML, for example, started off with a default model, then added text boxes as CSS allowed <div> elements to be positioned on the window.
The provision of explicit text flows has a large impact on other aspects of the design: if you are thinking in terms of text flows, then a two-column layout will be two flows, connected. If you think in terms of a default text area, then you would add some property to the default page to say "use a second column". This design then has flow-on effects: if you have a default flow model, then column balancing would be a natural thing to provide, because in effect the page becomes the container; if you have a text-flow model where the columns are in effect independent, it is a bit difficult to know where the columns go.
There is tremendous variety in this, and Adobe has wrestled with it more than most over the years, because of its page-oriented layout market and its FrameMaker technical documentation market. FrameMaker, for example, allows text flows to be divided up to a certain extent, by the provision of a sidebar column, where paragraphs anchored to paragraphs in a text flow will be typeset in a margin area outside the text flow.
The Word Processor world started off without any kind of model of text flows or frames. (Indeed, the name FrameMaker shows that this was the essential technical feature being promoted in 1986, as distinct from other systems.) In a default system, you specify margins from the outside of the page, and the text flows in that area.
The page system that the document specifies may be independent, to an extent, of the page settings that are provided by the printer driver. Most non-American computer users will be familiar with having to print documents set for US "Letter" paper on printers which only have A4 paper.
The Open Document Format is used for various kinds of office documents. It is an ISO standard, ISO/IEC 26300, and is developed at the OASIS consortium. An updated version is due early next year.
The style:page-layout-properties element specifies a page height, width, orientation, margins, whether numbering is used, number of columns, footnote separator, borders, background image and some miscellaneous properties. (This element has some special-case attributes for spreadsheets as well.)
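As a rough illustration, the page geometry can be read out of an ODF styles.xml with a few lines of Python. This is only a sketch: the sample XML is hand-written for the example, the layout name "pm1" is made up, and it assumes the properties element is spelled style:page-layout-properties with the usual ODF 1.x namespace URIs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by ODF 1.x files.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hand-written sample (an assumption for illustration, not taken from a real file).
SAMPLE = """<office:document-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:page-layout style:name="pm1">
      <style:page-layout-properties
          fo:page-width="21cm" fo:page-height="29.7cm"
          style:print-orientation="portrait"
          fo:margin-top="2cm" fo:margin-bottom="2cm"
          fo:margin-left="2.5cm" fo:margin-right="2.5cm"/>
    </style:page-layout>
  </office:automatic-styles>
</office:document-styles>"""


def page_layouts(styles_xml):
    """Return {layout name: {local attribute name: value}} for each page layout."""
    root = ET.fromstring(styles_xml)
    layouts = {}
    for layout in root.iter("{%s}page-layout" % NS["style"]):
        name = layout.get("{%s}name" % NS["style"])
        props = layout.find("style:page-layout-properties", NS)
        attrs = {}
        if props is not None:
            # Drop the "{namespace}" prefix so keys read like "page-width".
            attrs = {key.split("}", 1)[-1]: value
                     for key, value in props.attrib.items()}
        layouts[name] = attrs
    return layouts


layouts = page_layouts(SAMPLE)
print(layouts["pm1"]["page-width"], layouts["pm1"]["margin-left"])
```

Note that the values come back as strings with their units attached ("21cm"), which is exactly how ODF stores them.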
ODF supports two flow models.
The first is presumably for slides rather than literature. It uses the text:page-sequence element containing first a list of text:page elements that invoke the master page (template) to use. Text is in text boxes on each page, rather than being connected between pages.
The second is more conventional. The document has something called the "normal text flow" which is defined for each page by the master page and its
The page can also have
text:frame elements and
Text breaks can be forced with elements such as text:soft-page-break. But the text:section element provides a structured way to group a sequence of paragraph-level objects.
ODF does not use a special geometry for pages. The user-visible traditional measures like in, cm, pt (point) and pc (pica) are used in the file format. I could not find where ODF sets the origin point for absolute positioning on pages: I suppose the top-left corner is 0,0.
Pixel units (px) can be used in only a few places, when connecting to SVG or graphics. Actually, there is something strange in the ODF spec: support for px units is optional apart from the special cases. How that supports guaranteed interoperability I don't know.
However, ODF does adapt SVG for drawings. In some cases it uses units of 1/100mm. SVG's origin is the top, left corner. Its initial coordinate system for paths is in terms of pixels.
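Because several units can appear side by side in the one file, a consumer usually normalizes them first. Here is a minimal sketch that converts the absolute units mentioned above to points. It assumes the point is 1/72 inch and the pica is 12 points; px is deliberately left out because its size depends on the rendering context. The function names are my own, not part of any ODF library.

```python
import re

# Points per unit, assuming 1 pt = 1/72 in and 1 pc = 12 pt.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}

# A number followed by one of the absolute units above, e.g. "29.7cm".
_LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?|-?\.\d+)\s*(pt|pc|in|cm|mm)\s*$")


def to_points(length):
    """Convert a length string such as '2.5cm' to points."""
    match = _LENGTH.match(length)
    if match is None:
        raise ValueError("unsupported length: %r" % (length,))
    value, unit = match.groups()
    return float(value) * POINTS_PER_UNIT[unit]


def hundredths_mm_to_points(count):
    """Convert a bare number in units of 1/100 mm to points."""
    return count / 100.0 * POINTS_PER_UNIT["mm"]


print(round(to_points("21cm"), 2), round(to_points("29.7cm"), 2))
```

Relative or context-dependent lengths raise a ValueError here rather than being guessed at.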
IDML is Adobe's second generation markup language for its InDesign product.
IDML's basic unit is a spread. A spread contains faces around a central binding location. Usually there would be two faces: a left page and a right page. But there can be multiple left pages and multiple right pages. This copes with two special cases: the first is books where there is a fold-out page; the second is pamphlets, where a spread is folded to create different pages.
Adobe seems to have bought into the idea that the spread should be the central unit of publishing, and has surfaced it more to users: Acrobat Reader now features a spread view, which is why the initial page of a book in PDF is often rendered with a grey left-hand side and the page on the right. This is quite a break from previous page-oriented approaches, and has a lot to commend it.
In IDML, each piece of text content is called a story. The spreads for a document are kept in a spreads/ folder in the ZIP archive, one XML file per spread, and each story (which includes formatting information) is kept in a stories/ folder, one XML file per story. The stories are flowed into the various text frames on the spreads. As this indicates, IDML has been designed to cope with magazine and newspaper-style layouts, which bring in essentially unrelated material from multiple sources.
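Since the package is an ordinary ZIP archive, the split between spreads and stories can be inspected with standard tools. The sketch below builds a tiny stand-in archive in memory and groups its entries by folder. The file names and the placeholder XML are invented for the example (they are not real IDML content), and folder names are matched case-insensitively because packages I have seen capitalize them (Spreads/, Stories/).

```python
import io
import zipfile


def list_package(package_bytes):
    """Group the XML parts of a package by their top-level folder."""
    parts = {"spreads": [], "stories": []}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            folder, _, filename = name.partition("/")
            key = folder.lower()
            if key in parts and filename.endswith(".xml"):
                parts[key].append(name)
    return parts


def make_sample_package():
    """Build a stand-in archive; names and contents are hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("designmap.xml", "<placeholder/>")
        package.writestr("Spreads/Spread_a.xml", "<placeholder/>")
        package.writestr("Stories/Story_a.xml", "<placeholder/>")
        package.writestr("Stories/Story_b.xml", "<placeholder/>")
    return buffer.getvalue()


parts = list_package(make_sample_package())
for folder, names in parts.items():
    print(folder, len(names))
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them to list_package.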
Interestingly, IDML also has a special folder for stories that have not been assigned to spreads, for documents-in-progress.
Update: For Flash's XFL, here is some information from an Adobe document about exporting XFL from InDesign:
XFL is packaged as a single file that is ZIP compressed and contains an XML-based manifest file and a LIBRARY folder that contains XML-based representation of each page (or spread, if Spreads are enabled in the Export XFL dialog box) and all images produced as part of the export.
When exporting to XFL, each page (or spread, if Spreads are enabled in the Export XFL dialog box) in the source document are mapped to a keyframe in Flash. Also, the contents
IDML works in terms of spread coordinates, whose origin is the centre point of the binding location. The y axis runs downwards, so to the left of and above the centre point, coordinates are -x, -y. The only unit that Adobe supports is its version of the point, defined as 72 units per inch. It seems that fractional points are allowed.
Coming from Adobe, IDML allows you to transform coordinates with a matrix operation, so that you can work in page coordinates too. You would use this, for example to allow you to work relative to a TextFrame with a more convenient origin.
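To make the matrix idea concrete, here is a small sketch of a six-number affine transform in the PostScript/PDF style, a b c d tx ty, where x' = a*x + c*y + tx and y' = b*x + d*y + ty. I am assuming that IDML's transform attributes follow this layout; the sample numbers (a page origin shifted by 396 points) are made up for illustration, and the function names are my own.

```python
def parse_transform(text):
    """Parse 'a b c d tx ty' into a tuple of six floats."""
    values = tuple(float(v) for v in text.split())
    if len(values) != 6:
        raise ValueError("expected six numbers, got %d" % len(values))
    return values


def apply_transform(matrix, point):
    """Map a point from the inner coordinate space to the outer one."""
    a, b, c, d, tx, ty = matrix
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)


def invert_transform(matrix):
    """Return the matrix that maps outer coordinates back to inner ones."""
    a, b, c, d, tx, ty = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, ic, id_,
            -(ia * tx + ic * ty),
            -(ib * tx + id_ * ty))


# Hypothetical page whose own origin sits 396 points from the spread origin.
page_to_spread = parse_transform("1 0 0 1 0 -396")
spread_point = apply_transform(page_to_spread, (72.0, 72.0))
page_point = apply_transform(invert_transform(page_to_spread), spread_point)
print(spread_point, page_point)
```

Working in the frame's own coordinates and applying the matrix only at the end is what makes the "more convenient origin" practical.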
Readers who found this useful might also be interested in Castoff hints? Rethinking interoperability and fidelity. I see from PsychoCat's blog that there is a good site for IDML: InDesignSecrets.com.
Update: Imaging models
It has struck me that people coming to XML from the HTML or database side, rather than the publishing or graphics side, often have not had much exposure to some of the issues of page models, geometries, units and so on. Calculation issues are obviously important for graphics, but publishing and printing are not immune: bleed-through problems, for example. I was reading some other Adobe material today, and I came across a really nice description of the difference in imaging model between their products, and I am including it here just to give readers more exposure to that world and its concerns:
Flash's imaging model is superficially similar to the Adobe Imaging Model (AIM): Both are vector-based and support the notion of paths that are filled/stroked with various kinds of paint. Both support affine transformations (transformations which, like scaling, translation, and rotation, preserve the parallel-ness of an objects lines). Both support raster images, including alpha channel. Both support vector text. Both support simple opacity and the notion of blend modes that dictate how content is composited with content it overlaps. The takeaway is that experience with AIM representation of things will largely transfer to Flash.
However, the two imaging models differ in a number of important ways. The complete list of differences is too large to include here, but some of the more important differences are:
- Flash uses device RGB color only. No other color spaces are supported and all color is uncalibrated.
- Flash uses quadratic curves, whereas AIM uses cubics. This requires that an authoring tool approximate its native cubic curves with quadratic curves.
- Flash paths implicitly use a non-zero winding rule. AIM paths may use either non-zero or even-odd winding.
- Flash supports a limited set of paint types: solid color, gradients, and raster paint are supported; more complex paints, such as general smooth shades, and patterns, are not.
- Flash gradients allow a smaller number of stops than their AIM equivalents.
- Flash supports limited clipping and masking. Flash allows objects in its display list to be clipped and/or masked by other objects in the display list, but this is a simple one-to-one relationship and limited compared to the graphic state based clipping and masking supported by AIM.
- Flash does not natively support dashed or dotted lines. These are simulated in Flash by drawing each dash as an individual line segment.
- Flash supports a limited subset of the blending modes available in AIM.
- Flash does not support transparency groups.
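The point about cubic versus quadratic curves in the list above is easy to see in code. The sketch below uses one simple textbook approximation: keep the endpoints and place the single quadratic control point at (3(P1 + P2) - P0 - P3) / 4. This is not a claim about how Adobe's exporter does it; a real tool would more likely subdivide the cubic until the error falls below a tolerance. The function names are my own.

```python
def cubic_to_quadratic(p0, p1, p2, p3):
    """Approximate a cubic Bezier with one quadratic sharing its endpoints."""
    control = tuple((3 * (a + b) - s - e) / 4
                    for s, a, b, e in zip(p0, p1, p2, p3))
    return p0, control, p3


def cubic_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier at parameter t in [0, 1]."""
    u = 1 - t
    return tuple(u**3 * s + 3 * u**2 * t * a + 3 * u * t**2 * b + t**3 * e
                 for s, a, b, e in zip(p0, p1, p2, p3))


def quadratic_point(q0, q1, q2, t):
    """Evaluate a quadratic Bezier at parameter t in [0, 1]."""
    u = 1 - t
    return tuple(u**2 * s + 2 * u * t * c + t**2 * e
                 for s, c, e in zip(q0, q1, q2))


def max_error(cubic, samples=16):
    """Largest distance between the cubic and its approximation at sample points."""
    quad = cubic_to_quadratic(*cubic)
    worst = 0.0
    for i in range(samples + 1):
        t = i / samples
        cx, cy = cubic_point(*cubic, t)
        qx, qy = quadratic_point(*quad, t)
        worst = max(worst, ((cx - qx) ** 2 + (cy - qy) ** 2) ** 0.5)
    return worst


# An S-shaped cubic, which a single quadratic cannot follow closely.
s_curve = ((0.0, 0.0), (0.0, 100.0), (100.0, -100.0), (100.0, 0.0))
print(max_error(s_curve))
```

A cubic that is really a degree-elevated quadratic comes back exactly; anything with an inflection, like the S-curve here, shows a visible error and would need splitting into several quadratics.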