Greener typesetting

By Rick Jelliffe
May 2, 2009 | Comments: 6

Word processors and typesetting systems get their printing characteristics from the typesetting algorithms they use. For example

  • Microsoft Word expands spaces between words in order to fully justify a line (i.e. make the start and the end of the line flush with the margins).
  • WordPerfect can squeeze spaces to allow the same thing (at least, according to Microsoft)
  • TeX both squeezes and expands spaces, to get an optimal result.

And for deciding whether to hyphenate a long line


  • Word only looks at hyphenation on previous lines to decide whether to hyphenate

  • TeX looks at the whole paragraph to see what the optimal hyphenation should be.

I gather that OpenOffice has the same default behaviour as Word: any readers with better knowledge please feel free to comment.
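To make the line-by-line versus whole-paragraph distinction concrete, here is a minimal sketch in Python. It is not Word's or TeX's actual algorithm: it ignores hyphenation and stretchable spaces entirely, and the function names and the squared-slack cost are my own assumptions for illustration. One function fills each line greedily; the other picks the breaks that minimise the total leftover space over the whole paragraph, which is the basic idea behind Knuth-style breaking.

```python
def greedy_breaks(words, width):
    """First-fit: put each word on the current line if it still fits."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words, width):
    """Whole-paragraph: minimise the sum of squared slack per line.

    The last line is free, as is usual for this kind of cost.
    """
    n = len(words)
    best = [0.0] + [float("inf")] * n   # best[j]: minimum cost to set words[:j]
    prev = [0] * (n + 1)                # prev[j]: start index of the line ending at j
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            length = len(" ".join(words[i:j]))
            # Stop once the candidate line overflows (a lone long word is allowed).
            if length > width and j - i > 1:
                break
            cost = 0 if j == n else (width - length) ** 2
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    lines, j = [], n
    while j > 0:                        # walk the chosen breaks backwards
        lines.append(words[prev[j]:j])
        j = prev[j]
    return lines[::-1]


def badness(lines, width):
    """Sum of squared slack on every line except the last."""
    return sum((width - len(" ".join(line))) ** 2 for line in lines[:-1])


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6), badness(greedy_breaks(words, 6), 6))
print(optimal_breaks(words, 6), badness(optimal_breaks(words, 6), 6))
```

On this toy input the greedy version packs "aaa bb" tightly and then strands "cc" on a line of its own, while the whole-paragraph version accepts a slightly looser first line to get a more even result.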

So we can see that Microsoft Word's algorithms will err on the side of adding spaces; there are other algorithms in Word, such as the paragraph and page breaking settings, that act similarly. We can say that Word's designers were very averse to crowded pages.

TeX's designer, Donald Knuth, on the other hand, was averse to non-optimality. This is why TeX—or typesetting engines forked from it—has been the typesetting system of choice for many years for the quality typesetting market.

So which is better? Crowded pages? Empty pages? Optimal pages? For drafts, empty pages. For professional documents intended for reading, optimal. For reference technical material, probably crowded. That is probably as far as I would have gone, until recently.

Let's say documents have 50 lines per page. And 25 of these are short lines, headings or blank lines. And the different algorithms used by different products will result in an extra line being sent to the next page in about 10% of those documents. And these will result in an extra page being printed in about 2% of cases. And humans will clean up half of those. That leaves us with 1% of documents having an extra page printed using systems that err on the side of adding space. It does not need to be a scientific number.

But if we consider that there may be one hundred million word processing documents printed every day (anyone know the real number?), that would mean a million extra pages per day. It would be a fun college project to get a better estimate.
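For anyone who wants to plug in better guesses, the back-of-envelope arithmetic can be written out as a few lines of Python. The variable names are mine, and every input is one of the unscientific guesses from the text, not a measured figure.

```python
from fractions import Fraction

# All inputs are the article's rough guesses, not measurements.
documents_per_day = 100_000_000      # word processing documents printed daily
extra_page_rate = Fraction(2, 100)   # share of documents that gain an extra page
cleaned_up = Fraction(1, 2)          # share of those that a human tidies up

# Share of documents that still print an extra page after clean-up.
affected_share = extra_page_rate * (1 - cleaned_up)
extra_pages_per_day = int(documents_per_day * affected_share)

print(f"{float(affected_share):.0%} of documents, "
      f"{extra_pages_per_day:,} extra pages per day")
```

Changing any one of the three inputs shows how sensitive the final figure is to the guesses.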

Now, paper is usually made from estate timber, so there probably is no SAVE THE TREES deforestation angle. But paper production takes energy, toxic bleaches are used, fuel is used to transport it, the carbon gets released if it is disposed of by burning, and more toner cartridges are used. A tiny effect for individuals, but a decent effect when aggregated.

Greener stylesheets and templates?

So perhaps it would be better for our typesetting software, including word processors, to default to tighter typesetting.

For example, those responsible for governmental and corporate stylesheets and deployments may care to check Word options like


  • Do full justification like WordPerfect 6.x for Windows.

  • Don't expand character spaces on the line ending Shift-Return.
  • Don't add extra space for raised/lowered characters.

  • Suppress "Space Before" after a hard page or column break

  • Automatically hyphenate document

Actually, most of these options come from a page at IPBA on setting up Word to get less horrible typesetting that I recommend.

Many of the XSLT stylesheets and templates made for transforming XML into a word processing or typesetter's native format pay a good deal of attention to making sure that each paragraph style has good properties for breaking, keep-with-nexts and keep-togethers: autoformatting is often used for professional and bulk publishing where unnecessary whitespace is a $ expense or where there are fixed format restrictions.

All the different applications have different algorithms. As systems increasingly adopt standards, and as the pressure for standards to converge grows, decisions on which algorithm to use will increasingly need to be made. But at the moment there is no compelling technical superiority among the competing algorithms: indeed, it becomes a matter of taste where some people prefer whitespace and others prefer concentrated text. You say tomato and I say tom-
-ato

An environmental argument that disfavours profligate page generation can provide a new angle for discussions in standards bodies about socially-required features and convergence targets.

In the case of OOXML (and I suspect this applies to ODF, if not in extent then certainly in kind), I think the standards processes (at SC34 WG4/WG5, liaising with OASIS?) should review IS 29500 and IS 26300 from this angle and make the best features available. And encourage the vendors to make the environmentally-friendly features the defaults.

e.g.

In concrete terms, it comes down to issues like this. Widow and orphan control relates to paragraphs that have been broken so that only a single line appears at the top (widow) or bottom (orphan) of the page. The simplest way to implement widow and orphan control is to break the paragraph a line earlier than the detected widow or orphan. So we add some whitespace at the bottom of the earlier page and move a line to the next page.

But a much better system, both for optimal typesetting and according to this green angle, would be to pull the stray line back onto the previous page/column, if the text frame or printing area on the previous page was large enough to allow marginally tighter line spacing, or if there was some discretionary area still white at the bottom of the column.
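Here is a rough Python sketch of the two approaches to a widow. It is an illustration only, not how any particular product works: the `paginate` function and its `slack` parameter (how many extra lines a page can absorb through tighter line spacing or spare white space) are hypothetical, and for brevity it only checks for widows, so pushing a line forward can still leave an orphan behind.

```python
def paginate(paragraphs, capacity, slack=0):
    """Split paragraphs (given as line counts) into pages.

    capacity: nominal lines per page (assumed to be at least 2).
    slack: hypothetical number of extra lines a page can absorb.
    Only widows are handled; orphans are ignored in this sketch.
    """
    # One entry per line: (paragraph index, line index, paragraph length).
    lines = [(p, i, n) for p, n in enumerate(paragraphs) for i in range(n)]
    pages, pos = [], 0
    while pos < len(lines):
        end = min(pos + capacity, len(lines))
        if end < len(lines):
            _, i, n = lines[end]             # first line of the next page
            if n > 1 and i == n - 1:         # lone last line of a paragraph: a widow
                if slack >= 1:
                    end += 1                 # pull the widow back onto this page
                else:
                    end -= 1                 # push one more line forward to join it
        pages.append(lines[pos:end])
        pos = end
    return pages


paragraphs = [10, 11]                        # 21 lines in total
print([len(page) for page in paginate(paragraphs, capacity=20)])
print([len(page) for page in paginate(paragraphs, capacity=20, slack=1)])
```

With no slack the 21 lines are split across two pages, the second nearly empty; with one line of slack everything fits on a single page, which is exactly the saving this article is after.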

WordPerfect implements a system where it will even adjust the margin sizes to allow better fit. They even were granted a patent for it. Another sign of the toxic nature of software patents, this probably has prevented others from adopting it; and, since people tend to believe their own marketing, you can expect vendors who haven't historically provided this kind of feature to downplay its usefulness: they are making a virtue out of necessity of course.

I am not suggesting that everyone should adopt the WordPerfect auto-fit system (if its patent has lapsed) or that a standard should require it. The point is that some vendors are used to saying that "we don't provide feature X therefore users don't require feature X therefore feature X is not useful therefore we should not support feature X in the future" (which in its extreme ODF-partisan form morphs to something like "therefore no software should support it since it will prevent interoperability").

But, just as internationalization and accessibility provide spotlights for objectively re-evaluating capabilities of technologies and the clauses in their standards, we would do well to also look at environmental audits of our standards for page-producing technologies.



6 Comments

You can do the math yourself, but my calculations suggest that the carbon footprint from participants traveling to international meetings to produce such a standard would exceed, by ten-fold, the annual savings from reducing the print output by 1%.

So how about a simpler solution: print double-sided, or reuse paper that has already been printed on only one side, or use less ink by printing in draft mode, or use printers that draw little power when idle. Or even better don't print at all. Any of those solutions will save far more than 1%.

Rob: Indeed. (In fact, I have in recent years limited my own travel arrangements. I spent a year without any travel, as a penance against a lot of travel the year before. WG4 is starting to work by teleconference, and WG1 uses mainly electronic.)

Double-sided? Different people do different things: printing double-sided is a user's action and I don't see how standards could impact it; changing the defaults or characteristics of the typesetting engine is a developer thing, and definitely standards have their place there.

It is not good enough to say "We don't have to do anything because they are worse." Everyone needs to do their bit. I don't think I have ever heard any environment-related issues raised concerning markup-related standards, but that does not mean it should not be factored into deliberations and agendas.

[[Mind you, I do tend to think that the style parameters that word processors use are pitched at too low a level. I think users would be better with settings like "spaciousness" or "density" or "formality" that parameterized styles (i.e. themes.)]]

tried ecofont yet ?

A collection of good ideas in this article. By the way, any info on the justification algorithm used by PrinceXML?

Anonymous: Ecofont looks interesting. I wonder whether it works well for screen viewing as well as for print. And for print I would expect it would operate better if tuned to the print engine's characteristics. And even then, I wonder whether the same savings and effect could be gained merely by selecting a dark grey color for the text. Or by using font effects so the font has a black outline and grey filling.

I am glad to see the idea however. It seems the kind of thing we should be embracing.

Istvan: PrinceXML appears to use its own engine for typesetting. I have no particular information on it, though the break at top of p24 in the HG Wells Prince sample suggests that they only move forward (in that particular design, there is not much discretionary whitespace between the bottom of the text frame and the footer, but there is clearly no reverse feathering to attempt to fit into the previous page.)

PrinceXML seems to use CSS styles, and the CSS2 recommendation has tips in 13.3.5 like

# Break as few times as possible.
# Make all pages that don't end with a forced break appear to have about the same height.

The most obvious thing about the PrinceXML sample is there is no hyphenation. I think that suggests that a very straightforward algorithm is being used.

Istvan, Rick: Prince does support hyphenation, but you have to enable it with the 'hyphens' property; similarly to browsers it's not enabled by default. The next major release of Prince will use a Knuth-style justification algorithm that attempts to find optimal breaks on a per-paragraph basis.
