As the Chinese, Korean and Japanese markets mature, there is more demand for Western typesetting and word processing systems to support CJK graphical idioms.
In this blog post, I want to suggest two great influences on CJK typesetting which can be understood as principles or generators of many CJK graphical idioms. The first influence is rather mechanical: having square ideographs has consequences that tend to generate certain kinds of designs and ways of expressing those designs. The second influence is cultural and graphical: the influence of mystical diagrams associated with Taoism.
We Westerners, who do not have these mechanical/graphical/cultural traditions as part of our experience, can too easily discount them, it seems. We might feel a little different if, for example, we were told that hyphenation and font-size were merely niche requirements that a standard on word processing could safely ignore. Or that they were too complicated for foreigners to understand.
In history, there have been hundreds of thousands of Han ideographs, and ideographs formed on the same principles. There are hundreds of primitive characters based on stylized drawings of real objects (see here for tortoise), and sometimes these are formed into words rather like English (see here for characters for goldfish). However, most Han ideographs are formed by some combination of radicals (often distorted primitive characters themselves) which together often indicate the general sound and general meaning area.
However, the characters or radicals by themselves are frequently not enough to allow a lay reader to be instantly confident in either the specific meaning or the specific pronunciation. (DeFrancis' The Chinese Language comments that for ideographic text and lay readers, we should think more of gleaning meaning than of reading.)
Often it does not matter. Travelers to Japan will frequently note how often Japanese draw characters on their hands, as an aid to speech: Japanese has many homophones. And similarly, written communication has needed to develop apparatus for specifying the pronunciation of characters: in Japanese this involved the development of annotations (such as the out-of-line ruby annotations or the inline warichu annotations.) The Japanese-influenced Taiwanese developed a similar system, and a phonetic syllabary called bopomofo in particular for education. The Vietnamese adopted an extreme measure: they abandoned Ideographs entirely for a Latin orthography with multiple levels of accent marks.
Another natural influence of having large square characters is that in CJK typesetting the character becomes the unit of measure for layout: a margin may be three characters deep, the initial indent of a text block may be two characters deep, the line may be 20 characters long. (This is not dissimilar to how Westerners use the em or en units so beloved of Scrabble players. In the West, this allows various page design parameters to be tied together, in particular to be tied to font size, which reduces work and improves consistency: in the case of HTML, it is a good capability for the elderly or poorly sighted, who can use larger type.)
In the case of CJK typesetting, the shoe is on the other foot. Rather than the type size selecting the width, the direction of causality is opposite: a page is designed with a certain number of characters along a line (perhaps including margins and inter-column spaces) and a certain number of lines per page, and the font may be sized to fit this number best. The graphical implication should be clear: a Western-based typesetting system specified in terms of ruler measurements will tend to have characters slightly too small for the available measure and slightly too much whitespace. When given a line length, it will use increased letter spacing to fit the characters into the line: the user may then have to tweak the lengths to get an acceptable result. For complicated ideographs, you really want the largest type you can fit in a measure, for the simple mechanical reason of readability: some of these characters are incredibly complicated.
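The reversed direction of causality can be sketched in a few lines. This is only an illustration under stated assumptions: the GridPage class, its field names and the A4-like width of 210 mm are hypothetical, and real layout systems also have to deal with line pitch, gutters and rounding to available font sizes.

```python
from dataclasses import dataclass


@dataclass
class GridPage:
    """A page described in character units rather than ruler units (hypothetical model)."""
    page_width_mm: float   # physical width of the page
    chars_per_line: int    # ideographs wanted on each line of body text
    margin_chars: int      # left and right margin, each measured in characters

    def char_size_mm(self) -> float:
        # The whole width is divided into equal square cells:
        # the body characters plus the two margins.
        total_cells = self.chars_per_line + 2 * self.margin_chars
        return self.page_width_mm / total_cells


def leftover_mm(measure_mm: float, char_size_mm: float) -> float:
    """Whitespace left over when a ruler-specified measure is filled
    with whole characters of a fixed, pre-selected size."""
    whole_chars = int(measure_mm // char_size_mm)
    return measure_mm - whole_chars * char_size_mm


# Grid-first: the character count decides the type size.
page = GridPage(page_width_mm=210.0, chars_per_line=36, margin_chars=3)
size = page.char_size_mm()

# Ruler-first: a 170 mm measure with 4 mm characters leaves a gap that
# has to be absorbed by extra letter spacing or by tweaking the lengths.
gap = leftover_mm(170.0, 4.0)
```

In the grid-first case the leftover is zero by construction, which is the point of designing the page in characters.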
So the effect of the Han ideograph is that it encourages a grid idiom. The recent draft at the W3C on Japanese Requirements for Layout has really good material on this. The grid is ingrained in CJK minds: it is how they are taught to write characters and how they practice writing them, and it is on the paper that they have used for handwritten school reports, for government forms, and even private communication: gridded paper is readily available in every stationery store. The grid aspects are even stronger for Chinese than Japanese, since the Japanese frequently draw their non-ideographic kana syllables as half-width hankaku characters.
Now this is not to say that there is no kerning within the grid. Nor that measurement-based or ragged-edge typesetting or proportional kana typesetting are unknown and unacceptable in any context. Consider the poor level of typesetting that is perfectly tolerable for draft printing, and ephemeral, casual and commercial documents everywhere in the world. These things matter when you want your text to impress and be well-readable, when you want the users to be happy and feel the software is working with their expectations rather than against them, and where there is a need for the text or the application to be culturally sensitive. However, it is not just an issue of respect, but also functionality and fitness for purpose.
A book on Japanese garden design I read once claimed that the Japanese aesthetic involved the interplay and a duality between organic shapes and man-made linear shapes. (There is perhaps some of this in Chinese landscape painting: the small sharp horizontal lines of a hut or temple contrasted with the grotesque mountains and mists.) Anyone going to a Japanese bar will notice how common it is for a wall to be covered by shelving creating small boxes: the grid is deeply ingrained in the Japanese aesthetic, from what I have seen. It may start with a Confucian-influenced education system that finds grids congenial, but it does not end there.
I am always interested that material on CJK typesetting frequently mentions hanging punctuation as important and idiomatic. Hanging punctuation is where small punctuation or quotation marks get placed outside the grid rather than starting a new line with them: this is often presented as a method to avoid ugliness, but I want to point out that, more importantly, it preserves the grid of ideographs: breaking out of the box is allowed as a decoration or annotation. However, I must note that I cannot recall ever seeing hanging punctuation actually used, certainly not in body text. Full justification is used much more. I suggest that the common references to hanging punctuation in CJK typesetting material should be seen as reflecting a deep design principle (rather than just wishful thinking): maintain the grid.
Another outcome of the use of ideographs is that the points for breaking lines can be very straightforward: between any ideographs is usually acceptable. The complications in line breaking largely arise because of the introduction of punctuation characters, and other alphabets, numerals or syllabaries. The Japanese standard kinsoku rules provide a hierarchy for breaking, and this system has been localized for Chinese and Korean.
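To make the idea concrete, here is a minimal sketch of a line breaker that allows a break between any two characters except where a prohibition applies. It is not the kinsoku standard itself: the two character sets are short illustrative samples I have chosen, the function name break_lines is hypothetical, and real implementations have longer tables, a hierarchy of adjustments, and special handling for Latin text and numerals.

```python
# Simplified, illustrative character classes; real prohibition tables are much longer.
NOT_AT_LINE_START = set("、。，．）」』】！？")
NOT_AT_LINE_END = set("（「『【")


def break_lines(text: str, width: int) -> list[str]:
    """Break text into lines of at most `width` characters, moving each
    break point backwards until it violates neither prohibition."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Move the break left while the next line would start with a
        # forbidden character or this line would end with one.
        # The break never moves past start + 1, so every line keeps at
        # least one character even if no clean break point exists.
        while end > start + 1 and (
            text[end] in NOT_AT_LINE_START or text[end - 1] in NOT_AT_LINE_END
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


plain = break_lines("一二三四五", 2)
quoted = break_lines("東京「大阪」", 3)
```

Note that this sketch resolves a prohibited break by shortening the line, which leaves a hole in the grid; hanging punctuation or adjusting the spacing are the alternatives that keep the grid intact.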
(Aside: however, it is not true that blind line-breaking between ideographs is always appropriate. I remember being told of the case of East Kyoto (東京東) and Tokyo East (東京東) [if I remember, pronounced higashi kyoto and tokyo to]: if these appear inline, the reader will be aware they have to figure out the ambiguity; however, if they are broken at the end of a line, the reader may get the wrong impression. Cases like these are perhaps better solved by rewriting rather than typography; however, it would not surprise me if a careful writer, writing to an assumed grid, would choose characters to prevent this kind of ambiguity.)
The final influence of Ideographs I want to mention here is that because words may be formed with two to four characters only, headings in tables may have a lot of wasted whitespace, and nature abhors a vacuum. What fills this vacuum is the subject of the next section.
Mystical and Almanac Diagrams
In the West, we do not use diagrams as religious or spiritual objects. In the Far East, especially associated with Taoism, diagrams are not only explanatory apparatus, they can also be used for divination and even have their own power to influence fate.
I think underlying this is the kind of macrocosm and microcosm belief that underlies much Chinese thought: that there are repeated patterns in the large and small which relate and can influence each other: a system of resonances between different planes of being. (When a typhoon hit Taiwan, a friend joked 'Oh, the President must be having an affair!')
The simplest of these diagrams is well known: the yin and yang symbol; and they range to very complex and esoteric talismans made with fantastic versions of Chinese radicals. But the diagrams I want to mention here are the technical kind of mystical/pre-scientific diagram: those used for geomancy, fortune telling, almanacs and so on.
You could start off by first looking at this almanac diagram called Liunian Dali (under the delightful heading Twelve Dragons manage water).
The meaning is immaterial here: what I want to point out is that 1) this is structured tabular data, and 2) it does not use a grid tabular layout, but a layout entirely unfamiliar in the West: our culture does not have any cyclical diagrams of compass or season like these (so different from the clockface or piechart), and consequently their graphical vocabulary of trapeziums, kites and triangles is not part of our Western graphical tradition for structured information, and certainly not in the past few decades. (I would not be surprised if something similar could be unearthed, for example in extreme publishing such as railway timetables or in Theosophist works, but as exceptions that perhaps prove the rule.)
I don't know that there is much consciousness of this difference even in CJK experts themselves: as an outsider looking in (and paid to do so professionally, for a few years) it was a stark difference. (I am happy if someone ascribes this difference to other cultural causes than the Taoist or almanac diagrams I mention here: however, what I think is unquestionable is that an influence exists in CJK culture that is not present in the West.)
(We may expect that the Confucianist tendency might encourage strict grids, while the Taoist tendency might be towards a certain wildness if not anarchy of design. However, I think the Taoist tendency also contributes a respect for getting diagrams right that will encourage layout grids where they are appropriate.)
For example, here are some links to some Taoist diagrams: here, here, here, here, and here. In all cases, though it may not be apparent, the diagrams are what we might recognize as a binary tree, arranged in a circle. Some of these use the traditional hexagram graphic of six lines (where broken and unbroken are the binary values).
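The structure can be sketched as follows: six binary choices give 64 leaves, each leaf is drawn as six broken or unbroken lines, and the leaves are spaced around a circle. This is a simplified illustration, not a reproduction of any traditional arrangement: the plain binary ordering around the circle, the choice of which bit is drawn on top, and all function names are my assumptions.

```python
import math


def tree_path(value: int, depth: int = 6) -> str:
    """The path from the root of a binary tree to this leaf, as a bit string."""
    return format(value, f"0{depth}b")


def hexagram_lines(value: int) -> str:
    """Render a 6-bit value as six text lines.
    A 1 bit is drawn unbroken, a 0 bit is drawn broken."""
    rows = []
    for bit in range(5, -1, -1):  # most significant bit on top (an assumption)
        rows.append("------" if (value >> bit) & 1 else "--  --")
    return "\n".join(rows)


def circular_positions(count: int = 64, radius: float = 1.0):
    """Place `count` cells evenly around a circle, in plain binary order.
    Returns (value, x, y) tuples for the centre of each cell."""
    positions = []
    for value in range(count):
        angle = 2 * math.pi * value / count
        positions.append((value, radius * math.cos(angle), radius * math.sin(angle)))
    return positions


cells = circular_positions()
print(tree_path(5))
print(hexagram_lines(5))
```

Each cell in such a layout is a wedge or trapezium rather than a rectangle, which is exactly the kind of table cell a grid-only table model cannot express.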
The graphical root is presumably the circle used to diagram zodiacs, seasons and compass directions. But what I want to point out is that our Western word processors have no equivalent for this: the first diagram is a circular table, and the second chart is a snowflake of tables with shared internal triangular headings.
I am not suggesting that OpenOffice and Word need to support ancient, arcane and esoteric Taoist mystical diagrams! However, I think it more than plausible that some graphical principles from these diagrams (which are still in common use) are still idiomatic in CJK publishing: they are part of the CJK "design grammar" that generates CJK layouts. Looking at texts created before the 1990s, in the libraries at Academia Sinica Taiwan, I was struck by the creativity and liveliness of the tables commonly found. Word has the slightest nod to this in its diagonal split feature for table headers, and there have been requests for ODF to get moving in this area. But I think there has been a quite miserly response by Westerners, who want to provide the least functionality that might possibly work; which will of course probably not meet user requirements.
For some examples, see my blog item What are Chinese Tables?. I developed a more thorough theory of CJK tables in 1999, the Community of Cells, and there is an example DTD here. There is some more information and links in Standardization as a Collective Loss of Imagination.
What we can see in those kinds of tables, I suggest, which I believe most CJK readers would instantly be familiar with even as most Western readers may find them repellent, is that they spring out of some of the influences I have identified here: the square nature of ideographs and small words means they can be fitted into smaller spaces, the desire to avoid wasted whitespace means that the headings can be fitted into places we might be surprised at, the Taoist tradition allows a lot of graphical flexibility in the way that tables are laid out.
Now of course in the West we do have some unusual tables: the Periodic Table is perhaps the irregular but repeating structure that springs to mind. Already, the commercial vendors have started to explore the territory between graphics and tables: SmartArt, for example. And from the direction of graphics, we can see that diagram languages such as UML are incorporating list and table structures as graphical elements (e.g. classes in class diagrams).
Can standards lead?
Now that governments and competition authorities are expecting that international standardization efforts will, without stifling innovation and consolidation from other sources, act as a channel for Requests for Enhancements in collaboration with the vendors at the table (i.e., what governments actually mean by 'openness' regardless of what vendors wish it meant), it is hardly surprising that the standards bodies (such as ISO/IEC JTC1 SC34 and its feeder-colleagues such as OASIS, W3C and ECMA) are keen to redress critical and neglected areas.
This poses a challenge to the standards systems, since in these areas there is a strong disinclination by vendors to let standards lead, as far as feature selection goes, rather than leaving it in the hands of vendors. However, vendors have clearly failed to show necessary leadership in the area of CJK typesetting: it should be of no surprise to them if users and CJK institutions become more assertive in demanding comprehensive improved support for CJK typesetting and graphical idioms. I have watched this issue quite closely over the last 20 years, particularly in my time in Japan and Taiwan.
The vendors should not pretend they have moved at anything other than a snail's pace: I suppose they hope that if they do nothing, then maybe the problem will go away. Since that would involve Han ideographs going away, it seems a lazy, dumb and futile strategy. Vendors should not delude themselves: one of the reasons governments and competition authorities want openness is to compensate for the deafness of vendors to government requirements.
(This is no less true for ODF than OOXML, by the way: look at the positive contribution South Africa's Bob Jolliffe and Brazil's Jomar Silva made a few years ago on the ODF TC, in pushing for better Digital Signatures in ODF. This was not a feature that the vendors were particularly interested in: standards committees need a balance of interests between vendors/non-vendors, between public/private sector, between individuals and corporations. They need each other.)
Governments and competition authorities need to continue the pressure on vendors to participate in the standards bodies: even if just to see them as a really great source of pan-government (or pan-national) requirements collection. Microsoft had a bad history of implementing the first generation of a standard and then ignoring subsequent versions and updates, for W3C standards at least: the recent thaw with SVG and HTML5 may be a welcome step forward. This can be a particularly insidious form of market domination: if your competitors track the standard, they lose interoperability with a market leader which limits itself to initial versions only.
Many governments have already shown themselves to be a little unsophisticated (IMHO) in how they require open standards without availing themselves of the basic apparatus of validation, conformance and version tracking: I have mentioned before, for example, the folly of governments who adopt OOXML Transitional when it was explicitly developed as the class of OOXML documents that should not be so adopted: OOXML Strict was developed as the version to be adopted. Governments will get exactly the opposite result from the one intended by allowing OOXML Transitional.