This is a follow-up to my two recent bloggings on the i4i patent.
I first encountered SGML in around 1989, when I worked for a great company called Uniscope in Tokyo. I helped John Reekie (later author of A Software Architecture Primer) write a streaming SGML text-processing system; we wrote a small but full LISP, with a special form for rules triggered by element events.
Now back then, and well into the 1990s, we did everything streaming for SGML. The reason was not that we couldn't do in-memory processing if we wanted to; it was the plain fact that SGML documents were usually many times larger than physical RAM. So you learned all sorts of streaming techniques: pipelines, multiple passes, feature extraction, divide-and-conquer, and convert-document-to-code (as still used in my Schematron implementation).
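The rule-per-element-event style can be sketched like this. This is a toy in Python, not the original LISP system; all the names here are my own invention:

```python
# A toy event-driven "streaming" processor: handlers fire on element
# start/end events as they stream past, so memory use is bounded by
# the rule state, not by the document size.

rules = {}

def on(event, name):
    """Register a handler for an ('start'|'end', element-name) event."""
    def register(fn):
        rules[(event, name)] = fn
        return fn
    return register

def run(events):
    """Drive the rules over a stream of (event, name, data) tuples."""
    out = []
    for event, name, data in events:
        handler = rules.get((event, name))
        if handler:
            handler(out, data)
    return ''.join(out)

@on('start', 'title')
def start_title(out, data):
    out.append('== ')

@on('end', 'title')
def end_title(out, data):
    out.append(' ==')

@on('text', '#text')
def text(out, data):
    out.append(data)

stream = [('start', 'title', None),
          ('text', '#text', 'Streaming SGML'),
          ('end', 'title', None)]
print(run(stream))  # == Streaming SGML ==
```

The point of the shape is that no tree is ever built: each rule sees only the event in front of it, which is why it scales to documents larger than RAM.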
To cope with editing, SGML had a system for splitting documents into parts: the external entity system. Assembling the parts into a whole would be part of parsing the document. There were some interactive editors for SGML, however: SoftQuad's Author/Editor, for example. But we tended to edit with UNIX's vi, which allowed efficient line-oriented editing of markup: the trick being to lay out the SGML text in ways that corresponded to vi's line/word/block-oriented commands.
Back then, if you had asked us how to do an in-memory parser, it would have been no trouble. We would perhaps have done what Java still does for its structured documents and rich text: use a backing store of the kind Emacs had in the 1980s: a large character store, with markers (later overlays) as data structures on top of it.
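A minimal sketch of that kind of backing store, in Python. This is illustrative only; Emacs's buffer and Swing's text model are far more sophisticated:

```python
# An Emacs-style backing store, sketched: one flat character buffer,
# with "markers" that record positions in the text and are adjusted
# automatically when text is inserted before them.

class Buffer:
    def __init__(self, text=''):
        self.text = list(text)
        self.markers = []              # each marker is a mutable [offset]

    def marker(self, offset):
        m = [offset]
        self.markers.append(m)
        return m

    def insert(self, offset, s):
        self.text[offset:offset] = list(s)
        for m in self.markers:         # markers at or after the edit slide right
            if m[0] >= offset:
                m[0] += len(s)

    def __str__(self):
        return ''.join(self.text)

buf = Buffer('SGML document')
m = buf.marker(5)                      # marks the start of "document"
buf.insert(0, 'Large ')
print(str(buf)[m[0]:])                 # document
```

Structure (elements, attributes, and so on) then lives in data built over the markers, while the characters themselves stay in one contiguous store.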
By the early 1990s, the need for efficient storage of text nodes was particularly keen. (Indeed, Microsoft's 1994 SGML Author for Word was unusable for this reason: it conked out after about 20 pages.) The option of using a backing store for text would have been an obvious solution.
In particular, it would be obvious to anyone who had read Ted Nelson's book "Literary Machines". I have the 1992 version (93.1) in front of me: he came over to Allette Systems in Sydney to speak. (As he spoke to people afterwards, he took his book and suggested they buy it!) In that book he gives some of the more concrete details of his fabulous hypertext system Xanadu, as the design stood then: all text would be stored without markup, in a memory store called the native document. A layer of inclusions or transclusions or spans on top of this could be available (p4/42). Then there would be various other data structures on top of this, which could be used to control processing and so on: in Nelson's terms these are all kinds of links, and there are special layout and typography links (p4/52).
Now let us jump forward a few years, to 1997. Ted Nelson wrote an interesting article for XML.COM, Embedded Markup Considered Harmful.
Here is a quote:
The best alternative is parallel markup. I believe that sequential formatted objects are best represented by a format in which the text and the markup are treated as separate parallel members, presumably (but not necessarily) in different files
Now when Ted Nelson is making this comment, it is important to realize that he is not endorsing the i4i patent! He is just stating the same thing he had been saying in 1993 and before: his architecture keeps text separate from links. It is notable that several of Nelson's points in this article mirror those of the 1998 patent (which was applied for in 1994). Now I am not claiming plagiarism by Nelson or by the patentees. (However, I do think it is difficult to credit that there could be any expert, particularly any North American expert, working in the field of hypertext and SGML without some awareness of Nelson's work.)
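Parallel markup is easy to sketch: the text sits in one member, and the markup is a separate list of spans over it. The layout and names below are my own invention for illustration; this is not Nelson's format, nor i4i's:

```python
# Parallel markup, sketched: raw text in one structure, markup kept
# apart as (start, end, tag) spans of character offsets into it.

text = 'Embedded Markup Considered Harmful'
spans = [(0, 34, 'title'), (9, 15, 'emph')]    # offsets into text

def render(text, spans):
    """Weave the spans back in as inline tags, for display only."""
    events = []
    for start, end, tag in spans:
        events.append((start, '<%s>' % tag))
        events.append((end, '</%s>' % tag))
    out, pos = [], 0
    for offset, markup in sorted(events, key=lambda e: e[0]):
        out.append(text[pos:offset])
        out.append(markup)
        pos = offset
    out.append(text[pos:])
    return ''.join(out)

print(render(text, spans))
# <title>Embedded <emph>Markup</emph> Considered Harmful</title>
```

Notice that the text member never changes when markup is added or removed; only the span list does, which is exactly the separation Nelson argues for.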
But reading the court's findings, you get the impression that there had been no work in this area.