Microsoft and the two XML patents #4

HyTime's dataloc

By Rick Jelliffe
August 14, 2009

This is the fourth in a series: see 1, 2, 3

One of the most amazing standards is HyTime, the Hypermedia/Time-based Structuring Language ISO/IEC 10744:1992.

In the days before the XSD standard, it was regarded as a difficult and long standard. By modern standards it is small, terse and easy. It first came out in 1992, guided by Steven Newcomb, with a second, expanded and improved (but even more complex?) edition in 1997, guided by Eliot Kimber. By that stage, however, the simplicity of HTML's links had stolen its thunder for hypertext, and XML's new emphasis on doing things with URLs and namespaces and without DTDs had made the DTD-heavy and entity-dependent approach of HyTime unfashionable.

HyTime took a very generalized view of linking: not only should you have links about space (locations of documents and things in documents), you should have links about time: scheduling. HyTime could be used to position satellites and conduct music. HyTime defines a general and concrete system for addressing things in a finite coordinate space with multiple axes. Any kind of object can be identified and addressed, both as a whole and down to specific items, points and ranges within it. Multiple linked documents can be bundled up in an archive format, SDIF (ISO 9069, an ASN.1 format; ASN.1 is ISO 8824). These can form a bounded object set, which can be constructed from a root document by following external entity links (both external entities that are parsed and those that are not).
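To give a feel for what addressing in a finite coordinate space with multiple axes means, here is a minimal Python sketch. It is not HyTime syntax and not taken from the standard: the names Axis, Extent and locate are hypothetical, and the 1-based "first unit plus count" convention is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """A finite axis (hypothetical name): e.g. pixels, seconds, pages."""
    name: str
    size: int  # number of addressable units on this axis


@dataclass(frozen=True)
class Extent:
    """A range on one axis, given as a 1-based first unit and a count."""
    first: int
    count: int

    def last(self) -> int:
        return self.first + self.count - 1


def locate(axes, extents):
    """Check one extent per axis and return {axis name: (first, last)}."""
    if len(axes) != len(extents):
        raise ValueError("need exactly one extent per axis")
    region = {}
    for axis, extent in zip(axes, extents):
        if extent.first < 1 or extent.count < 1 or extent.last() > axis.size:
            raise ValueError(f"extent falls outside axis {axis.name!r}")
        region[axis.name] = (extent.first, extent.last())
    return region
```

The same shape of address works whether the axes are spatial or temporal, which is the point: locate([Axis("x", 640), Axis("time", 3000)], [Extent(11, 100), Extent(1, 250)]) picks out a strip of space for a stretch of time.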

You would tell a HyTime document by its HyTime header:

<HyTime VERSION "ISO/IEC 10744:1992" HYQCNT=32 >

HyTime had various specific modules: a measurement module, a location address module, a hyperlinks module, a scheduling module, and a rendition module.

HyTime was tremendously influential, in large part because of its failure. It was too much, too mind-blowing, and too meta. People were unable to grok HyTime. In retrospect, it was a great example of the Right Thing losing out to Worse is Better, to use Gabriel's phrases, to the perplexity of all involved.

The XML effort at W3C was an opportunity to take these heavy-weight general technologies from ISO and re-work them into digestible chunks. XPath, XPointer and XLink owe many concepts directly to HyTime. Time-based multimedia standards such as SMIL work in the same area. But none were as general as HyTime. HyTime allows links that match using a regular expression syntax (HyLex): a feature that only made it back to the XML world in XPath 2.0. It defined a query language operating over hyperlinked documents, HyQ, which corresponds largely to XQuery (forall, ordered, compare, exists, and so on): it even had an assert element.

HyTime was particularly sensitive to providing various kinds of references. Great attention was paid to providing different kinds of location addresses:

Relative location address
is rather like XPath axes: anc (ancestor), esib (elder sibling), ysib (younger sibling), des (descendant), parent, child

List location address
allows indexing to an item in a list. Unlike XPath, this could be a multidimensional address, and the items are not limited to nodes in a document tree.

Tree location address
is a tumbler idea, navigating through a document using positional indexes

Path location address
views the tree as a matrix, so any node can be addressed as a level, sequence pair

Bibliographic location address
is 'a query that addresses information objects that system are not expected to access automatically'. Neat! How to reference things that are not computerized.

Named location address
a link to something with a name

Data location address
addresses strings and tokens

Property location address
links to a property that somehow belongs to the current object

Notation-specific location address
this allowed extension methods of addressing, provided by a system in addition to those built in to the standard.

Span location
allows spanning

Multiple location
allows ordering, duplicate omission, and aggregation

Similar to XPath location steps, but without the neat directory-style syntax, these can be combined in location ladders. I think one killer for HyTime was a rather odd one: it took SGML's default name length rules, and consequently used very terse code words that broke the camel's back for newcomers: relloc, listloc, treeloc, pathloc, bibloc, nameloc, dataloc, proploc, notloc. It all got a bit too much.
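To make two of the address forms listed above concrete, here is a rough Python sketch of the tumbler idea behind treeloc and the level/sequence idea behind pathloc, as described in this post. It is an illustration only, not HyTime syntax: the function names tree_locate and path_locate, the sample document, and the 1-based counting conventions are all assumptions.

```python
import xml.etree.ElementTree as ET

# A tiny sample document (made up for this sketch).
DOC = ET.fromstring(
    "<doc><title>Report</title>"
    "<sec><p>first</p><p>second</p></sec>"
    "<sec><p>third</p></sec></doc>"
)


def tree_locate(root, tumbler):
    """Tumbler style: follow 1-based child positions down from the root."""
    node = root
    for position in tumbler:
        node = list(node)[position - 1]
    return node


def path_locate(root, level, sequence):
    """Matrix style: the sequence-th node (1-based, in document order)
    among all nodes at the given depth, where the root is level 1."""
    row = [root]
    for _ in range(level - 1):
        row = [child for node in row for child in node]
    return row[sequence - 1]
```

Here tree_locate(DOC, [2, 2]) walks to the second child of the root and then to its second child, while path_locate(DOC, 3, 3) counts across the whole third level of the tree. Neither needs the target to carry a name or ID, which is what separates them from nameloc.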

But what is interesting in relation to the i4i patent is that by 1998, when the i4i patent was granted, there was not only an ISO standard with specific capabilities to allow representation of a structured document that could point into ranges of free text (using dataloc), but it had already had a five-year review and was in its second edition. The 1992 version pre-dates the i4i patent submission date by two years.
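For readers who have not met the idea, pointing into ranges of free text from outside that text looks roughly like the following Python sketch. It only illustrates the general technique of out-of-line addressing of strings and tokens; it is not dataloc syntax, and the names DataRange, resolve_chars and resolve_tokens, along with the 1-based start-plus-length convention, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRange:
    """An out-of-line pointer into text (hypothetical structure)."""
    label: str   # what the range means, e.g. "emphasis"
    start: int   # 1-based position of the first unit
    length: int  # number of units covered


def resolve_chars(text, loc):
    """Treat the range as counting characters."""
    begin = loc.start - 1
    return text[begin:begin + loc.length]


def resolve_tokens(text, loc):
    """Treat the range as counting whitespace-separated tokens."""
    tokens = text.split()
    begin = loc.start - 1
    return tokens[begin:begin + loc.length]
```

The text itself stays free of markup: with TEXT = "HyTime addresses strings and tokens", a DataRange("emphasis", 1, 6) resolved by characters covers "HyTime", and a DataRange("phrase", 3, 2) resolved by tokens covers "strings and". The structure lives entirely in the separate list of ranges.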

And this is what drives me crazy. Standards people get caught up in a cooperative vision, and work together, often sacrificing their time and income, to bring these ideas out and to put them into the public view. And then we see companies or individuals, innocently or opportunistically, taking the most obvious and straightforward uses of these general technologies and being granted patents for them. It is outrageous, and works against the institution of standards, which are the foundation of the modern economy. You standardize the ruler and then someone patents taking measurements!

To allow in 1994 that out-of-line markup was somehow original (as the i4i patent does with its talk of SGML while ignoring HyTime), when there was already a two-year-old ISO standard that allowed it, shows how incompetent or inept the USPTO was at that time. We should be careful not to demonize the examiners, of course: I presume that the cause was deficient procedures, budgets, mandate and vision. (The buck can always be passed.) The USPTO is obviously trying to do better now, but the Microsoft patent on XML documents with element content and an XML schema (the other of the two patents that this series of blogs is about) shows that the system is still ludicrous.

You should not be able to take a generalized standardized technology, whittle it down to a specific use and a subset of its features, and then get a patent on it. There is no invention in using a technology for the very things it was invented for! Is writing down an idea that is already floating around and common knowledge, let alone ISO standardized, really sufficiently original to get a patent?

(Caveat: IANAL. And I certainly could be inadvertently missing some essential element in the i4i patent and will make any corrections or retractions necessary if that is the case. Readers are encouraged to read the patent themselves. And readers should certainly be aware that patents have defensive and negotiative aspects, FUD sticks to worry each other into mutually acceptable positions.)
