XML grandfathers change

Triumph has a hundred fathers

By Rick Jelliffe
January 18, 2010

A belated Happy 2010 to my readers. I wish you large belts and small footprints.

La vittoria trova cento padri, e nessuno vuole riconoscere l'insuccesso. (Victory finds a hundred fathers, and no one wants to acknowledge failure.)

XML has gone through a few parental stages: first as the sire of an explosion of new vocabularies, applications and APIs, next as the spawner of bloated monsters, and then as the proud parent of the new generation of the most successful document formats.

But now XML is a grandparent: its genetic material is being recombined indirectly. It has become an influence, a part of the environment that other technologies naturally want to support. A no-brainer.

My metaphor is inadequate, though: XML has in fact influenced other technologies throughout the last decade. Perl (and Ruby?) finally got Unicode support; JSON was developed for sending marked-up regular data as text, with the notable improvement of syntactic support for datatypes; and PMD and many other systems adopted the approach of expressing their expert systems using XPaths even where the data was not XML.
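The XPath-over-non-XML idea can be sketched in a few lines. This is only an illustration, not how PMD itself works: it mirrors a Python syntax tree as an element tree and then states a rule ("flag calls to eval") as a path expression. The helper names `ast_to_element` and `find_eval_calls` are made up for this example, and it relies on the limited XPath subset in Python's standard `xml.etree.ElementTree`.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_element(node):
    """Mirror a Python AST node as an element tree (hypothetical helper).

    Each node becomes an element named after its class; scalar fields
    become attributes, and each structural field becomes a wrapper
    element holding the child nodes.
    """
    elem = ET.Element(type(node).__name__)
    if hasattr(node, "lineno"):
        elem.set("lineno", str(node.lineno))
    for field, value in ast.iter_fields(node):
        if isinstance(value, (str, int, float)):
            elem.set(field, str(value))
        else:
            children = value if isinstance(value, list) else [value]
            wrapper = ET.SubElement(elem, field)
            for child in children:
                if isinstance(child, ast.AST):
                    wrapper.append(ast_to_element(child))
    return elem


def find_eval_calls(source):
    """Return the line numbers of calls to eval, found by a path query."""
    tree = ast_to_element(ast.parse(source))
    # The rule is expressed as a path, although the data was never XML.
    hits = tree.findall(".//Call/func/Name[@id='eval']")
    return sorted(int(hit.get("lineno")) for hit in hits)


if __name__ == "__main__":
    sample = "x = eval('1+1')\ny = len('abc')\nz = eval(input())\n"
    print(find_eval_calls(sample))
```

The point of the design is that the rule lives in a one-line declarative query rather than in a hand-written tree walker, which is what makes the approach attractive for rule-based checkers.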

More recently, the general-purpose object/functional programming language Scala has allowed XML literals, written in real XML syntax, as part of the basic language.

But the most interesting example of the diffusion of XML genes or memes or technical DNA is in Intel's current i7 processor range. The SSE 4.2 instructions include several aimed directly at faster parsing. Basically, they let you compare two strings of up to 16 bytes each, and you can get the response back either as a boolean (found/not found) or as a mask (a byte-for-byte comparison).
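To make the boolean-versus-mask distinction concrete, here is a small pure-Python model of that kind of comparison. It does not call the real instructions and says nothing about their encoding or speed; it only imitates one plausible mode, "does each byte of this block match any byte in a small set?", which is the sort of test a parser needs when scanning for markup characters. The names `match_mask`, `contains_any` and `find_markup` are invented for the sketch.

```python
BLOCK = 16  # the instructions work on strings of up to 16 bytes


def match_mask(needles, block):
    """Mask result: one flag per byte of block, True where it is in needles."""
    if len(needles) > BLOCK or len(block) > BLOCK:
        raise ValueError("both operands are limited to 16 bytes")
    return [byte in needles for byte in block]


def contains_any(needles, block):
    """Boolean result: found / not found."""
    return any(match_mask(needles, block))


def find_markup(text, specials=b"<&"):
    """Scan text 16 bytes at a time for the first markup character.

    Returns the byte offset of the first match, or -1 if there is none.
    """
    for start in range(0, len(text), BLOCK):
        mask = match_mask(specials, text[start:start + BLOCK])
        if True in mask:
            return start + mask.index(True)
    return -1


if __name__ == "__main__":
    data = b"plain text before the first <tag> &amp; more"
    print(match_mask(b"<&", b"a<b&"))
    print(find_markup(data))
```

In hardware the whole 16-byte block is tested in one instruction, so the loop in `find_markup` advances a block at a time instead of a byte at a time; the Python version only shows the shape of the logic.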

Glossy details here. More technical details relating to XML parsing here and to Schema validation here. These articles are almost 2 years old now: what has changed is that now the chips are out and nearing use in consumer systems.

Similarly, in the area of formal computing science (i.e. language/grammar/automata theory), over the last few years we have seen a lot of activity where XML-type languages (and, indeed, SGML-type languages) are treated as objects of interest rather than objects of scorn. I was never comfortable with the idea that SGML was somehow deficient because the theoretical computing science world did not provide a very satisfactory theoretical class of languages to match it, at least in its first decade. This seemed a lack in the theory rather than in SGML, though it certainly was a curse on would-be SGML developers: a sin of imprudence (by the standardizers), not of inelegance or incompetence.

Now that the technological world is permeated with XML directly and indirectly, it seems clearer how XML-unfriendly the technical environment was even 10 years ago. In retrospect, I'd say that the biggest thing that helped XML along was Java: it provided adequate Unicode and parsing support, had enough packaging/class mechanisms to allow the SAX API, and had a URL library, and that was good enough. It didn't have a good tree API (I mean enhanced trees, AVTs or LISP lists), its serialization stunk, and it didn't have a good story for producing content for the WWW, so XML didn't reinvent any wheels. (I am not sure why people would use Java now that Scala is available, except for 'externalities' like FindBugs.)

Maybe XML's current job is just to provide a good-enough basis for the next big thing? The success of a technology may be measured in uptake and longevity (as well as its success in its own niche, which is respectable too), but also by what it indirectly spawns, influences or enables.
