IBM patents SGML tag implication?

A conga-line of bottom feeders...

By Rick Jelliffe
January 25, 2010 | Comments: 4

It gets worse the more I look at it. In recent blog items I have been looking at the issue of companies getting US patents for technologies that are part of international standards or are obvious applications of them.

See Microsoft patents Schematron? and Microsoft and the two patents (followed up here, here, here, and here) mainly about the i4i patent.

I have been concerned about this issue for a long time: in fact, I asked the Australian national standards body Standards Australia to raise the issue of the encroachment of patents on standards (in particular the legal status of standards) at a JTC1 plenary, and they told me it had. I gather that the JTC1 representatives would prefer it to be someone else's problem. They need a boot up their collective backside. Of course legal protection must be gained through the political process, but the standards organizations need to act as a union for standards-makers: the standards bodies don't need to wait for a grassroots movement, they already are a grassroots movement (in countries where the standards body is not a government department.)

Is the reason that IBM and Microsoft are the top patentees because they bottom-feed standards and the obvious uses or workarounds of standards? I wrote about this a few years ago, claiming Microsoft was trying to patent an XML patent-writing machine. (Note the comment accusing me of Microsoft-bashing :-)

The latest one that came to my attention is IBM's patent 7,143,346 Simple types in XML schema complex types.

What is claimed is:

1. A method of parsing an XML stream containing simple types wherein said simple types are not XML elements within a complex type, said method comprising: receiving said XML stream; parsing said XML stream comprising: on encountering a parent element in said XML stream, utilizing an XML schema to locate a type for said parent element; where said type is a complex type, determining whether a mixed flag for said complex type in said schema is set to true; where said mixed flag is set to true, interpreting fragments embedded in said parent element in accordance with said complex type, each fragment being one of an arbitrary string and an element; generating parser output events interpreting each embedded fragment as one of an embedded simple type and an inherited simple type where, in accordance with said complex type, each embedded fragment corresponds to a dummy element having a simple type, with one of (i) a name of said dummy element and (ii) a name of said simple type being one of a predetermined set of names.

IBM's patent provides a way of annotating a schema so that the text fragments found in element content can be implied by the software to belong to an element and therefore to have some simple type.

Sound familiar? ISO SGML (IS8879:1986) provides a feature called tag omission. This is a way of annotating a schema so that text fragments in element content can be implied by the software to belong to an element and therefore to have some element type.

Now in the case of SGML, the schemas are DTDs not W3C XML Schemas. And the implication is at the tag level rather than the element level. And it is built into the parser rather than being a post-process. I don't see how it makes any difference. The idea is obvious, and there is prior art in doing this kind of thing.
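To make the comparison concrete, here is a minimal sketch of the general idea of implying an element around bare text in mixed content. It is my own illustration, not code from the patent or from ISO 8879, and it does not implement SGML tag omission or XML Schema validation. The IMPLIED table, the imply_elements function and the element names para and text are all hypothetical; the table stands in for whatever annotation a schema or DTD would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema annotation: for each mixed-content parent element,
# the name of the element that bare text is implied to belong to.
IMPLIED = {"para": "text"}


def imply_elements(parent):
    """Yield (name, content) events for the children of `parent`,
    reporting bare text fragments as if they were wrapped in the
    implied element named in IMPLIED."""
    implied = IMPLIED.get(parent.tag, "#text")

    def text_event(fragment):
        # Skip missing or whitespace-only fragments.
        if fragment and fragment.strip():
            yield (implied, fragment.strip())

    yield from text_event(parent.text)
    for child in parent:
        # Real child elements are reported under their own names.
        yield (child.tag, (child.text or "").strip())
        # Text following the child is another bare fragment.
        yield from text_event(child.tail)


doc = ET.fromstring("<para>Hello <b>bold</b> world</para>")
events = list(imply_elements(doc))
print(events)
```

Whether the table comes from a DTD or an XML Schema, and whether the wrapping happens inside the parser or as a post-process, the lookup is the same few lines.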

I was on the W3C XML Schema Working Group that created XML Schemas. And I was on the ISO/IEC committee that maintained SGML in the 1990s. This patent just sucks.

Why is there no reference to SGML in the patent? SGML was largely invented at IBM, and the efforts of IBM-ers like Goldfarb, Adler, Berglund and Wohler were key to what success it achieved.

Is it really true that one standard (W3C XML DTDs) can take out a feature from another standard (ISO SGML DTDs), then alter it cosmetically with some extra features (W3C XML Schemas), and then a US company can get a patent on that original feature again?

Apart from obviousness, wilful ignorance, and prior art (IMHO IANAL), this patent has another aspect that unsettles me, though I cannot really articulate it at the moment. In Thomistic terms, it treats accidents as a substance. What is the substantial difference between an XML Schema schema and any other kind of schema (RELAX NG, XML DTD, SGML DTD)? To me, it is like the difference between a red brick and a green brick: a patent that limits itself to red bricks to do something that was old hat for green bricks and unrelated to colour just seems dodgy, to use our slang.


Hmmm... I'm imagining the plight of the patent examiner. They are searching for prior art. They have access to all sorts of databases, but I bet they focus on the technical press, e.g., ACM and IEEE journals.

If ISO standards are locked behind firewalls and available only for rather exorbitant fees, I bet that would be a blocker for them right there. What you need is a full-text index of all ISO standards, maybe available only to patent examiners, or by subscription to all. But right now we have nothing. So in terms of relevance to patent examiners, I bet W3C, OASIS and other open standards, freely available on the web, count for more than ISO standards which are unavailable except for a per-copy fee.

I think that is the approach. You can't expect patent examiners to be familiar with every technological dinosaur like SGML.

Rob: I certainly agree that it is difficult for patent examiners to know obscure standards, such as the basis of HTML. It is not as if it is their job.

And I wouldn't expect the USPTO to have expertise floating around. It is not as if they were a large electronic publishing concern who actually used SGML for their publishing for several years.

And it is not IBM's job to know these things. It is not as if they basically invented the thing, used it, and championed it through standardization.

And a corporation who participates in a standards body would never turn around several years later and announce they have IP for some part of the standard, would they? It takes a hypocrite to participate in a standards body hiding that you have some IP in it, but a genius to get the IP years after the standard has come out!

Oh, wait...

But actually I do agree that it can be counter-productive to have firewalled standards (especially for IT) in the age of Google. In particular it is bad for non-corporate FOSS people of course, but also for anyone doing casual, ad hoc, disorganized searches.

IMHO the USPTO needs to show that it copes effectively with prior art in standards. The world runs on standards. It needs to do this in liaison with standards bodies, and without attempting to slough off costs to those bodies which are often voluntary. And if it is incompetent or unable to find prior art, or did not look for it in fundamental areas like standards, it needs to apply a high bar to any patents in the same area as a standard.

In particular those patents or applications which misrepresent the standard (by omission or commission) or which are 'standard + 1' (I mean which repeat some features of a standard, then merely add one generic thing.)

And I do want to commend IBM for its patent commons. But I don't see that this patent is in there? Could you ask whichever marketing person arranges this to look at it please?

I'm not going to comment on a patent I have not read and your assertion that it relates to a standard I also have not read. I'm just pointing out that the USPTO does not care about standards per se. They care about publications. And publications which are easily available to them will feature more prominently in their prior art searches than 100,000 standards listed in a catalog with no full-text and little useful metadata.

It is all about findability. Companies who want their prior art well-known for defensive purposes use services like to make sure it is easily findable.

Note that unlike most of the stuff you complain about on your blog, this is actually something you might be able to influence, if you cared to make a try at it. You might talk to Standards Australia and see if they can bring this up to JTC1.

Rob: You are preaching to the choir that standards should be free and public, brother: paper has not been an effective mechanism for two decades, and I know that many national standards bodies have been fighting with the issue for as long.

(The standard I edit, IS19757-3 Schematron, is available free and public from the ISO site, by the way. Indeed, it is one of the reasons why the more enlightened SCs like SC34 prefer to be involved in review and maintenance rather than development: it is easier to get JTC1 to put a standard up as a publicly available standard on the ISO website if it came from outside.)

I have indeed brought up this issue previously with Standards Australia. And I know that several other editors in SC34 are also concerned. Of course, it is not a new issue: see Bob Glushko's comments in

However, the large US corporations are generally in good positions to cope if it turns out they have to pay big license fees. It is the medium to small developers who are impacted, and especially the non-corporate distros of Linux and FOSS.
