Microsoft patents Schematron?

Put on your nappies...

By Rick Jelliffe
January 19, 2010 | Comments: 4

I see that the two top US patent monopolists at the moment are IBM and Microsoft, with about 5000 and 3000 patents granted in 2009 respectively. I would expect IBM to have more hardware patents and process patents (vomit), so the number of software patents might not be much different between them. Anyway, after the first few thousand, who is counting? As far as I know, neither IBM nor Microsoft has been litigious over software patents recently, and their current use of software patents seems defensive rather than a source of license revenue; both are involved in non-assertion covenants or patent commons.

So how do they manage to keep up this incredible rate of patent innovation?

I was intrigued to discover US patent US7240279, "XML patterns language". It is basically a patent on Schematron-like systems where there is some automated repair.

The patent was applied for June 19, 2002 and granted July 3, 2007. Schematron 1.0 was released in November 1999, Schematron 1.5 in February 2001, and ISO Schematron became an international standard in 2006.

Here is what the patent says about Schematron:

A modified schema approach, called SCHEMATRON, was developed by Academia Sinica Computing Centre of the Republic of China (Taiwan), and is based on tree patterns in a source document rather than the grammar of the source document. SCHEMATRON is a structure validation language that utilizes an eXtensible Markup Language (XML) open schema to confirm whether a source eXtensible Hypertext Markup Language (XHTML) document conforms to the schema. In general, SCHEMATRON provides a few unique elements that simplify the use of XPath expressions. As an example, online documentation for SCHEMATRON (at the following web page on the world wide web: illustrates a sample schema that confirms whether a source XHTML document conforms to the W3C WCAG. The schema includes "rule" expressions for matching an XPath context attribute in a source XHTML document. Upon a match, the schema provides "assert" expressions that contain error statements indicating each specific nonconformance with the accessibility guidelines.

However, careful evaluation reveals that SCHEMATRON uses an inefficient process to accomplish its task. Specifically, SCHEMATRON first requires an eXtensible Stylesheet Language (XSL) document to transform the schema into an intermediate XSL document. The intermediate XSL document is then applied to the source XHTML document to execute XPath expressions that evaluate the source XHTML document and produce the error statements. In addition to being complex, SCHEMATRON does not provide any means to mechanically repair nonconforming source XHTML documents. Repairs must be implemented by manually editing the nonconforming source XHTML document based on the error statements. Further, SCHEMATRON is limited to processing XHTML source documents. It would therefore be desirable to provide a more efficient technique for evaluating any source document for conformance to a set of contextual guidelines. It is further desirable to provide summary and detailed reports that are organized by categories for a user to analyze and correct problems manually if desired. However, the technique should also enable mechanical or automated correction of any nonconforming portion of the source document that is identified.

That is factually incorrect on the following points:

  • Schematron has always been a general XML system and is not limited to processing XHTML documents.
  • Schematron does not require XSLT for converting the schema to XSLT. In fact, the first version of Schematron used Omnimark for this purpose. This confuses the language with its typical implementation.
  • Schematron does not require XSLT for execution at all. There have been versions of Schematron made using XPath libraries.
  • Schematron is not even tied to XPath: this is explicit in ISO Schematron, but from as early as 2001's Schemarama, which used the Squish query language, there were dialects of Schematron that did not use XPath.
  • Schematron provides the pattern element (required) that seems to duplicate the category element fairly exactly. (The phase element or role attributes could be used too.)

But leaving that aside, what is Microsoft's excellent alternative? Something so much less complex, and so innovative and non-obvious, that it deserves a patent? Put on your nappies (US: diapers): you will need them when you behold the splendour of the Microsoft innovation...


It seems to be a patent on the method and systems of their XMLP language, which FrontPage used to check WAI accessibility rules. An example is:

<xp:ruleset version="1.0">
  <xp:category name="WCAG 1.5">
    <xp:rule context="(//img) | (//noembed)">
      <xp:pattern match="@alt | @longdesc" is-assert="true" priority="1">
        <xp:report type="error">
          <xp:summary>
            Ensure there are text links for each
            active region of this image map.
          </xp:summary>
          <xp:message>
            Ensure there are text links for each
            active region of this image map.
            If &lt;img> is used provide a
            redundant list of links following
            the image map.
          </xp:message>
        </xp:report>
      </xp:pattern>
    </xp:rule>
  </xp:category>
</xp:ruleset>

And here is the Schematron equivalent. Note that there is not one thing that is in the XMLP that is not already expressible with Schematron, sometimes with the same element name (though XMLP and Schematron use the term "pattern" for different things.)

<sch:schema xmlns:sch="" >
   <sch:pattern  see="
      <sch:title>WCAG 1.5</sch:title>

      <sch:rule context="img | noembed">
         <sch:assert test="@alt | @longdesc"
               role="WCAG Priority 1" diagnostics="d1">
            Ensure there are text links for each
            active region of this image map.
         </sch:assert>
      </sch:rule>
   </sch:pattern>

   <sch:diagnostics>
      <sch:diagnostic id="d1" role="error">
         Ensure there are text links for each
         active region of this image map.
         If &lt;img> is used provide a
         redundant list of links following
         the image map.
      </sch:diagnostic>
   </sch:diagnostics>
</sch:schema>

It seems that these are the correspondences:

xp:ruleset =  sch:schema
xp:category = sch:pattern
    xp:category/@name = sch:pattern/sch:title
    xp:category/@href  = sch:pattern/@see
xp:rule  = sch:rule
xp:pattern = sch:report
   xp:pattern/@priority = sch:rule/@role  ?
xp:report/xp:summary = sch:report/text()
xp:report/xp:message = sch:diagnostic
   xp:report/@type  = sch:diagnostic/@role
xp:value-of  = sch:value-of
xp:variable = sch:let

The use of $ to access variables is the same.

In the patent, there is a little XML-based report language. Again, this was something that was part of Schematron 1.5 or earlier, and is now the SVRL language of ISO Schematron.

In the patent, one thing they do is annotate the input document with information to allow reference from the validation results. And this is indeed what David Carlisle's 1999 Schematron-report did (using IDs rather than line numbers.)
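
The annotation idea is simple enough to sketch in a few lines of Python. This is not David Carlisle's code (Schematron-report was XSLT), just an illustration of the technique under my own assumptions: the `report-id` attribute name, the `n1`, `n2`, ... ID scheme and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def annotate_with_ids(root, attr="report-id"):
    """Give every element a generated ID so a report can point back at it."""
    for n, elem in enumerate(root.iter(), start=1):
        elem.set(attr, f"n{n}")
    return root

def failures_by_id(root, context, required_attr, attr="report-id"):
    """Return the generated IDs of context nodes lacking required_attr."""
    return [
        elem.get(attr)
        for elem in root.findall(context)
        if required_attr not in elem.attrib
    ]

if __name__ == "__main__":
    root = annotate_with_ids(ET.fromstring('<doc><img/><img alt="x"/></doc>'))
    # The report carries IDs, not copies of the nodes themselves.
    print(failures_by_id(root, ".//img", "alt"))
```

A viewer can then use the IDs in the report to jump to the offending element in the annotated copy of the document.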

I was first. They took that from Schematron. (And they are welcome to it. But not to patent it!)

So what does the patent have that is not in Schematron, or just an obvious way to use Schematron? (I note that the patent does record that the examiner did look at some Schematron articles.)

Perhaps the idea is that the method of not using XSLT, or not using the double compile, is novel? But as I mentioned above, others were using other implementation methods.

Or perhaps the idea is that tying document repair to the schema language is not obvious? Well, certainly the ISO standard mentions repair hints. On 22 June 2002 I wrote to the Xerces group:

Personally, I think an error object should be able to provide
  • file/line/character number
  • XPath
  • severity indicator
  • sendor ID
  • nickname or error-code
  • single line overview
  • multiline diagnostic, XML
  • icon for that error
  • URL for see also
  • unique ID for keying a repair method
  • unique ID for diagnostic generating function
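
As a rough illustration of how little is involved in keying a repair to an error, here is a sketch in Python. It covers only a subset of the fields listed above, and everything named here is hypothetical: the `ValidationError` class, the `REPAIRS` registry, the `add-placeholder-alt` repair and the placeholder text are my own inventions for the example, not part of Schematron, Xerces or the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional
import xml.etree.ElementTree as ET

@dataclass
class ValidationError:
    """A subset of the error-object fields suggested in the list above."""
    xpath: str                        # where the problem is
    severity: str                     # severity indicator
    code: str                         # nickname or error-code
    summary: str                      # single line overview
    diagnostic: str = ""              # multiline diagnostic
    line: Optional[int] = None        # file/line number, if known
    see_also: Optional[str] = None    # URL for see also
    repair_id: Optional[str] = None   # unique ID keying a repair method

# Hypothetical registry: repair ID -> function that fixes the element in place.
REPAIRS: Dict[str, Callable[[ET.Element], None]] = {
    "add-placeholder-alt": lambda elem: elem.set("alt", "[description needed]"),
}

def apply_repair(error: ValidationError, root: ET.Element) -> bool:
    """Look up the repair keyed by the error and apply it; report success."""
    repair = REPAIRS.get(error.repair_id or "")
    if repair is None:
        return False
    target = root.find(error.xpath)
    if target is None:
        return False
    repair(target)
    return True

if __name__ == "__main__":
    root = ET.fromstring("<doc><img/></doc>")
    err = ValidationError(xpath=".//img", severity="error", code="WCAG-1.5",
                          summary="img has no text alternative",
                          repair_id="add-placeholder-alt")
    print(apply_repair(err, root), ET.tostring(root, encoding="unicode"))
```

The validator only has to emit the ID; whatever knows how to fix the problem can live elsewhere.
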
In fact, the idea that you might want to key some action to a Schematron validity report is good enough that IBM has a patent application on it too: ENFORCING CONTEXT MODEL BASED POLICIES WITH FORWARD CHAINING (20090138795)

I think the repair is the "innovation" in the patent: everything else seems just a copy of vanilla Schematron: in the claims it is:

if a match exists, generating a repair document including computer code that is executed to repair a nonconformance between the at least one source node and the pattern node, if the at least one source node does not conform to the pattern node;

Being the first at something obvious is not real innovation

The ideas in Schematron and its implementations are not new (apart from perhaps the idea that XPaths could be used for validation: I know this was not obvious to many people at the time until it was pointed out, but the XLinkIt people had the same kind of idea, so it was an idea whose time had come.) Symbolic processing is as old as LISP. Literate programming has a Zimmerman frame. Anyone familiar with the history of IDEs knows that automated/guided construction and repair was an idea around for ages: compared to the sophistication of things like the Berlin Project CIP, Schematron and XMLP are just trifles.

I am often asked why I didn't take out a patent on Schematron myself. The main reason is that I considered it so obvious that a patent would be impossible, even if I wanted to. And it was my understanding that Academia Sinica Taipei (my employer) was interested in open research and development, and promoting FOSS, and would be as disinclined as I was: my implementation of Schematron is now in its 10th year as a FOSS project.

I have grown used to the idea that people re-invent Schematron every few years. IMHO they always leave something out: the importance of text, the phases mechanism, or something else.

For example, look at this article XPath Rules! from the IBM developer website dated November 19, 2003. (The article is nice because it gives Java code rather than XSLT, but the language being defined is just a little Schematron: context XPath, assertion XPath, natural language assertion.) [One little improvement I do recognize is OASIS CAM, which predefines some functions for a more declarative approach, though I expect the same functions could be defined for use in Schematron too, if there was any demand.]

Imitation is the sincerest form of flattery

Anyway, the scenario seems to be this: Microsoft people look at tools for Web Accessibility conformance checking. They find Schematron's example. They copy it under their own namespace and elements, almost exactly, and add one obvious wrinkle, which is generating repairs from the validation results. They patent it.

I don't mind people re-inventing Schematron independently: that just proves that the idea is good and is enjoyable. And I would certainly prefer people use ISO Schematron rather than roll their own, just because I think Schematron has proved itself following a decade of use, and it can always benefit from additional input. And I don't think Schematron is necessarily the ultimate in XPath languages. Nor that Schematron cannot be improved (I have submitted the draft of an update to ISO SC34 WG1 for processing). And people who need some other semantics can make up their own elements (XSD 1.1's assertions ignore the human aspects of validation, so it is appropriate they use their own namespace, for example.)

And I understand that the stupid patent situation puts companies in the position where they may want to get a trove of junk patents. And that hard-pressed examiners at the USPTO may not have access to ISO standards and so on. Or that they may not have the time to contact external parties. (Though, to be fair, I have been contacted by patent examiners in the past from a particular country, which I thought was very good.)

But yes, I do feel outraged when what I consider obvious ideas and uses of Schematron (or XML or SGML)—the kinds of uses I intended to be enabled for the public benefit—are granted to US corporations as monopoly rights. Even if it is just defensive, it sucks. What kind of dumb cycle have we gotten ourselves into?

I would like Microsoft to add all the ISO DSDL schema languages, including Schematron, to the OSP.

...Oh, and if I have the wrong idea about this patent, I would be delighted to be wrong so please let me know.

Semi-Unrelated

I've written many blog items over the years where I explain why I think software patents are a bad idea, and several concerning specific patents that I think are junk. Readers would know that I am particularly concerned about what I call Standards Trolls, which are people who take out patents on systems that are either the subject of an international standard, or which are just a logical or obvious application of the standard.

It seems that the USPTO does not consult International Standards as part of assessment of patents, nor does it even have any formal liaisons with standards organizations to ask whether there are any applicable standards that should be considered for prior art or obviousness. Last month I wrote about the i4i patent, that it seemed to me to be just an obvious application of the kind of referencing specified in the HyTime standard, which was completed prior to the i4i application and is not mentioned in the patent despite being well-known.

I think the i4i patent has very bad implications for International Standards, not the least of which is that there seems to be no adequate information gleanable from the judgement or patent about why Microsoft's CustomXML (the pink tags) comes under the patent while its XML content control (binding) does not. I have asked Microsoft's Gray Knowlton, who isn't talking. It makes it pretty hard to know what is going on when the patent seems bogus, the court decision skips workable details, and the stakeholders are stumm. It seems fishy.

Rick, your first set of bullet points says the opposite of what you probably meant:

Schematron does ***NOT*** require XSLT for execution at all. There have been versions of Schematron made using XPath libraries.

Gavin: Doh. My number one regular mistake is leaving out "not"s. My poor readers, I often think. I have also fixed up "XLinkIt" where I brainfitzed and wrote "XLink".

Rick, I agree totally, with this much prior art how did it get approved? What possible purpose can it serve? Defensive means in practice what? Non-assertion is wonderful, but how on earth are they getting awarded in the first place?!
Another patent that to me falls into the same category of 'so obvious that a patent would be impossible, even if I wanted to' is MS's 'SIMPLIFIED REPRESENTATION OF XML SCHEMA STRUCTURES' only applied for in 2007. To me (although with additional irrelevant bells and whistles - eg network channels etc) it is in concept indistinguishable from and (both start with xsd, abstract an xml representation using xsl/(insert other mechanism) to create xml and then perform a second pass to output a GUI) and is even pretty much what XForms is.
Note - none of these mention the auto-populating of the GUI (and initial T1.xml result) with any valid initial instance data so as to permit ongoing editing - can I patent this concept?!! :)
And as far as I can tell there is not even a concrete implementation (is there for their Schematron clone?), if not this gets even more concerning, can one patent a concept without doing any real work? Either way it seems to be only increasing the patent regime's irrelevance to the real world - for myself when writing new software I no longer bother searching patents as I know even if it doesn't exist it has already been patented, even if the concept is common knowledge.
Kind of pathetic really.

WayneOz: Thanks for the links. There seem to be quite a few of these.

I taught a course last week on XSLT. One of the participants said that they worked about 10 years ago in Malaysia for one of the large IC chip companies (not *that* one). They got equivalent of $2000 per month for their normal work; if they got a patent they got $3000. So consequently their focus was on trying to dream up patentable things, rather than useful things or things that would directly benefit the company. Monopolies (such as patent grants) distort market activity.
