A sketch on recasting XBRL in Schematron

By Rick Jelliffe
November 24, 2008 | Comments: 5

In the next few years a lot of people will be generating XBRL documents, in particular for financial filings to regulators. And a few years later a lot of people will be figuring out what to do with all that data too.

But the immediate challenge for XBRL is that its taxonomy declarations are built on top of XML Schemas substitution groups, and this adds a certain amount of "complexity", which is a polite way of saying "reliance on tools to make the booboos go away". Before people will be generating or using all these XBRL documents, regulators will need to be beavering away making national and departmental taxonomies. (I am told that because national accounting regulations and reporting requirements vary so much, we shouldn't expect much action on pan-national taxonomies.)

The XBRL instance language is fairly simple and straightforward.
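
To make that concrete, here is a minimal sketch of what an instance might look like. The ex: namespace, the taxonomy file name, the entity identifier and the fact values are all made up for illustration; only the xbrl: and link: element names come from XBRL 2.1. The two facts borrow the element names used later in this article, which writes them without a prefix for brevity.

```xml
<xbrl:xbrl xmlns:xbrl="http://www.xbrl.org/2003/instance"
           xmlns:link="http://www.xbrl.org/2003/linkbase"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
           xmlns:ex="http://example.com/taxonomy">
  <!-- Fixed linking information: points at the (hypothetical) taxonomy -->
  <link:schemaRef xlink:type="simple" xlink:href="example-taxonomy.xsd"/>

  <!-- A context says who and when the facts are about -->
  <xbrl:context id="c2008">
    <xbrl:entity>
      <xbrl:identifier scheme="http://example.com/entities">ACME</xbrl:identifier>
    </xbrl:entity>
    <xbrl:period>
      <xbrl:startDate>2008-01-01</xbrl:startDate>
      <xbrl:endDate>2008-12-31</xbrl:endDate>
    </xbrl:period>
  </xbrl:context>

  <!-- A unit says what numeric facts are measured in -->
  <xbrl:unit id="usd">
    <xbrl:measure>iso4217:USD</xbrl:measure>
  </xbrl:unit>

  <!-- User-defined items (facts), each pointing back at a context -->
  <ex:RegisteredOwner contextRef="c2008">Acme Hot Dogs Pty Ltd</ex:RegisteredOwner>
  <ex:HotDogSales contextRef="c2008" unitRef="usd" decimals="0">125000</ex:HotDogSales>
</xbrl:xbrl>
```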

I decided to take a look at whether XBRL could, keeping the same instance syntax and concepts, have a schema language transplant so that Schematron was used instead of XSD. Here are some thoughts.

XBRL is aimed at the exchange of "performance data", for example summaries of financial information for government treasury or taxation purposes. The instance syntax is relatively simple; however, underlying XBRL is an adventurous use of XML Schemas, not only to provide structure and typing information but also to model semantic relations. Creating the taxonomy for a particular kind of data is not necessarily nightmarishly difficult, but XBRL demands an awful lot of background knowledge.

Basic Instance

We can make a pattern for the basic instance document. An XBRL document has some fixed linking information, but otherwise contains user-defined elements, which contain facts: either single items or tuples containing items.
<pattern id="instance">
	<title>Some basic constraints on all XBRL elements</title>
	
	<rule abstract="true" id="element-content">
		<!-- Only the element's own text nodes are tested, not its descendants' -->
		<assert test="not(text()[normalize-space()])">
		The <name/> should have no data content.</assert>

	</rule>

	<rule context="/" >
		<assert test="xbrl:xbrl">
		The top-level element should be xbrl:xbrl </assert>
	</rule>

			
	<rule context="xbrl:xbrl">
		<extends rule="element-content" />
		<assert test="*[1][self::link:schemaRef]">	
		The xbrl element should have a link:schemaRef element in the first position.
		</assert>
		<assert test="count(link:linkbaseRef) + count(link:roleRef) 
		+ count(link:arcroleRef) + count(link:footnoteLink) +
		count(link:schemaRef) = count(link:*)">
		The xbrl element may contain only the following link elements:
		linkbaseRef, roleRef, arcroleRef, footnoteLink, schemaRef.</assert>

		<assert test="count(xbrl:unit) + count(xbrl:context) = count(xbrl:*)">
		The xbrl element may contain the following xbrl elements:
		context, unit.</assert>
	</rule>

	<rule context="link:schemaRef | link:linkbaseRef | link:roleRef | link:arcroleRef">
		<assert test="count(preceding-sibling::link:*) = count(preceding-sibling::*)">
		The following link elements go before any other elements:
		schemaRef, linkbaseRef, roleRef, arcroleRef.</assert>
	</rule>
	
	<rule context="@contextRef">
		<assert test="//xbrl:context/@id = current()/.">
		A contextRef attribute should locate a context element.
		</assert>

	</rule>
	
	<rule context="@unitRef">
		<assert test="//xbrl:unit/@id = current()/.">
		A unitRef attribute should locate a unit element.
		</assert>
	</rule>

	<rule context="@id">
		<assert test="count(//@id[. = current()]) = 1">
		There should be no elements with the same value for the id attribute.
		</assert>
	</rule>
</pattern>
Other rules and assertions would fill in various gaps, and cope with the link elements.
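
For the prefixes in these XPaths to resolve, the pattern has to sit inside a schema that declares them. A minimal wrapper might look like the following sketch. The namespace URIs are the XBRL 2.1 instance and linkbase namespaces; the title is a placeholder, and the xslt query binding is assumed because the rules above use the XSLT current() function.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
	<title>XBRL instance constraints (sketch)</title>

	<!-- Bind the prefixes used in the rule contexts and tests -->
	<ns prefix="xbrl" uri="http://www.xbrl.org/2003/instance"/>
	<ns prefix="link" uri="http://www.xbrl.org/2003/linkbase"/>

	<pattern id="instance">
		<!-- The rules from the pattern above go here;
		     one is repeated so that this sketch is a complete schema -->
		<rule context="/">
			<assert test="xbrl:xbrl">
			The top-level element should be xbrl:xbrl </assert>
		</rule>
	</pattern>
</schema>
```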

Schema

Next, we define the other elements that can be items and tuples. In order to do this, we define Schematron abstract rules, for convenience. First we can define various abstract rules which select whether an element is a tuple or item, and which data type it uses. An overlapping "mixin" system such as Schematron uses is more convenient than the complex derivation/substitution group system that XML Schemas uses.

For example, here are two abstract rules:
<rule abstract="true" id="item">
   <assert test="parent::xbrl:xbrl or not(parent::xbrl:*) ">
   An item is either a top-level element, or the contents of a tuple.
   </assert>
   <assert test="@contextRef">An item must have a contextRef</assert>
</rule>

<rule abstract="true" id="stringType" >
   <assert test="string-length(normalize-space())> 0">
   A <name/> element should have data content.
   </assert>
   <assert test="not(*)">

   A <name/> element should not have any child elements.
   </assert>
</rule>
Now we can readily use these abstract rules.
<rule context="RegisteredOwner">
    <extends rule="item" />
    <extends rule="stringType" />
</rule>
However, the ontology does more than a schema does: it also allows extra information to be registered on types, and therefore on elements that use these types. This requires not just a Post-Schema Validation Infoset (notionally, the outcome of validating a document with XSD) but a Post-Ontology Processing Infoset, which currently requires specialist software. ISO Schematron specifies an output XML language for reporting Schematron validation results: SVRL.

The current version of ISO Schematron allows foreign attributes, which can be used to decorate the SVRL output.

Let's see how this works. In an XBRL ontology, an item can have a period, either duration or instant. We make a simple abstract rule that declares the property value for the element.
<rule abstract="true" id="periodItem">
  <report test="true()" xbrl:period="duration" />
</rule>
So now we can extend our element declaration:
<rule context="RegisteredOwner">
    <extends rule="item" />
    <extends rule="stringType" />
    <extends rule="periodItem" />
</rule>
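
If an implementation copies foreign attributes from the report through to its output, which is an assumption here rather than something I have checked against any particular processor, the SVRL for a RegisteredOwner fact might contain something like the following. The location path and the exact shape of the output are illustrative only.

```xml
<svrl:schematron-output
    xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
    xmlns:xbrl="http://www.xbrl.org/2003/instance">
  <svrl:active-pattern/>
  <svrl:fired-rule context="RegisteredOwner"/>
  <!-- test="true()" always succeeds, so every RegisteredOwner
       element is reported along with its period property -->
  <svrl:successful-report test="true()"
      location="/*[local-name()='xbrl']/*[local-name()='RegisteredOwner'][1]"
      xbrl:period="duration">
    <svrl:text/>
  </svrl:successful-report>
</svrl:schematron-output>
```

A downstream application could then read the period property off each report, much as it would read type information from a PSVI.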
Another example declaration:
<rule context="HotDogSales">
    <extends rule="item" />
    <extends rule="monetaryType" />
    <extends rule="periodItem" /> ...
</rule>
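
The monetaryType rule is not spelled out anywhere in this sketch. One way it might look, as my own assumption rather than anything taken from the XBRL specification, is an abstract rule that requires a unitRef and numeric content:

```xml
<!-- Hypothetical definition of the monetaryType rule referenced above -->
<rule abstract="true" id="monetaryType">
   <assert test="@unitRef">
   A <name/> element must have a unitRef.
   </assert>
   <!-- In XPath 1.0, number() yields NaN for non-numeric text -->
   <assert test="string(number(.)) != 'NaN'">
   A <name/> element should contain a number.
   </assert>
   <assert test="not(*)">
   A <name/> element should not have any child elements.
   </assert>
</rule>
```

A real taxonomy would probably want tighter checks, but the point is that a datatype is just another abstract rule to mix in.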
So all the different kinds of structure testing, datatype checking, and property additions can be accessed using the same simple mechanism: predefined abstract rules. The assertions allow both validation and property augmentation, and they are declarative. And the fact that these are flat rather than necessarily hierarchical avoids the explosion of concepts that you get when using type derivation and substitution groups in XSD.

Properties
In the next version of Schematron I am proposing to add a more formal properties mechanism. This will allow a property attribute to reference a proper name/value/scheme triple.

For example:

<rule abstract="true" id="periodItem">
  <report test="true()" property="durationItem" />
</rule>
...
<property id="durationItem"
  name="period" value="duration" scheme="http://www.xbrl.org/" />

Conclusion
I have not looked into how dimensions might be handled yet, but I have not found any early gotchas for replacing XSD with Schematron in XBRL. I don't expect it would be a practical enterprise until XBRL had reached its critical mass and people were more interested in consolidation and becoming more efficient. However, I think the Schematron approach may have some advantages: for example, the mixin approach of making a taxonomy merely by filling in the name and ticking some boxes to select the appropriate abstract rules to include does seem relatively straightforward and human friendly.

But, as always, I think the real strength of Schematron is not on the computer side, but on the human side. The ability to state in natural language what the constraint is, and to generate responses for users, both using the domain terminology that would be understood by the user (or by the user of a specific application), surely cannot be sneezed at? 

Does modelling the taxonomy in XSD in fact mean that in practice you can only supply your users with Xerces error messages? And if that in turn means that your programmers will have to implement custom validators and validation messages, why is that less work than having Schematron in the first place? Schematron provides a development option halfway between the incomprehensibility of XSD and the double handling of bespoke systems.

[Note: I haven't validated the XML in this page, it is a sketch.]

5 Comments

Rick,

One of the key disadvantages that XBRL faces is that, for all that it is just really appearing above the radar for most people, the specification itself is older than XML itself by a couple of years, and as such has evolved mechanisms that were necessary at the time but that now appear odd or even counterproductive as XML technologies themselves emerge.

Schematron is a good solution to many of the constraint problems that XBRL faces, while XQuery in turn could very well solve many of the more intractable functional problems with the specification, and XSLT the presentation issues. However, to do so involves getting financial people with only a limited technological understanding to rethink their existing mechanisms in this light.

Articles like yours I think are good for doing just that. I would also recommend in your doubtlessly copious free time (being just a wee bit facetious here) that you may want to contact the XBRL working group and make your recommendations known to them directly.

-- Kurt

Kurt,

What are you talking about? The XBRL 2.1 Specification reached recommendation in December 2003. XML had been a recommendation for nearly five years at this point, and even XML Schema had been at recommendation for two. XQuery can, and is, used with XBRL, as is XSLT, although the "presentation issues" associated with the business of accounting and performance reporting are far more complex than most XML environments. "Getting financial people with only a limited technology understanding" is, I'm afraid, a rather patronising statement, and mostly wrong.

XBRL has very significant traction (far, far more than the vast majority of XML verticals) and one of the extremely important technical considerations that the XBRL consortium faces is the need to provide stability for the software vendors that serve the hundreds of thousands of companies that use XBRL for regulatory or performance reporting of one sort or another around the world.

Could it be simpler? Probably, and I wouldn't want to discourage research and ideas like the ones that Rick is putting forward here. But I'm afraid your comments are a largely inaccurate oversimplification.

Regards

John Turner
CEO CoreFiling
Chairman, XBRL International Standards Board

First off, I am with John, I don't believe Kurt has his facts correct when he says XBRL is older than XML.

Also, the first thing that comes to mind when I read this is: why are there so many Java and .Net XBRL processors and no Schematron-based XBRL processors, if it were so efficient or effective to do things relating to XBRL in Schematron?

The second thing that comes to my mind is I wonder if you have ever used an XBRL processor, Rick. Seems like it would be good to know what alternatives exist.

Finally, it seems that a good thing to do would be to compare and contrast what one can do with Schematron and what one could do with an XBRL processor and see which is best for what tasks.

Charles: Casting XBRL in Schematron would not really replace Java or .NET implementations: Schematron doesn't say anything about how the properties would be used, any more than XSD says anything about how the PSVI information should be used.

I hope the second thing that came to your mind was something like "Oh, how could there be Schematron-based XBRL processors when no such thing exists: this article introduces the idea!"

The point of the article is to ask whether Schematron could substitute for XSD in XBRL, not to ask whether Schematron could substitute for XBRL.

Just because I typically reject (I don't think I am completely doctrinaire about it) XML vocabularies written with the assumption that developers won't see the XML, it does not mean that I am against tools or think they are, on the face of it, a sign of a weakness in a vocabulary.

But if XML is about anything, it is about being directly developer-friendly: and I don't think it is particularly developer-friendly to make a vocabulary which needs to be hidden behind an API or nice tools in order to be useful. I think XSD and technologies built on XSD run that risk, which I think the relative simplicity of the Schematron approach given above rather proves.

But this item is a sketch and is intended as a positive contribution: a critique not a criticism IYSWIM. I think all the tool-makers know that if XBRL fails (and there is always CALS as an existence proof that a large mandated push for the use of markup standards can fail due to complexity downplayed by vendors) it will be solidly because of this complexity.

Rick,

It is interesting to think through the significant consequences of tying XBRL so tightly to XML Schema. At the time that XBRL was being designed strong feedback from the XML community was that XML Schema usage should be strengthened rather than relaxed. You can see this in the step from XBRL 1.0 to XBRL 2.0 and 2.1.

In retrospect, that tight relationship has come at a price and experimentation with means of swapping out XML Schema validation with other validation approaches is quite enlightening.

It is also interesting to note that XML Schema validation is nowhere near sufficient for the users of XBRL, hence the efforts to develop a "formula" extension to XBRL that allows "assertions" to be made about the content of XBRL reports. The design of this formula extension has tried to draw on the philosophy of Schematron (while adapting it to the complex XLink style structures that are so much of the XBRL syntax).

In particular, considerable effort has gone into enabling formulae and assertions to be expressed in ways that can be documented for humans and which can be translated into a set of XPath 2.0 expressions for computers to evaluate them.

I think that the "Schematron-style" approach has paid off well but the formula and related specifications are still at Candidate Recommendation status and would likely benefit considerably from your feedback, perhaps even regarding clarifying the nature of the relationship between XBRL formulae and assertions and Schematron assertions.

In particular, the Formula Working Group is in the process of deciding to what extent the reporting of validation processing operations should be standardised from a Syntax perspective. In that space also, the work of the Formula Working Group is currently being guided by the approach taken with Schematron processing reports. Views on that would also be very welcome, either here or given directly as feedback on the current formula specification candidate recommendation.

Regards

Geoff Shuetrim
Editor of the XBRL formula and related specifications.
