Schematron and time: complex event processing?

By Rick Jelliffe
November 13, 2009

I have been thinking a little bit about whether Schematron's pattern approach could be applied to complex event processing where the input is a stream of discrete XML documents, for example each one being a reading from a set of sensors. (The background was that I was looking to see what became of Lucid: is Oz really the only modern viable dataflow language? What about ML and its spawn? I bet declarative-looking dataflow could be done in Scala using some boffinry. Lucid is the reason why Schematron's let is called let by the way.)

One approach would be to use the new multi-document feature.

Let's imagine that each new incoming document triggers a new invocation of Schematron with three invocation parameters: the URL of the new document, the URL of the previous document, and the URL of a delta document. The delta document contains only the elements or attributes whose values (or occurrences, or any descendants) have changed. So we are not interested in information that has disappeared, or in tracking movements: only alterations and new things go in this delta.
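
How the delta document gets built is left open here. As a minimal sketch only, assuming that children are paired by tag name and position (a real differ would more likely match on keys) and that a changed element keeps all of its attributes, the shell could prune the new document like this. The function name `delta` and the sample element names are illustrative, not part of Schematron.

```python
import xml.etree.ElementTree as ET


def delta(old, new):
    """Return a pruned copy of `new` holding only changed or added content.

    Returns None when nothing under `new` differs from `old`.
    Assumption: children are paired by tag and by position among
    same-tag siblings. Content that has disappeared is ignored.
    """
    out = ET.Element(new.tag, dict(new.attrib))
    out.text = (new.text or "").strip() or None
    changed = (
        old is None
        or old.attrib != new.attrib
        or (old.text or "").strip() != (new.text or "").strip()
    )
    seen = {}  # how many children of each tag we have visited so far
    for child in new:
        index = seen.get(child.tag, 0)
        seen[child.tag] = index + 1
        peers = [] if old is None else old.findall(child.tag)
        partner = peers[index] if index < len(peers) else None
        kept = delta(partner, child)
        if kept is not None:
            out.append(kept)
            changed = True
    return out if changed else None


old_doc = ET.fromstring(
    '<site><building id="a"><status>ok</status></building>'
    '<building id="b"><status>ok</status></building></site>'
)
new_doc = ET.fromstring(
    '<site><building id="a"><status>ok</status></building>'
    '<building id="b"><status>fire</status></building></site>'
)
delta_xml = ET.tostring(delta(old_doc, new_doc), encoding="unicode")
```

With these two sample readings, only building `b` and its changed `status` survive into the delta, which is what a "has any building caught fire since the previous data?" pattern would want to see.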

With this, we can then have three patterns in the Schematron schema:

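<sch:pattern id="n">
  <sch:title>The New Document</sch:title>
  <!-- rules go here -->
</sch:pattern>

<sch:pattern id="n-1">
  <sch:title>The Previous Document</sch:title>
  <!-- rules go here -->
</sch:pattern>

<sch:pattern id="delta">
  <sch:title>The Delta Document</sch:title>
  <!-- rules go here -->
</sch:pattern>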

With these patterns we have a very clear notion of events that are static to the old or new data (e.g. is there currently a building on fire?) and of events that are based on changes between the data (e.g. has any building caught fire since the previous data?).

So that is OK for what Wikipedia calls detection-oriented complex event processing, but what about computation-oriented complex event processing?

To do that we need a way of passing data calculated in one invocation of Schematron to the next. This is where the new properties feature could be used: it lets you add arbitrary data with calculated values to the SVRL (Schematron Validation Report Language) output. We define a property in the schema, it is passed to the output SVRL document, and this SVRL is then reloaded into the next invocation.

For example, let's say we have a flow of pizza order documents and we want to track the accumulated outstanding (unfulfilled) orders:

<sch:pattern id="n">
  <sch:title>The New Pizzas</sch:title>

  <sch:rule context="/">
    <sch:assert test="count(//order) > 0"
        properties="accumulated-orders">
      There should always be at least one order.
    </sch:assert>
  </sch:rule>
</sch:pattern>

Next we will make a top-level variable to hold the results of the previous Schematron report (probably we would get the URL from an invocation parameter rather than hard-coding it):

<sch:let name="old-svrl" select=
"document( '' )" />

And we have a property definition:

<sch:property id="accumulated-orders">
  <sch:value-of select=
    " $old-svrl//outstanding-orders +
    count(//order) - count(//filled)" />
</sch:property>
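
To make the arithmetic concrete, here is a rough Python equivalent of that property calculation. It is a sketch only: it assumes the previous report carries the running total in an element named `outstanding-orders` (following the XPath above), and it seeds the total with 0 when there is no earlier report. That seed is an assumption worth noting, because in XPath 1.0 an empty node-set converts to NaN in arithmetic, so the very first invocation would need some starting value.

```python
import xml.etree.ElementTree as ET


def accumulated_orders(old_svrl, new_doc):
    """Previous total + count(//order) - count(//filled)."""
    previous = 0  # assumed seed when there is no earlier report
    if old_svrl is not None:
        node = next(old_svrl.iter("outstanding-orders"), None)
        if node is not None and (node.text or "").strip():
            previous = int(node.text)
    # iter() walks the whole tree, like the // axis in the XPath
    orders = sum(1 for _ in new_doc.iter("order"))
    filled = sum(1 for _ in new_doc.iter("filled"))
    return previous + orders - filled


first_doc = ET.fromstring(
    "<orders><order/><order/><order><filled/></order></orders>"
)
first_total = accumulated_orders(None, first_doc)

# The next invocation reloads the previous report (simplified here).
old_report = ET.fromstring(
    "<report><outstanding-orders>%d</outstanding-orders></report>" % first_total
)
second_doc = ET.fromstring("<orders><order/><filled/><filled/></orders>")
second_total = accumulated_orders(old_report, second_doc)
```

Three orders with one filled leave two outstanding after the first document; one new order and two fills bring the total down to one after the second.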

And we could even have a test to warn us if there are too many outstanding orders:

<sch:rule context="/">
  <sch:assert test=" $old-svrl//outstanding-orders +
      count(//order) - count(//filled) &lt; 10">
    When the number of outstanding orders is
    10 or more, there will be a delay. So give
    the customer a free cheeze stick.
  </sch:assert>
</sch:rule>

That seems pretty straightforward. In both cases, however, there needs to be some superior shell to invoke Schematron when the new data becomes available: my prejudice is that adding polling constructs (for example) into Schematron would shuffle it from being a well-pitched pattern reporting language into being an under-powered dataflow language.
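
The shell itself can be very small. The sketch below shows one possible shape, with everything in it hypothetical: `validate` stands in for whatever actually runs Schematron and returns its report, and `fake_validate` is a toy replacement so the loop can be run without a Schematron processor.

```python
def run_stream(documents, validate):
    """Yield one report per document, threading state between invocations.

    `validate(new_doc, previous_doc, previous_report)` is a hypothetical
    stand-in for a real Schematron invocation that returns its SVRL.
    """
    previous_doc = None
    previous_report = None
    for new_doc in documents:
        report = validate(new_doc, previous_doc, previous_report)
        yield report
        previous_doc, previous_report = new_doc, report


def fake_validate(new_doc, previous_doc, previous_report):
    """Toy validator: the 'report' is just a running total of orders."""
    carried = 0 if previous_report is None else previous_report["total"]
    return {"total": carried + new_doc["orders"] - new_doc["filled"]}


readings = [{"orders": 3, "filled": 1}, {"orders": 1, "filled": 2}]
totals = [r["total"] for r in run_stream(readings, fake_validate)]
```

Whether the documents arrive by polling, a queue, or a file watcher is then a detail of whatever produces `documents`, and the schema stays purely declarative.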

It is the same reason why I have resisted the fairly obvious improvement of being able to chain Schematron phases (perhaps using some kind of state approach) so that success or failure in one phase causes the execution of another phase. Schematron phases allow multiple patterns to be grouped so that patterns outside the current phase are not matched. However, in the case of chaining phases, it would allow a zero-ing in on problem areas and reduce useless tests I suppose.

But perhaps another wrinkle would be to allow a calculated selection of phase: so that as well as being hard-coded into the schema and selectable by an invocation parameter, the phase could be selected by some top-level value-of statement. In the kind of looping environment I mention above, it would mean that the phase of one Schematron validation could depend on information in the SVRL output of the previous validation. This would probably be the better way to arrange things, since it would free the invoker from having to interpret any SVRL in order to set the next phase. I don't know what syntax would be appropriate, human readability being king.
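
For comparison, this is roughly the interpretation work the invoker has to do today if it wants the next phase to depend on the last report. It is only a sketch: the phase names `backlog` and `normal`, the threshold, and the `outstanding-orders` element are all made up for illustration; `#ALL` is Schematron's reserved name for running every pattern.

```python
import xml.etree.ElementTree as ET


def choose_phase(previous_svrl, threshold=10):
    """Pick the phase for the next run from the previous report.

    The phase names "backlog" and "normal" and the element name
    `outstanding-orders` are hypothetical.
    """
    if previous_svrl is None:
        return "#ALL"  # no history yet: run every pattern
    node = next(previous_svrl.iter("outstanding-orders"), None)
    if node is None or not (node.text or "").strip():
        return "#ALL"  # report carries no total: fall back to everything
    return "backlog" if int(node.text) >= threshold else "normal"


busy = ET.fromstring("<report><outstanding-orders>12</outstanding-orders></report>")
quiet = ET.fromstring("<report><outstanding-orders>3</outstanding-orders></report>")
phases = [choose_phase(None), choose_phase(busy), choose_phase(quiet)]
```

A calculated phase inside the schema would move this decision, and the knowledge of what the report looks like, out of the shell and back next to the patterns it governs.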
