Representing and Calculating the Cost of Processing for an Electronic Document

Another use for Schematron

By Rick Jelliffe
April 13, 2010 | Comments: 2

When you have a data supplier and a data consumer sending each other forms-type information (e.g. over the WWW), the supplier (let's call them the applicant) and the consumer (let's call them the institution) have conflicting tactics for cost reduction.

The applicant's tactic is to provide as little information as possible, then let the institution tell them if they need more information. The institution's tactic is to want Straight Through Processing, where the form always has enough information to proceed. The applicant seeks to reduce work, while the institution seeks to reduce rework (sending the information back to the applicant, who has to spend more time, and then reprocessing the hopefully-fixed information).

Where there are alternative information sections that the applicant may fill out, each may have its own associated cost for work. So the tactic of the applicant is to choose the alternative that involves the least work for them, and the tactic of the institution is to encourage the alternative that involves the least work at their end.

The problems are many: how do applicants and institutions communicate to each other the information that will result in the minimum work and rework? How do they create incentives or penalties for each? Can the applicant be given a discount for providing information that reduces work or rework at the institution, or a penalty for withholding it? Can the institution and the applicant negotiate some sweet spot so that the institution is not merely making the applicant pay for the institution's inefficiencies?

Regardless of how it is organized, it might be useful to have standard formats which can be used to capture and express these costs, and to allocate the costs to particular XML documents. Here is how you might use Schematron, an ISO standard schema language, for this purpose.


The simplest version of this is where every item provided needs to be checked. The cost is like a restaurant menu. Here is a bill:
       <bill>
           <dish name="persian-walnut-stew" serves="1" />
           <dish name="ling-a-la-makassa" serves="1" />
           <dish name="goat-in-a-frock" serves="1" />
           <dish name="buddha-jumps-back" serves="2" />
           <dish name="all-things-nice" serves="1" />
       </bill>

And here is a rough Schematron schema to count its cost:

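 <sch:pattern  ... >
    <!-- Price list held as the element content of a variable -->
    <sch:let name="prices">
        <price name="persian-walnut-stew">10.86</price>
        <price name="ling-a-la-makassa">30.86</price>
        <price name="goat-in-a-frock">10.86</price>
        <price name="buddha-jumps-back">44.44</price>
        <price name="rumpy-pumpy">2.23</price>
        <price name="all-things-nice">8.00</price>
    </sch:let>

    <sch:rule context="/bill">
        <sch:report test="true()">
            The cost of this is
            <!-- For each dish: look up its price by name, multiply by servings -->
            <sch:value-of select="
                sum( for $dish in dish return
                     $prices/price[@name = $dish/@name] * $dish/@serves )" />
        </sch:report>
    </sch:rule>
 </sch:pattern>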

Note that this requires XSLT 2 and the latest version of Schematron, which allows a let variable to take its value from element content.

This is obviously a simple example, but the principle is the same for other marking systems.
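
To make the XSLT 2 requirement concrete, here is a minimal sketch of the pattern wrapped in a complete schema. The `queryBinding="xslt2"` attribute and the namespace URI are standard ISO Schematron. The second rule, which flags any dish missing from the price list, is an illustrative addition of mine rather than part of the schema above; without some check like it, an unpriced dish would silently add nothing to the sum.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
    <sch:pattern>
        <!-- Price list held as the element content of a variable -->
        <sch:let name="prices">
            <price name="persian-walnut-stew">10.86</price>
            <price name="ling-a-la-makassa">30.86</price>
            <price name="goat-in-a-frock">10.86</price>
            <price name="buddha-jumps-back">44.44</price>
            <price name="rumpy-pumpy">2.23</price>
            <price name="all-things-nice">8.00</price>
        </sch:let>

        <!-- Report the total: price of each dish times its servings -->
        <sch:rule context="/bill">
            <sch:report test="true()">
                The cost of this is
                <sch:value-of select="
                    sum( for $dish in dish return
                         $prices/price[@name = $dish/@name] * $dish/@serves )"/>
            </sch:report>
        </sch:rule>

        <!-- Illustrative guard (an assumption, not in the original):
             every dish on the bill should appear in the price list -->
        <sch:rule context="dish">
            <sch:assert test="@name = $prices/price/@name">
                No price is listed for <sch:value-of select="@name"/>.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

For the bill above, the total works out to 10.86 + 30.86 + 10.86 + 2 × 44.44 + 8.00 = 149.46. The exact formatting of the reported number may vary, since untyped values are treated as xs:double in XPath 2 arithmetic.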

Work estimation

The simple example above is obviously way too simple in many cases, particularly in the one we are interested in today, where we want to estimate the cost of alternatives. Let's take an example where applications have different costs depending on what the provided values are. Here is some data:

       <application>
           <person nationality="AU">
               <name>Ferdinand Peppercorn</name>
               <age>24</age>
           </person>
           <person>
               <name>Imelda Blavatsky</name>
               <age>39</age>
           </person>
       </application>

And here is our marking schema:

 <sch:pattern  ... >
     <!-- Non-empty if any applicant is an Australian citizen -->
     <sch:let name="citizen-found"
        value="//person[@nationality='AU']" />
     <!-- Non-empty if any applicant is over 25 -->
     <sch:let name="mature-found"
        value="//person[age > 25]" />
   <sch:rule context="application">
        <sch:report test="true()">
           If at least one applicant is listed as an Australian Citizen, 
                  the cost will be 20.
           Otherwise, if at least one applicant is listed as over 25 years old, 
                  the cost will be 80.
           Otherwise, the cost will be 100.
           The cost of processing this application 
                   has been found to be:
              <!-- The lowest applicable cost wins -->
              <sch:value-of select="
                      min ((
                         if ( $citizen-found ) then 20 else 100,
                         if ( $mature-found ) then 80 else 100 ))" />
        </sch:report>
   </sch:rule>
 </sch:pattern>

The costs could be looked up in a table for better management, but the idea should be visible. (Let me know if there is a syntax error please.)
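
As a rough sketch of the table lookup idea, the costs can be moved into a let variable with element content and selected by name. The `costs` variable, the `cost` element and its `case` attribute are hypothetical names chosen for illustration, and the pattern is assumed to sit inside a schema whose query binding is XSLT 2.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <!-- Hypothetical cost table: names are illustrative only -->
    <sch:let name="costs">
        <cost case="citizen">20</cost>
        <cost case="mature">80</cost>
        <cost case="default">100</cost>
    </sch:let>

    <!-- Detect the conditions, as in the marking schema -->
    <sch:let name="citizen-found" value="//person[@nationality='AU']"/>
    <sch:let name="mature-found" value="//person[age > 25]"/>

    <sch:rule context="application">
        <sch:report test="true()">
            The cost of processing this application has been found to be:
            <!-- First matching case wins; fall back to the default cost -->
            <sch:value-of select="
                if ($citizen-found) then $costs/cost[@case = 'citizen']
                else if ($mature-found) then $costs/cost[@case = 'mature']
                else $costs/cost[@case = 'default']"/>
        </sch:report>
    </sch:rule>
</sch:pattern>
```

Using an explicit if/else chain rather than a minimum keeps the "otherwise" ordering intact even if the figures in the table are later changed so that the citizen case is no longer the cheapest.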

First we detect the presence of various patterns. Schematron variables using let expressions provide a simple way to do this. (We could also use sch:pattern to detect the patterns and output the results to SVRL files, then use a second Schematron schema to interpret the data in the SVRL files, if our constraints were very complex. This way is more straightforward.)

Then we assign a cost based on what we found.
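
The two-pass alternative mentioned above might look roughly like the following second-stage schema, which runs over the SVRL output of a first schema. This is only a sketch under assumptions: the first schema is taken to contain reports with the ids `citizen-found` and `mature-found` (hypothetical names), and the implementation is assumed to copy those ids onto the `svrl:successful-report` elements it emits.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
    <sch:ns prefix="svrl" uri="http://purl.oclc.org/dsdl/svrl"/>

    <sch:pattern>
        <sch:rule context="svrl:schematron-output">
            <!-- Which first-pass reports fired? (ids are assumed names) -->
            <sch:let name="citizen-found"
                     value="svrl:successful-report[@id = 'citizen-found']"/>
            <sch:let name="mature-found"
                     value="svrl:successful-report[@id = 'mature-found']"/>

            <sch:report test="true()">
                The cost of processing this application has been found to be:
                <sch:value-of select="
                    if ($citizen-found) then 20
                    else if ($mature-found) then 80
                    else 100"/>
            </sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

The detection logic then lives entirely in the first schema, and this one only has to map what was found onto a cost.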

We are used to thinking in terms of schemas for validity or invalidity: it is possible to use the same Schematron technology to extract more fuzzy or interesting qualities of the document and report them. (If you want to consider this a report rather than a schema, that is fine by me.)



You touched (indirectly) on a feature that I have found possible only in the XML world: detaching validation from the host system. Both syntactic and semantic validation can be performed without any application from the consumer (the institution, as you call it). The former can be done using XML Schema, the latter using Schematron.
This allows complex documents to be prepared off-line and submitted whenever you like.

There is also a third kind of validation (or evaluation, or calculation) which cannot be exposed to the applicant. An insurance company cannot show its business rules to customers.

Marcin: Yes indeed.

I imagine many institutions will go for making coarse-grain business rules checking available and clearer to clients, while trying to keep their precious details to themselves. Ultimately it is not a game they can win well: it is becoming difficult for consumers to avoid aggregating data and finding out the rules.

(I think that often business rules can be found if you are a little organized. For example, the other day I applied for a personal loan from my bank online: by varying the numbers in the forms I could figure out exactly what their requirements were.

Any large insurance broker with enough clientele could just look at their records and figure out at least the general shape of the business rules.)
