When a data supplier and a data consumer send each other forms-type information (e.g. over the WWW), the supplier (let's call them the applicant) and the consumer (let's call them the institution) have conflicting tactics for cost reduction.
The applicant's tactic is to provide as little information as possible, then let the institution tell them if more is needed. The institution's tactic is to aim for Straight Through Processing, where the form always has enough information to proceed. The applicant seeks to reduce work, while the institution seeks to reduce re-work (sending the information back to the applicant, who has to spend more time on it, and then reprocessing the hopefully-fixed information).
Where there are alternative information sections that the applicant may fill out, each may have its own associated cost of work. So the tactic of the applicant is to choose the alternative that involves the least work for them, and the tactic of the institution is to encourage the alternative that involves the least work at their end.
The problems are many: how do applicants and institutions communicate to each other the information that will result in the minimum work and rework? How do they create incentives or penalties for each? Can the applicant be given some discount or penalty depending on whether the information they provide reduces work or rework at the institution? Can the institution and the applicant negotiate some sweet spot, so that the institution is not merely making the applicant pay for the institution's inefficiencies?
Regardless of how it is organized, it might be useful to have standard formats which can be used to capture and express these costs, and to allocate the costs to particular XML documents. Here is how you might use Schematron, an ISO standard schema language, for this purpose.
Menu
The simplest version of this is where every item provided needs to be checked. The cost is like a restaurant menu. Here is a bill:
<bill>
  <dish name="persian-walnut-stew" serves="1" />
  <dish name="ling-a-la-makassa" serves="1" />
  <dish name="goat-in-a-frock" serves="1" />
  <dish name="buddha-jumps-back" serves="2" />
  <dish name="all-things-nice" serves="1" />
</bill>
And here is a rough Schematron schema to count its cost:
<sch:pattern ... >
  <sch:let name="prices">
    <price name="persian-walnut-stew">10.86</price>
    <price name="ling-a-la-makassa">30.86</price>
    <price name="goat-in-a-frock">10.86</price>
    <price name="buddha-jumps-back">44.44</price>
    <price name="rumpy-pumpy">2.23</price>
    <price name="all-things-nice">8.00</price>
  </sch:let>
  <sch:rule context="bill">
    <sch:report test="true()">
      The cost of this is
      <sch:value-of select="sum( for $dish in dish return $prices/price[@name = $dish/@name] * $dish/@serves)" />
    </sch:report>
  </sch:rule>
</sch:pattern>
Note that this requires XSLT 2 (the for expression is XPath 2.0) and the latest version of Schematron, which allows values in let variables.
This is obviously a simple example, but the principle is the same for other marking systems.
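As a sanity check on the arithmetic, the same lookup-and-multiply can be sketched outside Schematron. The Python below is only an illustrative cross-check, not part of the schema: the names `BILL_XML`, `PRICES` and `bill_cost` are made up for this sketch, and the prices are copied from the price list in the schema.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# The example bill from the text.
BILL_XML = """
<bill>
  <dish name="persian-walnut-stew" serves="1" />
  <dish name="ling-a-la-makassa" serves="1" />
  <dish name="goat-in-a-frock" serves="1" />
  <dish name="buddha-jumps-back" serves="2" />
  <dish name="all-things-nice" serves="1" />
</bill>
"""

# Same values as the "prices" variable in the schema (Decimal avoids float rounding).
PRICES = {
    "persian-walnut-stew": Decimal("10.86"),
    "ling-a-la-makassa": Decimal("30.86"),
    "goat-in-a-frock": Decimal("10.86"),
    "buddha-jumps-back": Decimal("44.44"),
    "rumpy-pumpy": Decimal("2.23"),
    "all-things-nice": Decimal("8.00"),
}


def bill_cost(bill_xml: str) -> Decimal:
    """Sum price * serves over every dish element in the bill."""
    bill = ET.fromstring(bill_xml)
    total = Decimal("0")
    for dish in bill.findall("dish"):
        # Assumption of this sketch: a dish missing from PRICES raises KeyError.
        total += PRICES[dish.get("name")] * int(dish.get("serves"))
    return total


print(bill_cost(BILL_XML))  # 149.46
```

The double serving of buddha-jumps-back contributes 88.88, which is why the total is 149.46 rather than the plain sum of the five listed prices.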
The example above is far too simple for many cases, particularly the one we are interested in today, where we want to estimate between alternatives. Let's take an example where applications have different costs depending on what the provided values are. Here is some data:
<application>
  <person nationality="AU">
    <name>Ferdinand Peppercorn </name>
    <age>24 </age>
  </person>
  <person>
    <name>Imelda Blavatsky </name>
    <age>39 </age>
  </person>
</application>
And here is our marking schema:
<sch:pattern ... >
  <sch:let name="citizen-found" select="//person[@nationality='AU']" />
  <sch:let name="mature-found" select="//person[age > 25]" />
  <sch:rule context="application">
    <sch:report test="true()">
      If at least one applicant is listed as an Australian Citizen, the cost will be 20.
      Otherwise, if at least one applicant is listed as over 25 years old, the cost will be 80.
      Otherwise, the cost will be 100.
      The cost of processing this application has been found to be:
      <sch:value-of select="min(( if ( $citizen-found ) then 20 else 100, if ( $mature-found ) then 80 else 100 ))" />
    </sch:report>
  </sch:rule>
</sch:pattern>
The costs could be looked up in a table for better management, but the idea should be visible. (Let me know if there is a syntax error please.)
First we detect the presence of various patterns. Schematron variables, declared with let expressions, provide a simple way to do this. (If our constraints were very complex, we could instead use sch:pattern to detect the patterns and output the results to SVRL files, then use a second Schematron schema to interpret the data in the SVRL files. The let approach is more straightforward.)
Then we assign a cost based on what we found.
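The two steps can be mirrored in a few lines of Python to confirm what the report should print for the sample data. This is a rough sketch rather than a replacement for the schema: `APPLICATION_XML`, `COSTS` and `application_cost` are hypothetical names, and the cost table is just one possible way of holding the three figures quoted in the report text.

```python
import xml.etree.ElementTree as ET

# The example application from the text.
APPLICATION_XML = """
<application>
  <person nationality="AU">
    <name>Ferdinand Peppercorn </name>
    <age>24 </age>
  </person>
  <person>
    <name>Imelda Blavatsky </name>
    <age>39 </age>
  </person>
</application>
"""

# Hypothetical cost table; the keys are illustrative labels only.
COSTS = {"citizen": 20, "mature": 80, "default": 100}


def application_cost(application_xml: str) -> int:
    root = ET.fromstring(application_xml)
    people = list(root.iter("person"))  # any depth, like //person

    # Step 1: detect the patterns of interest.
    citizen_found = any(p.get("nationality") == "AU" for p in people)
    mature_found = any(
        int(p.findtext("age", default="0").strip()) > 25 for p in people
    )

    # Step 2: assign the cheapest cost that applies.
    candidates = [COSTS["default"]]
    if citizen_found:
        candidates.append(COSTS["citizen"])
    if mature_found:
        candidates.append(COSTS["mature"])
    return min(candidates)


print(application_cost(APPLICATION_XML))  # 20
```

For the sample data both patterns are found (Ferdinand is listed as AU, Imelda is over 25), so the minimum of 20, 80 and 100 applies and the cost is 20.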
We are used to thinking in terms of schemas for validity or invalidity: it is possible to use the same Schematron technology to extract more fuzzy or interesting qualities of the document and report them. (If you want to consider this a report rather than a schema, that is fine by me.)