Eight years ago, after the XSLT 1.0 specification had been around for a couple of years, those of us who had been working fairly heavily with the early specification came to the collective realization that something was missing. XSLT 1.0 had a number of holes, including the inability to create intermediate trees, very limited math functions, no way to parse strings (or create strings from sequences of nodes) and so forth. Additionally, while the rather ambiguously worded statements about extension functions in the XSLT specification indicated that extension functions were possible, they also made it very clear that this was a vendor-specific issue.
What this meant in practice was that as vendors and project developers put out differing XSLT implementations, the support that they offered for extensions varied wildly, from none at all to pre-defined function sets to an open-ended API for extending the language, depending upon the vendor. What's worse, even in cases where two vendors did establish extension libraries, their implementation and function signatures varied dramatically from one another, making portability of XSLT scripts a real problem.
In the spirit of the original standard, one solution that made a fair amount of sense was to establish a library of routines outside of immediate vendor control, which Jeni Tennison, Jim Fuller, Dave Pawson, Uche Ogbuji and other well known XSLT hackers did by establishing the EXSLT.org project in 2001, the name
The goal for EXSLT was simple: create namespaces for library modules in several different areas, including core XSLT (for functions like node-set(), which lets users convert result tree fragments into node-sets), math functions, regular expression methods, string functions, dates and times, set manipulation, and related areas. Where possible, XSLT-based solutions were also suggested for those cases where an extensibility framework couldn't be fully implemented, though these were generally last-resort routines.
The real power of EXSLT came from establishing a standard, platform-independent set of functions that could be implemented by XSLT processor vendors or by third parties working with those tools, and while it took a while, nearly all XSLT implementations out today support at least a subset of EXSLT.
In a recent article for IBM DeveloperWorks, Jim Fuller (part of the original EXSLT team) raised the question of whether XQuery is in fact reaching the same stage of needing a consistent vendor neutral extension library:
Most XQuery implementations have added their own third-party functions, providing all manner of additional capabilities. The obvious issue with using such extension functions is that you make your XQuery code potentially incompatible if you depend on a specific, non-standard functionality exposed by a specific implementation.
I fought this fight against vendor lock-in before in assisting with the EXSLT effort. I am not surprised that it comes up again in XQuery.
I'd come to much the same conclusion as I reviewed a number of XML databases recently ... while most have made the jump to the final release of XQuery 1.0, there was very little consistency among the functions that fell outside of the core set, even when there was obviously similar thinking with regard to what was needed.
As one obvious instance, a random() function is quite useful in XQuery for generating simulated content or for statistical sampling, both of which occur quite frequently in applications I've worked with, yet a random number generator was deemed outside the scope of XPath 2.0. Unfortunately, different systems implement a randomizer function in different namespaces, often with different function signatures, as Table 1 illustrates:
|System|Random function signature(s)|
|---|---|
|MarkLogic|xdmp:random([$max as xs:unsignedLong]) as xs:unsignedLong|
|eXist|math:random(xs:double) as xs:double; util:random(xs:integer) as xs:integer; util:random() as xs:double|
Other databases, such as IBM DB2 Pure XML, don't define a random() function at all, but rather provide a set of Java or C# hooks to implement one from the appropriate foundation classes.
In many cases, the EXSLT functions (which are in reality XPath functions) can be used quite effectively within an XQuery context as they stand. However, when XPath 2.0 (which forms the XPath basis for XQuery) was developed, the EXSLT model was carried over where applicable, so there's a lot of overlap between EXSLT and XPath 2.0.
The biggest holes are in more systemic areas: parsing and serialization control, document validation, higher-order evaluation and invocation, more sophisticated ordering capabilities, tag-crossing text search capabilities, math functions, web services invocation, server environment functions when in that context, document enrichment, even geospatial functions.
These are all common operations on XML databases in particular, which I suspect will represent the bulk of all XML search and manipulation in the near future. Other areas are more problematic, but just as useful - IMAP or POP3 mail production, SQL integration languages, XSLT integration as well as common user and group validation and related database functionality.
For the most part, new EXQuery functions would simply be wrappers around existing XQuery extension functionality, providing a consistent interface between databases. Such a library would also set a bar that determines the minimal expectation of such databases and data systems, and would give new entrants into the field a way to run existing XQuery scripts without having to refactor code.
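To make the wrapper idea concrete, here is a minimal sketch of the pattern using the random number case. It is written in Python purely as a neutral illustration, since the real wrappers would be XQuery library modules running inside each engine. Everything in it is an assumption for the sake of the example: the name exq_random is hypothetical, the two backend functions only simulate the vendor calls, and the range behavior assigned to each (inclusive versus exclusive upper bound) is assumed rather than taken from vendor documentation.

```python
import random
from typing import Callable, Dict


# Simulated stand-ins for vendor extension functions. In a real system these
# would be calls made inside each XQuery engine, not Python functions.
def simulated_xdmp_random(max_value: int) -> int:
    """Stand-in for xdmp:random($max); ASSUMED to return 0..max inclusive."""
    return random.randint(0, max_value)


def simulated_util_random(max_value: int) -> int:
    """Stand-in for util:random($max); ASSUMED to return 0 <= n < max."""
    return random.randrange(max_value)


# Each adapter maps the portable contract onto one vendor's behavior.
# The contract chosen for this sketch: return an integer n, 0 <= n < max_value.
BACKENDS: Dict[str, Callable[[int], int]] = {
    # Inclusive upper bound, so shift it down by one to honor the contract.
    "marklogic": lambda max_value: simulated_xdmp_random(max_value - 1),
    # Already matches the contract, so it passes straight through.
    "exist": simulated_util_random,
}


def exq_random(vendor: str, max_value: int) -> int:
    """Hypothetical portable function: same signature and range everywhere."""
    if max_value <= 0:
        raise ValueError("max_value must be a positive integer")
    return BACKENDS[vendor](max_value)


if __name__ == "__main__":
    for vendor in BACKENDS:
        print(vendor, [exq_random(vendor, 10) for _ in range(5)])
```

The point of the sketch is that calling code only ever sees one name, one signature and one range. Any differences between engines are absorbed in a thin adapter that each vendor or a third party could supply.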
In an era of data abstraction, web services and distributed data, EXQuery just makes sense. Please let me know what you think about getting involved in an effort like this.