Grouping XML with XSLT – From Muenchian Method To XSLT 2.0
Anyone who has done some XML processing, with or without XI, knows that the main standard operations needed to transform (map) XML messages are selection, aggregation and grouping.
Before turning to practical cases and examples, I would like to clarify some differences between these operations by comparing the XML world with the more ‘traditional’ world of relational databases. The comparison is imperfect because of the vast difference between the hierarchical tree structure of XML and the more or less ‘flat earth’ model of RDBMS tables (ignoring the many efforts to ‘pump up’ SQL to handle tree structures).
In the database world, selection is done with the well-known SQL SELECT statement. In XSLT, we have XSLT functions and XPath expressions to access data selectively.
Aggregated data access is done in SQL with aggregate functions such as MIN, MAX, AVG and SUM. In XML this can also be done with a mixture of XSLT functions and XPath expressions.
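As a rough illustration of selection and aggregation in XSLT 1.0, here is a minimal sketch. It assumes an input document containing E1EDP01 position elements with a numeric MENGE (order quantity) child; the SUMMARY wrapper and its child element names are hypothetical and chosen only for this example. XPath 1.0 provides sum() and count() directly, while an average has to be computed as sum divided by count.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- SUMMARY and its children are hypothetical output names -->
    <SUMMARY>
      <!-- Selection: roughly SELECT * ... WHERE MENGE > 10 -->
      <LARGE_POSITIONS>
        <xsl:copy-of select="//E1EDP01[MENGE &gt; 10]"/>
      </LARGE_POSITIONS>
      <!-- Aggregation: roughly SELECT COUNT(*), SUM(MENGE), AVG(MENGE) -->
      <POSITION_COUNT>
        <xsl:value-of select="count(//E1EDP01)"/>
      </POSITION_COUNT>
      <TOTAL_QUANTITY>
        <xsl:value-of select="sum(//E1EDP01/MENGE)"/>
      </TOTAL_QUANTITY>
      <AVERAGE_QUANTITY>
        <!-- XPath 1.0 has no avg(); this yields NaN if there are no positions -->
        <xsl:value-of select="sum(//E1EDP01/MENGE) div count(//E1EDP01)"/>
      </AVERAGE_QUANTITY>
    </SUMMARY>
  </xsl:template>
</xsl:stylesheet>
```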
Grouping data can be defined as a mapping from a set of tuples to a set of sets of tuples (oops: hierarchy!).
Grouping in relational databases is therefore not separable from aggregation (the inverse of the definition above), because the ‘target data type’ of grouping is always ‘tree’ and not ‘table’. This is why the GROUP BY and HAVING clauses in SQL are only allowed together with some aggregate operator, to guarantee a table result (a set of tuples). Grouping is always a two-step operation, which first splits a collection of data into not necessarily disjoint subset-trees and then orders these subset-trees in some manner.
A Simple Example
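To have something concrete to work with, assume a small order like the following. This is a hypothetical sample: only the E1EDP01 and MENGE names are used in the discussion below, while the ORDER root element, the POSEX field and all values are assumptions made for illustration.

```xml
<ORDER>
  <E1EDP01>
    <POSEX>000010</POSEX>
    <MENGE>10</MENGE>
  </E1EDP01>
  <E1EDP01>
    <POSEX>000020</POSEX>
    <MENGE>20</MENGE>
  </E1EDP01>
  <E1EDP01>
    <POSEX>000030</POSEX>
    <MENGE>10</MENGE>
  </E1EDP01>
</ORDER>
```

Grouping these positions by quantity should then give a document of roughly this shape, with one POSITIONGROUP per distinct quantity (whitespace and indentation depend on the processor):

```xml
<ORDER>
  <POSITIONGROUP quantity="10">
    <POSITION>
      <POSEX>000010</POSEX>
      <MENGE>10</MENGE>
    </POSITION>
    <POSITION>
      <POSEX>000030</POSEX>
      <MENGE>10</MENGE>
    </POSITION>
  </POSITIONGROUP>
  <POSITIONGROUP quantity="20">
    <POSITION>
      <POSEX>000020</POSEX>
      <MENGE>20</MENGE>
    </POSITION>
  </POSITIONGROUP>
</ORDER>
```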
The Muenchian Method
This grouping method goes back to, and is named after, Steve Muench, Oracle’s primary representative to the W3C XSL Working Group. It is an efficient way to group by using <xsl:key>, which avoids other XSLT constructs such as scanning all preceding siblings, an approach that is very time-consuming, at least for large data. Let’s have a look at the transformation, which groups the positions of the example order according to their quantities:
First, a key for the MENGE field (order quantity) of the E1EDP01 element is defined. Using this key, every position can be accessed easily by its order quantity. Second, the first <xsl:for-each> statement loops through the unique list of all distinct quantities occurring in the sample data, creating a <POSITIONGROUP> element with the quantity attribute for each one. Finally, in a second loop, all position data matching the quantity of the group is collected and wrapped in a <POSITION> element. This is done by the simple key access in the second <xsl:for-each> statement. The result of this transform looks like this:
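A minimal sketch of such a stylesheet could look as follows. It assumes that the positions are E1EDP01 elements with a MENGE child somewhere in the input; the key name positions-by-quantity and the ORDER wrapper element are hypothetical names chosen for illustration. The distinct quantities are found with the usual generate-id() comparison, which keeps only the first position that the key returns for each quantity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every E1EDP01 position by its MENGE (order quantity) value.
       The key name is a hypothetical choice. -->
  <xsl:key name="positions-by-quantity" match="E1EDP01" use="MENGE"/>

  <xsl:template match="/">
    <!-- ORDER is an assumed wrapper element for the result -->
    <ORDER>
      <!-- First loop: one iteration per distinct quantity. A position
           qualifies only if it is the first node the key returns for
           its own MENGE value. -->
      <xsl:for-each select="//E1EDP01[generate-id() =
                            generate-id(key('positions-by-quantity', MENGE)[1])]">
        <POSITIONGROUP quantity="{MENGE}">
          <!-- Second loop: all positions sharing this quantity -->
          <xsl:for-each select="key('positions-by-quantity', MENGE)">
            <POSITION>
              <xsl:copy-of select="*"/>
            </POSITION>
          </xsl:for-each>
        </POSITIONGROUP>
      </xsl:for-each>
    </ORDER>
  </xsl:template>
</xsl:stylesheet>
```

Note that the key compares MENGE as a string, so in this sketch values such as 10 and 10.000 would end up in different groups.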
Grouping with XSLT 2.0
With XSLT 2.0, grouping is much easier than in 1.0. I will only touch on the changes needed in our example XSLT to get the same result with the new grouping features. Anyone who is interested may look at the W3C documentation. The XSLT 2.0 stylesheet looks like this:
Instead of defining a key for the grouping as in the Muenchian method, the grouping can be coded directly with the <xsl:for-each-group> instruction. Inside the loop, every group member can be accessed by a second <xsl:for-each> loop that selects current-group(). As already said, the result is the same XML as with the Muenchian method.
Grouping is an interesting issue and a frequently recurring need in XML transformations. With XSLT 2.0 it is directly supported, and this built-in support replaces the former state-of-the-art Muenchian method.
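A minimal sketch of the XSLT 2.0 variant, under the same assumptions as before (E1EDP01 positions with a MENGE child; the ORDER wrapper element is a hypothetical name), might be written like this. It needs an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- ORDER is an assumed wrapper element for the result -->
    <ORDER>
      <!-- One iteration per distinct MENGE value, in order of first appearance -->
      <xsl:for-each-group select="//E1EDP01" group-by="MENGE">
        <POSITIONGROUP quantity="{current-grouping-key()}">
          <!-- current-group() holds all positions of the current group -->
          <xsl:for-each select="current-group()">
            <POSITION>
              <xsl:copy-of select="*"/>
            </POSITION>
          </xsl:for-each>
        </POSITIONGROUP>
      </xsl:for-each-group>
    </ORDER>
  </xsl:template>
</xsl:stylesheet>
```

Compared with the XSLT 1.0 approach, neither a key declaration nor a generate-id() comparison is needed: the group-by attribute and current-grouping-key() take over both roles.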