Tobias Trapp

Streaming Techniques for XML Processing – Part 4

A Mapping Strategy for STX: Divide and Conquer

In an earlier weblog I mentioned my belief that mapping techniques based on STX could greatly benefit data exchange. In this weblog I present a pragmatic strategy for designing mappings that combine the strengths of XSLT and STX.

Most mapping tools generate XSLT programs that transform serialized business objects. Since mass data often consists of a large number of small business objects, we can use STX to extract the individual business objects and apply an XSLT mapping to each one. With this approach no DOM tree is built for the huge XML document, and we can still apply XSLT transformations generated by an ordinary graphical mapping tool.

Extraction and Buffering of XML Elements

The STX program matches bp elements and copies each one into a buffer named input by processing it with a group that has no templates but sets pass-through="all". It then passes this buffer to an XSLT transformation and writes the result to the output.
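A minimal sketch of such a program could look as follows. It assumes an STX processor such as Joost; the group name copy, the stylesheet file name bp2csv.xsl, the output-method setting and the way the counter is incremented are illustrative assumptions, not a definitive listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: names marked "assumed" are illustrative. -->
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0"
               pass-through="none"
               output-method="text"> <!-- assumed: the XSLT step emits CSV text -->

  <!-- Buffer that holds exactly one bp element at a time -->
  <stx:buffer name="input"/>

  <!-- Running number of the current bp element -->
  <stx:variable name="counter" select="0"/>

  <stx:template match="bp">
    <!-- Copy the current bp element and its children into the buffer -->
    <stx:result-buffer name="input" clear="yes">
      <stx:process-self group="copy"/>
    </stx:result-buffer>

    <stx:assign name="counter" select="$counter + 1"/>

    <!-- Hand the buffered element to XSLT; bp2csv.xsl is an assumed file name -->
    <stx:process-buffer name="input"
                        filter-method="http://www.w3.org/1999/XSL/Transform"
                        filter-src="url('bp2csv.xsl')">
      <stx:with-param name="counter" select="$counter"/>
    </stx:process-buffer>
  </stx:template>

  <!-- No templates of its own: pass-through="all" copies every event -->
  <stx:group name="copy" pass-through="all"/>

</stx:transform>
```

Because the buffer is cleared for every bp element, only one business object is held in memory at any time, regardless of the size of the input document.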

XSLT Integration

We pass the content of a buffer to XSLT using the attributes filter-method and filter-src. With stx:with-param we can pass parameters to the transformation. In this case we define a counter that is used by the XSLT transformation.

The XSLT program that is applied to each buffered bp element generates a CSV dataset.
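As a rough sketch, a stylesheet of this kind might look like the one below. The child elements name and city are hypothetical, since the real structure of bp depends on the input dataset, and the semicolon separator is likewise an assumption. The counter parameter is the value supplied by the calling STX program.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "name" and "city" are hypothetical children of bp. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Set by the calling STX program; defaults to 0 when run stand-alone -->
  <xsl:param name="counter" select="0"/>

  <!-- One CSV line per bp element: counter;name;city -->
  <xsl:template match="bp">
    <xsl:value-of select="$counter"/>
    <xsl:text>;</xsl:text>
    <xsl:value-of select="normalize-space(name)"/>
    <xsl:text>;</xsl:text>
    <xsl:value-of select="normalize-space(city)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet only ever sees a single bp element as its input document, so it can be developed and tested on one small business object and then reused unchanged on mass data.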

You can run the STX program together with the XSLT program, using the second example dataset from my first weblog as input.

Please note that STXPath cannot operate on buffers. In fact, this is the reason why STX processors are able to use a lightweight in-memory representation of XML fragments. On the other hand, the integrated XSLT program needs to build the DOM tree for every bp element and its children. So our transformation still consumes a fair amount of memory, but we never have to build the DOM tree for the whole document at once.


In this weblog I introduced an example of XSLT integration for XML mapping. Since, as far as I know, there are no graphical tools that generate STX code, we reuse XSLT mappings and apply them to parts of the input document.

I think designing tools for schema mapping that produce pure STX is a very challenging task, and one I plan to work on in the future. Although STX is Turing complete and can use buffers, designing a general mapping tool might be difficult. On the other hand, I think that most mappings are quite simple, so there could be a chance of finding a solution that works for most cases, or at least for the most common ones.


      Former Member
      Tobias Trapp
      Blog Post Author
      Thank you for this hint, but I think STX is the better solution for the transformations I have in mind. Let me explain: you use xsl:copy-of to copy a (hopefully small) XML subtree into a variable and apply a transformation to it; the result is written to the output stream. This is in fact the same divide-and-conquer approach we use in STX, and we can even use an external XSLT processor to perform this task. But STX can do more:
      *) We can use variables due to the procedural nature of STX (think of counters, for example), and
      *) as far as I can see, STXPath is more powerful than the restricted "Schema-like" XPath you can use with saxon:read-once: There must be no predicates; the first step (but only the first) can be introduced with "//"; the last step can optionally use the attribute axis; all other steps must be simple axis steps using the child axis.

      Can you tell me your best practices for this Saxon feature?

      Nevertheless, the sketch of the implementation details seems quite clever; let me quote from the link:
      "These (XPath) restrictions allow Saxon to use the same code for serial XPath processing that is already used for validating identity constraints against a schema (which is one reason the facility is available only in Saxon-SA)" and "The implementation of this facility typically uses multithreading. One thread (which operates as a push pipeline) is used to read the source document and filter out the nodes selected by the path expression. The nodes are then handed over to the main processing thread, which iterates over the selected nodes using an XPath pull pipeline."