Streaming Techniques for XML Processing - Part 4

ttrapp · 07-16-2006

A Mapping Strategy for STX: Divide and Conquer

In one of my previous weblogs I argued that mapping techniques based on STX could greatly benefit data exchange. In this weblog I present a pragmatic strategy for designing mappings that combine the strengths of XSLT and STX.

Most mapping tools generate XSLT programs that transform serialized business objects. Since mass data often consists of a large collection of small business objects, we can use STX to extract the individual business objects and apply an XSLT mapping to each one. With this approach the DOM tree for the huge XML document is never built, and we can still apply XSLT transformations generated by an ordinary graphical mapping tool.

Extraction and Buffering of XML Elements

In the following program we match bp elements and copy each one into a buffer named input by processing it with a group that has no templates but is declared with pass-through="all". Then we pass this buffer to an XSLT transformation and write the result to out.xml:

<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">
  <stx:variable name="counter" select="1"/>
  <stx:template match="bp">
    <stx:buffer name="input"/>
    <stx:result-buffer name="input">
      <stx:process-self group="copy"/>
    </stx:result-buffer>
    <stx:result-document href="./out.xml"
                         output-method="xml" append="yes">
      <stx:process-buffer name="input"
                          filter-method="http://www.w3.org/1999/XSL/Transform"
                          filter-src="url('./bp.xsl')">
        <stx:with-param name="id" select="$counter"/>
      </stx:process-buffer>
    </stx:result-document>
    <stx:assign name="counter" select="$counter + 1"/>
  </stx:template>
  <stx:group name="copy" pass-through="all"/>
</stx:transform>

XSLT Integration

We pass the content of a buffer to XSLT using the attributes filter-method and filter-src of stx:process-buffer. With stx:with-param we can pass parameters to the transformation. In this case we pass a counter that the XSLT transformation uses as the id of each record.
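On the receiving side, the stylesheet only needs a top-level xsl:param whose name matches the one used in the STX program. The following is a minimal sketch of that pairing, not the full mapping; the default value of 0 is an assumption added here so that the stylesheet can also be run on its own against a single bp fragment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Receives the value passed under the name "id" from the STX program. -->
  <!-- The default of 0 is an assumption for standalone runs; it is used
       only when no value is passed in. -->
  <xsl:param name="id" select="0"/>
  <xsl:template match="bp">
    <!-- Writes the id as the first field of the record. -->
    <xsl:value-of select="$id"/>
  </xsl:template>
</xsl:transform>
```

With a default in place, the mapping can be developed and tested in any XSLT 1.0 processor before it is plugged into the STX program.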

The following XSLT program generates a CSV dataset:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="id"/>
  <xsl:variable name="semikolon" select="';'"/>
  <xsl:variable name="CRLF">
    <xsl:text>&#10;</xsl:text>
  </xsl:variable>
  <xsl:template match="bp">
    <!-- one line per business partner -->
    <xsl:value-of select="$id"/>
    <xsl:value-of select="$semikolon"/>
    <xsl:value-of select="personalien/administrative_gender_cd/@V"/>
    <xsl:value-of select="$semikolon"/>
    <!-- note: the given name (GIV) is written to two consecutive columns -->
    <xsl:value-of select="personalien/person/person_name/GIV/@V"/>
    <xsl:value-of select="$semikolon"/>
    <xsl:value-of select="personalien/person/person_name/GIV/@V"/>
    <xsl:value-of select="$semikolon"/>
    <xsl:value-of select="personalien/birth_dttm/@V"/>
    <xsl:value-of select="$semikolon"/>
    <xsl:value-of select="addr/ZIP/@V"/>
    <xsl:value-of select="$semikolon"/>
    <xsl:value-of select="addr/CTY/@V"/>
    <xsl:value-of select="$semikolon"/>
    <xsl:value-of select="addr/STR/@V"/>
    <xsl:value-of select="$CRLF"/>
    <!-- one line per contract of this business partner -->
    <xsl:for-each select="status/contract">
      <xsl:value-of select="$id"/>
      <xsl:value-of select="$semikolon"/>
      <xsl:value-of select="position()"/>
      <xsl:value-of select="$semikolon"/>
      <xsl:value-of select="nr/@V"/>
      <xsl:value-of select="$semikolon"/>
      <xsl:value-of select="begin_end_tmr/@beginn"/>
      <xsl:value-of select="$semikolon"/>
      <xsl:choose>
        <!-- use the end date if the attribute is present -->
        <xsl:when test="begin_end_tmr/@ende">
          <xsl:value-of select="begin_end_tmr/@ende"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- quoted, so it is a string and not the subtraction 9999 - 12 - 31 -->
          <xsl:value-of select="'9999-12-31'"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:value-of select="$CRLF"/>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>

You can run the STX together with the XSLT program using the second example dataset from my first weblog as input.
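That dataset is not reproduced here. If you want to try the programs without it, a minimal input of the shape implied by the XPath expressions in the stylesheet might look like the sketch below. The element and attribute names below bp are taken from the stylesheet; the root element name partners and all values are placeholders invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "partners" is a placeholder root name; all values are made up. -->
<partners>
  <bp>
    <personalien>
      <administrative_gender_cd V="F"/>
      <person>
        <person_name>
          <GIV V="Anna"/>
        </person_name>
      </person>
      <birth_dttm V="1970-01-01"/>
    </personalien>
    <addr>
      <ZIP V="12345"/>
      <CTY V="Berlin"/>
      <STR V="Hauptstr. 1"/>
    </addr>
    <status>
      <contract>
        <nr V="4711"/>
        <begin_end_tmr beginn="2001-01-01" ende="2005-12-31"/>
      </contract>
      <contract>
        <nr V="4712"/>
        <!-- no ende attribute: an open-ended contract -->
        <begin_end_tmr beginn="2006-01-01"/>
      </contract>
    </status>
  </bp>
</partners>
```

The second contract deliberately omits the ende attribute so that the fallback branch for open-ended contracts is exercised as well.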

Please note that STXPath can't navigate within buffers. In fact, this is the reason why STX processors are able to implement a lightweight in-memory representation of XML fragments. The integrated XSLT program, on the other hand, needs to build the DOM tree for every bp element and its children. So our transformation is still fairly memory-intensive, but we never need to build the DOM tree for the whole document at once.

Summary

In this weblog I presented an example of XSLT integration for XML mapping. Since, as far as I know, there are no graphical tools that generate STX code, we reuse XSLT mappings and apply them to parts of the input document.

I think designing schema mapping tools that produce pure STX is a very challenging task, and one I plan to work on in the future. Although STX is Turing complete and can use buffers, designing a general mapping tool might be difficult. On the other hand, I think that most mappings are quite simple, so there could be a chance to find a solution that works for most, or at least the most common, cases.