XQuery and PI: A Match Made in Nasty XML

jwood_bowdark · 12-18-2012

Some time ago, I worked on a project in which our PI development team was tasked with integrating with a 3rd-party system using SOAP Web services. On the surface, this seemed like a straightforward enough requirement. However, once we got our hands on the WSDL file provided by the 3rd-party system, we quickly discovered that we had a fairly major problem on our hands: the source XML looked more like a dump from a relational database than a hierarchical XML document. Indeed, the XSD alone weighed in at over 2 MB of repeating table-like elements wrapped in a generic wrapper element.

The graphic below illustrates what we were challenged with conceptually. As you can see, the source XML structure has a very flat feel to it. Nested underneath a generic wrapper element are a series of repeating child elements representing records from tables in the 3rd-party system's database. Correlations between the child elements are defined using foreign keys (e.g. match an element within a collection whose sub-element/attribute matches some foreign key within the current element in context). Ultimately, this yielded a series of mapping rules which read more like SQL queries than traditional XML path expressions.
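To make the foreign-key style of correlation concrete, here is a minimal sketch using the XPath 1.0 engine that ships with the JDK. It is neither the JDOM code from the project nor XQuery. The element names (Wrapper, CUSTOMER, ORDER, CUST_ID) and the class and method names are hypothetical stand-ins for the real table-like structure.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class FlatXmlJoin {
    // Hypothetical flat payload: ORDER rows point at CUSTOMER rows via CUST_ID.
    static final String SAMPLE =
          "<Wrapper>"
        + "<CUSTOMER><ID>C1</ID><NAME>Acme</NAME></CUSTOMER>"
        + "<CUSTOMER><ID>C2</ID><NAME>Globex</NAME></CUSTOMER>"
        + "<ORDER><ID>O1</ID><CUST_ID>C2</CUST_ID></ORDER>"
        + "<ORDER><ID>O2</ID><CUST_ID>C1</CUST_ID></ORDER>"
        + "</Wrapper>";

    // Resolves the customer name for an order by following the foreign key,
    // roughly "SELECT NAME FROM CUSTOMER WHERE ID = (SELECT CUST_ID FROM ORDER WHERE ID = ?)".
    static String customerNameForOrder(String xml, String orderId)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Supply the order ID as an XPath variable instead of concatenating strings.
        xpath.setXPathVariableResolver(
            name -> "orderId".equals(name.getLocalPart()) ? orderId : null);
        String expr =
            "/Wrapper/CUSTOMER[ID = /Wrapper/ORDER[ID = $orderId]/CUST_ID]/NAME";
        // Returns the string value of the first matching node, or "" if none match.
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }
}
```

Even with only two "tables", the path expression already reads like a nested SQL query. With dozens of tables and multi-level keys, the mapping rules quickly become hard to follow.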

Given the overall complexity of the mapping logic, we quickly dismissed the possibility of using graphical mappings. Similarly, expressing some of the logic required in XSLT would have been cryptic at best and unmanageable at worst. So, by default, we elected to go with Java mappings as they provided the most flexibility. Here, we used the JDOM API (http://www.jdom.org) to parse the source document, load the various tables into collections, and then apply the mapping logic from there. After several iterations of development (and a barrage of conflicting/changing requirements), we finally ended up with a working solution. Still, it hardly felt like a victory because we produced a brittle solution which was very difficult to maintain.

Enter XQuery

Long after the dust had settled on our project, I began looking for more effective ways to meet such requirements should we encounter them again in the future. It was here that I stumbled onto XQuery. In many respects, you can think of XQuery as the XML equivalent of SQL. That is to say, we can use XQuery to extract data from XML files in much the same way we use SQL to extract data from relational databases. Of course, XQuery can do much more than just query data from XML files; it is a full-fledged functional language which also supports variables, loops, conditionals, and so on. For an excellent introduction to and treatment of the XQuery standard, I highly recommend Priscilla Walmsley's XQuery (O'Reilly, 2007) which can be found here.

Very quickly, I came to find that XQuery had everything I was looking for in this particular scenario. The question was, how to harness that power within PI?

Integrating XQuery with PI

In order to integrate XQuery with PI, we need two things:

An XQuery processor
An API that provides for integration between a PI mapping program written in Java and the XQuery processor

There are several XQuery processors on the market that support Java integration, among them the open source Saxon processor which is available for download here. With Saxon, all we have to do is add a couple of JAR files to the PI mapping engine classpath and we're good to go. For this, I highly recommend that the JARs be loaded into a base SWCV which all mapping SWCVs inherit from.

As far as the API goes, we can use the standard XQuery API for Java (XQJ) defined in JSR-225. In some respects, this API allows us to tap into an XQuery processor in the same way that we use the JDBC API to tap into an RDBMS. The code excerpt below gives a brief glimpse of how this works.

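try
{
    // Obtain a data source and open a connection to the XQuery processor
    XQDataSource ds = new SaxonXQDataSource();
    XQConnection conn = ds.getConnection();

    // Create an expression and execute an ad hoc query
    XQExpression exp = conn.createExpression();
    XQSequence seq = exp.executeQuery("for $n in 1 to 10 return $n*$n");

    // Iterate over the results, much like a JDBC ResultSet
    int total = 0;
    while (seq.next())
        total += seq.getInt();

    // Release the connection and the resources tied to it
    conn.close();
}
catch (XQException xqe)
{
    xqe.printStackTrace();
}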

Looking over the code excerpt above, you can see some basic similarities between XQJ and JDBC:

First, we obtain a data source.
Next, we use that data source to establish a connection. This can be a physical connection to an XML database, or a virtual connection to an input stream that will be established later.
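Then, we build an expression object, which is roughly analogous to a Statement in JDBC. Here, we have the option of providing the XQuery source at execution time (as in the excerpt above) or up front via the prepareExpression() method of the connection, which returns an XQPreparedExpression (a pre-compiled query of sorts, much like a PreparedStatement in JDBC).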
Once the expression object is constructed, we can use the executeQuery() method to execute the query.
Finally, we can use the resultant XQSequence object to access the results (à la the ResultSet object from JDBC).

Of course, we can accomplish a whole lot more than what is illustrated in the example above. In particular, we have the option of passing variables to the XQuery processor. This includes simple parameters (e.g. parameters from parameterized operation mappings) as well as source XML documents. The code excerpt below demonstrates how we would bind the source message in a PI mapping document to a variable called "d".

@Override
public void transform(TransformationInput in, TransformationOutput out)
    throws StreamTransformationException
{
    ...
    exp.bindDocument(new QName("d"), new StreamSource(in.getInputPayload().getInputStream()), null);
    ...
}

Within the XQuery source, we can then reference this variable as follows:

declare variable $d as document-node() external;

$d//SomeElement/Child

Here, we can effectively use the variable $d in the same way we would use the value returned from the XQuery doc() function. From here, it's XQuery business as usual.

The last item of business is serializing the XQuery results back onto the PI transformation output stream. As it turns out, this is easily accomplished using the writeSequence() method of the javax.xml.xquery.XQResultSequence interface (remember, an XQResultSequence is like a ResultSet in JDBC). Here, we simply pass the PI output stream (out.getOutputPayload().getOutputStream()) and an optional java.util.Properties instance containing serialization parameters to writeSequence(). Quick and painless.
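The bind, execute, and serialize steps described above can be sketched end to end as follows. This is a hedged outline rather than the code from the sample project. It assumes a Saxon 9.x release with XQJ support on the classpath (the package name net.sf.saxon.xqj differs in other Saxon editions and versions). The class name XQueryPipeline and the method run() are hypothetical, and plain streams stand in for the PI payload streams.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;
import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;
import net.sf.saxon.xqj.SaxonXQDataSource; // assumption: Saxon 9.x package name

class XQueryPipeline {
    // Runs the query against the source document and writes the result to target.
    // The query is expected to declare: declare variable $d as document-node() external;
    static void run(String query, InputStream source, OutputStream target)
            throws XQException {
        XQDataSource ds = new SaxonXQDataSource();
        XQConnection conn = ds.getConnection();
        try {
            // Compile the query up front (the PreparedStatement-like variant).
            XQPreparedExpression exp = conn.prepareExpression(query);

            // Bind the source document to the external variable $d.
            exp.bindDocument(new QName("d"), new StreamSource(source), null);

            XQResultSequence result = exp.executeQuery();

            // Standard serialization parameter names; values here are illustrative.
            Properties props = new Properties();
            props.setProperty("method", "xml");
            props.setProperty("omit-xml-declaration", "yes");

            // Serialize the whole result sequence onto the output stream.
            result.writeSequence(target, props);
        } finally {
            conn.close();
        }
    }
}
```

Inside a PI Java mapping, the source and target arguments would be in.getInputPayload().getInputStream() and out.getOutputPayload().getOutputStream(), respectively.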

Putting it All Together

Now that you have a feel for the various pieces involved in integrating PI and XQuery, let's take a look at how the various pieces fit together:

First, we must download the requisite XQuery provider JARs and place them on the PI mapping engine classpath. This is most easily achieved by creating a base-level SWCV and uploading the JARs as imported archives.
Next, we need to create the XQuery file that will contain the mapping logic. For this task, I would recommend that you use an external editor such as Stylus Studio or XMLSpy as these tools contain syntax highlighting, built-in test environments, and so on.
Once the XQuery file is created, it will need to be integrated with a PI Java mapping archive file so that it can be loaded at runtime. You can download an example mapping project here to see how this can be achieved. Basically, the .xquery file is placed in a directory within the JAR file and loaded via the getResourceAsStream() method of the java.lang.Class class.
From a performance perspective, it is a good idea to pre-load/pre-compile the XQuery file so that each mapping request does not incur this overhead at runtime. In the aforementioned sample project, I do this by pre-loading the query file in a static initializer block.
This just leaves the transform() method with the job of handing over the mapping request to the XQJ API and serializing the results.

Of course, the steps will vary slightly based upon your particular requirements, but the basic setup remains the same.
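As a rough sketch of the pre-loading step, the following JDK-only helper reads a query file from the classpath once and keeps its text in a static field. The class names (QueryResource, OrderMapping) and the resource path /xquery/mapping.xquery are hypothetical; the sample project may organize this differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class QueryResource {
    // Reads the whole stream into a UTF-8 string.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Loads a query file packaged inside the mapping JAR.
    static String load(Class<?> owner, String path) throws IOException {
        try (InputStream in = owner.getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("Query resource not found: " + path);
            }
            return readFully(in);
        }
    }
}

// Hypothetical mapping class: the query text is loaded once, when the class
// is initialized, instead of on every mapping request.
class OrderMapping {
    private static final String QUERY;

    static {
        try {
            QUERY = QueryResource.load(OrderMapping.class, "/xquery/mapping.xquery");
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}
```

Failing fast in the static initializer means a missing or misnamed query file surfaces as soon as the mapping class is loaded rather than in the middle of a message.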

Recommendations for Use

While XQuery is certainly a powerful language, it is not a panacea that should be used in every PI mapping scenario. Indeed, most of the time, you'll find that the other mapping tools available will perform faster and are better suited to the typical run-of-the-mill PI mapping problems. However, if you find yourself staring down flattened XML files like the one described above, XQuery can really simplify matters from a development perspective. It can also be useful for joining documents from multiple sources (think x-ref files).

On the performance side of things, I've found that integration with Saxon performs reasonably well for small to medium-sized documents. For the most part, the overhead resides in the handoff between the XQJ API and Saxon. The sample project provided (available here) uses stream-based parsing to optimize the performance such that you could conceivably scale upward to handle larger documents, but I would caution you to test extensively before rolling this out to production.