Some time ago, I worked on a project in which our PI development team was tasked with integrating with a 3rd-party system using SOAP Web services. On the surface, this seemed like a straightforward enough requirement. However, once we got our hands on the WSDL file provided by the 3rd-party system, we quickly discovered that we had a fairly major problem on our hands: the source XML looked more like a dump from a relational database than a hierarchical XML document. Indeed, the XSD alone weighed in at over 2 MB of repeating table-like elements wrapped in a generic wrapper element.
The graphic below illustrates what we were challenged with conceptually. As you can see, the source XML structure has a very flat feel to it. Nested underneath a generic wrapper element are a series of repeating child elements representing records from tables in the 3rd-party system's database. Correlations between the child elements are defined using foreign keys (e.g. match an element within a collection whose sub-element/attribute matches some foreign key within the current element in context). Ultimately, this yielded a series of mapping rules which read more like SQL queries than traditional XML path expressions.
Given the overall complexity of the mapping logic, we quickly dismissed the possibility of using graphical mappings. Similarly, expressing some of the logic required in XSLT would have been cryptic at best and unmanageable at worst. So, by default, we elected to go with Java mappings as they provided the most flexibility. Here, we used the JDOM API (http://www.jdom.org) to parse the source document, load the various tables into collections, and then apply the mapping logic from there. After several iterations of development (and a barrage of conflicting/changing requirements), we finally ended up with a working solution. Still, it hardly felt like a victory because we produced a brittle solution which was very difficult to maintain.
Long after the dust had settled on our project, I began looking for more effective ways to meet such requirements should we encounter them again in the future. It was here that I stumbled onto XQuery. In many respects, you can think of XQuery as the XML equivalent of SQL. This is to say that we can use XQuery to extract data from XML documents in much the same way we use SQL to extract data from relational databases. Of course, XQuery can do much more than just query data from XML files; it is a full-fledged functional language which also supports variables, loops, conditionals, and so on. For an excellent introduction to the XQuery standard, I highly recommend Priscilla Walmsley's XQuery (O'Reilly, 2007).
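To give a feel for that SQL-like flavor, here is a trivial FLWOR expression (the file and element names are hypothetical, purely for illustration):

```xquery
(: roughly equivalent to: SELECT Name FROM Customer WHERE City = 'Boston' :)
for $c in doc("customers.xml")//Customer
where $c/City = "Boston"
return $c/Name
```

The for/where/return structure maps quite naturally onto the FROM/WHERE/SELECT clauses of a SQL query, which is what makes XQuery such a good fit for table-like XML.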
Very quickly, I came to find that XQuery had everything I was looking for in this particular scenario. The question was, how to harness that power within PI?
In order to integrate XQuery with PI, we need two things: an XQuery processor that can be called from Java, and a Java API through which to interact with that processor.
There are several XQuery processors on the market that support Java integration, among them the open source Saxon processor, which is freely available for download from Saxonica. With Saxon, all we have to do is add a couple of JAR files to the PI mapping engine classpath and we're good to go. To keep things tidy, I highly recommend loading these JARs into a base SWCV from which all mapping SWCVs inherit.
As far as the API goes, we can use the standard XQuery API for Java (XQJ) defined in JSR 225. In some respects, this API allows us to tap into an XQuery processor in much the same way that we use the JDBC API to tap into an RDBMS. The code excerpt below gives a brief glimpse of how this works.
import javax.xml.xquery.*;
import net.sf.saxon.xqj.SaxonXQDataSource;
...
try
{
  // Obtain a connection to the Saxon XQuery processor:
  XQDataSource ds = new SaxonXQDataSource();
  XQConnection conn = ds.getConnection();

  // Create and execute an ad hoc query (the first 10 square numbers):
  XQExpression exp = conn.createExpression();
  XQSequence seq = exp.executeQuery("for $n in 1 to 10 return $n*$n");

  // Iterate over the result sequence, much like a JDBC ResultSet:
  int total = 0;
  while (seq.next())
    total += seq.getInt();
}
catch (XQException xqe)
{
  xqe.printStackTrace();
}
Looking over the code excerpt above, you can see some basic similarities between XQJ and JDBC: an XQDataSource is analogous to a JDBC DataSource, an XQConnection to a Connection, an XQExpression to a Statement, and an XQSequence to a ResultSet.
Of course, we can accomplish a whole lot more than what is illustrated in the example above. In particular, we have the option of passing variables to the XQuery processor. This includes simple parameters (e.g. parameters from parameterized operation mappings) as well as source XML documents. The code excerpt below demonstrates how we would bind the source message of a PI mapping to a variable called "d".
@Override
public void transform(TransformationInput in, TransformationOutput out)
    throws StreamTransformationException
{
  ...
  exp.bindDocument(new QName("d"),
    new StreamSource(in.getInputPayload().getInputStream()), null);
  ...
}
Within the XQuery source, we can then reference this variable as follows:
declare variable $d as document-node() external;
$d//SomeElement/Child
Here, we can effectively use the variable $d in the same way we would use the value returned from the XQuery doc() function. From here, it's XQuery business as usual.
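For instance, returning to the flattened source structure described earlier, a foreign-key style join over the bound document might be sketched as follows (the element and attribute names here are hypothetical):

```xquery
(: Correlate repeating Line records with their parent Header via a foreign key :)
for $h in $d//Header
return
  <Order id="{ $h/@id }">
  {
    for $l in $d//Line[@headerId = $h/@id]
    return <Item>{ data($l/Product) }</Item>
  }
  </Order>
```

This is exactly the kind of "SQL query over XML" logic that was so painful to express with graphical mappings or hand-rolled JDOM code.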
The last item of business is serializing the XQuery results back onto the PI transformation output stream. As it turns out, this is easily accomplished using the writeSequence() method of the javax.xml.xquery.XQResultSequence interface (remember, an XQResultSequence is like a ResultSet in JDBC). Here, we simply pass the PI output stream (out.getOutputPayload().getOutputStream()) and an optional java.util.Properties instance containing serialization parameters to the XQuery engine. Quick and painless.
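As a sketch, the serialization step might look like this (assuming seq holds the XQResultSequence returned by executeQuery(); the property names follow the standard XSLT/XQuery serialization parameters):

```java
// Hypothetical sketch: serialize the XQuery result onto the PI output stream.
// Assumes 'seq' is an XQResultSequence and 'out' the TransformationOutput.
Properties props = new Properties();
props.setProperty("method", "xml");   // emit well-formed XML
props.setProperty("indent", "no");    // skip pretty-printing in productive scenarios
seq.writeSequence(out.getOutputPayload().getOutputStream(), props);
```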
Now that you have a feel for the various pieces involved in integrating PI and XQuery, let's take a look at how the various pieces fit together.
Of course, the steps will vary slightly based upon your particular requirements, but the basic setup remains the same.
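Putting the pieces together, an end-to-end Java mapping might be sketched as follows. This is a hypothetical illustration, not a productive implementation: the class name, the inline query, and the element names are all made up, and it assumes the Saxon XQJ JARs are available on the mapping classpath.

```java
import java.util.Properties;
import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xquery.*;
import net.sf.saxon.xqj.SaxonXQDataSource;
import com.sap.aii.mapping.api.*;

public class XQueryMapping extends AbstractTransformation
{
  @Override
  public void transform(TransformationInput in, TransformationOutput out)
      throws StreamTransformationException
  {
    try
    {
      // 1) Obtain a connection to the Saxon XQuery processor:
      XQDataSource ds = new SaxonXQDataSource();
      XQConnection conn = ds.getConnection();

      // 2) Compile the XQuery source (inlined here for brevity; in
      //    practice it might be read from an imported archive):
      XQPreparedExpression exp = conn.prepareExpression(
        "declare variable $d as document-node() external; " +
        "<Targets>{ $d//SomeElement/Child }</Targets>");

      // 3) Bind the PI source payload to the external variable $d:
      exp.bindDocument(new QName("d"),
        new StreamSource(in.getInputPayload().getInputStream()), null);

      // 4) Execute the query and serialize the result sequence
      //    directly onto the PI output stream:
      XQResultSequence seq = exp.executeQuery();
      seq.writeSequence(out.getOutputPayload().getOutputStream(),
        new Properties());

      conn.close();
    }
    catch (XQException xqe)
    {
      throw new StreamTransformationException(xqe.getMessage(), xqe);
    }
  }
}
```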
While XQuery is certainly a powerful language, it is not a panacea to be applied in every PI mapping scenario. Indeed, most of the time you'll find that the other mapping tools available perform faster and are better suited to run-of-the-mill PI mapping problems. However, if you find yourself staring down flattened XML files like the one described above, XQuery can really simplify matters from a development perspective. It can also be useful for joining documents from multiple sources (think cross-reference files), and so on.
On the performance side of things, I've found that integration with Saxon performs reasonably well for small to medium-sized documents. For the most part, the overhead resides in the handoff between the XQJ API and Saxon. The sample project provided uses stream-based parsing to optimize performance such that you could conceivably scale up to handle larger documents, but I would caution you to test extensively before rolling this out to production.