XPath beyond filtering
Introduction
When you think about XPath, the first thing that comes to mind is filtering XML nodes with predicates [ ], isn’t it? In this blog, I would like to share some observations about what XPath can do beyond that conventional use.
With XPath backed by Saxon Enterprise Edition (EE), CPI now offers many new operators, expressions and a rich set of functions that can be used in steps such as Filter, Content Modifier and Write Variables.
Let us start with an introduction to the various operators and functions through basic, self-explanatory examples, along with their limitations and workarounds. Later, we will focus on use cases with an XML document as input.
Building blocks
Arrow operator => alternative to nested parentheses
It applies a function to a sequence by passing the sequence as the first argument of the function, which allows calls to be chained from left to right.
('alpha','beta','gamma') => string-join(',')
Result(String) : alpha,beta,gamma
('alpha','beta','gamma') => reverse() => string-join(',')
Result(String) : gamma,beta,alpha
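For comparison, the second example can also be written with conventional nested calls. This sketch only restates the expression above and should give the same result.

```xpath
string-join(reverse(('alpha','beta','gamma')), ',')
```

Result(String) : gamma,beta,alpha. The nested form has to be read from the innermost call outwards, whereas the arrow form reads in the order the functions are applied.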
Since CPI by default returns only the first item of the result sequence when the XPath Value/Data type is string, we will explicitly use the string-join function whenever we expect multiple string items in the output.
Simple map operator ! similar to a for expression
The compact notation S!E evaluates the expression E once for every item in the sequence S.
(1 to 3) ! ( . * . ) => string-join(',')
Result(String) : 1,4,9
('a','b','c') ! ('<row>'||.||'</row>') => string-join()
Result(String) : <row>a</row><row>b</row><row>c</row>
Note the use of the concatenation operator ||, which is equivalent to the concat function.
for expression to iterate over items
It binds a variable to each item of a sequence in turn and evaluates the return expression once per item.
( for
$i in (1,2,3),
$square in function($a){$a * $a}
return
$square($i)
) => string-join(',')
Result(String) : 1,4,9
Along similar lines, you could explore interesting functions such as for-each and for-each-pair.
(1,2,3) => for-each( function($a){ $a * $a } ) => string-join(',')
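Since for-each-pair is mentioned but not shown, here is a minimal sketch. It walks two sequences in parallel and applies a two-argument function to each pair of items. The sample values are made up for illustration.

```xpath
for-each-pair(
  ('a','b','c'),
  (1,2,3),
  function($s, $n){ $s || $n }
) => string-join(',')
```

Going by the XPath 3.1 definition of for-each-pair, this should give a1,b2,c3. If the two sequences differ in length, the extra items of the longer one are ignored.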
Issue with use of colon (:) in CPI editor
Before proceeding, please note that the CPI editor is a bit sensitive to the colon, which it treats as the separator of a namespace prefix. If you use a colon in an XPath expression, you might come across the error below in the editor, mostly about the first occurrence of the colon.
Namespace prefix expression assigned to <step> not defined in Namespace Mapping
As a workaround, you can declare the expression in the namespace mapping of the runtime configuration. It is not a clean way, but you can use this hack until the CPI editor matures in handling colons in XPath.
xmlns:expression=expression
let expression for variable declaration
It declares one or more variables, followed by a return expression that uses them.
let $x := 'alpha', $y := 'beta' return concat($x, '|', $y)
Result(String) with colon workaround: alpha|beta
You can manage this without the workaround by using a for..return expression instead.
for $x in 'alpha', $y in 'beta' return concat($x, '|', $y)
How to use header or property variables as parameter in XPath?
To access header or property variables in an XPath expression, simply use $ followed by the variable name. Assuming the variables x and y are populated from properties, here is the expression you are left with.
concat($x, '|', $y)
map
It is a new data type that holds a collection of key-value pairs, with each key separated from its value by a colon ‘:’.
map { 'A':'alpha','B':'beta','G':'gamma' } ? A
Result(String) with colon workaround : alpha
This is a very basic use of a map along with the lookup operator ?. The map and array data model can also be used to represent and process JSON data structures. To explore the rich set of available map and array functions, please refer to the standard documentation.
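As a small illustration of the JSON remark, the sketch below parses a JSON string with the standard parse-json function and navigates the result with the lookup operator. The JSON content is made up, and because the string literal contains colons, the colon workaround may be needed in the CPI editor (an assumption, not verified here).

```xpath
parse-json('{"A":"alpha","B":["x","y"]}') ? B ? 2
```

parse-json returns a map, ? B selects the array stored under the key B, and ? 2 selects its second member, so the expected result is y. Note that array positions start at 1.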
How to use functions from different namespace?
So far, all the functions we used belong to the default function namespace, which allows the shorter notation such as string-join. If you wish to use a function from the default namespace in full notation, such as fn:string-join, you need to declare the prefix explicitly in the namespace mapping of the runtime configuration, where entries are separated by a semicolon (;).
If you intend to use functions from another namespace, you must always use the full notation along with the namespace mapping declaration.
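To make this concrete, here is a sketch that uses the standard math function namespace of XPath 3.1 together with the full notation for string-join. The namespace URIs are the ones defined by the W3C. The mapping entry follows the xmlns:prefix=uri pattern of the workaround shown earlier, so the exact format accepted by your tenant is an assumption:
xmlns:fn=http://www.w3.org/2005/xpath-functions;xmlns:math=http://www.w3.org/2005/xpath-functions/math

```xpath
(4, 9, 16) ! math:sqrt(.) => fn:string-join(',')
```

With both prefixes declared, this should give 2,3,4. Whether every function of the math namespace is available in CPI has not been checked here, so treat it as a starting point.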
Use case
Let us consider some extended cases with a list of books as the source XML payload.
<bookstore>
<book>
<title>t1</title>
<author>X</author>
<price>88</price>
</book>
<book>
<title>t2</title>
<author>Y</author>
<price>22</price>
</book>
<book>
<title>t3</title>
<author>X</author>
<price>33</price>
</book>
</bookstore>
Case 1 : List out all the authors
( '<authorlist>',
//book/author => serialize(),
'</authorlist>'
) => string-join()
The serialize function here returns the string representation of the XML node sequence. You can use it for XML nodes that you wish to pass through as they are.
Here is the equivalent expression without the serialize function.
( '<authorlist>',
//book/author ! ('<author>'||.||'</author>'),
'</authorlist>'
) => string-join()
Result(String) :
<authorlist>
<author>X</author>
<author>Y</author>
<author>X</author>
</authorlist>
For a unique set of authors, you can use the distinct-values function.
( '<authorlist>',
distinct-values(//book/author) ! ('<author>'||.||'</author>'),
'</authorlist>'
) => string-join()
Result(String) :
<authorlist>
<author>X</author>
<author>Y</author>
</authorlist>
Case 2: List out title and price for all the books, with the title in uppercase and the price prefixed with $
('<booklist>', //book!
('<book>',
title!('<title>'||upper-case(.)||'</title>'),
price!('<price>'||concat('$',.)||'</price>'),
'</book>'),
'</booklist>'
) => string-join()
Result(String) :
<booklist>
<book>
<title>T1</title>
<price>$88</price>
</book>
<book>
<title>T2</title>
<price>$22</price>
</book>
<book>
<title>T3</title>
<price>$33</price>
</book>
</booklist>
Case 3: List the book titles grouped together by author
( for
$a in //book/author=>distinct-values(),
$t in //book[author = $a]/title => serialize()
return
('<author name="', $a ,'">', $t ,'</author>')
) => string-join()
Result(String) :
<author name="X">
<title>t1</title>
<title>t3</title>
</author>
<author name="Y">
<title>t2</title>
</author>
In all 3 cases above, we get a string as the result. If you wish to have an XML node list as the result, try the parse-xml-fragment function at the end, after string-join.
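As a sketch of that suggestion, here is the distinct author list from Case 1 with parse-xml-fragment chained after string-join. It assumes the step is configured to expect a node result rather than a string.

```xpath
( '<authorlist>',
  distinct-values(//book/author) ! ('<author>'||.||'</author>'),
  '</authorlist>'
) => string-join() => parse-xml-fragment()
```

parse-xml-fragment returns a document node that contains the authorlist element, so the joined string has to be well-formed XML. A missing closing tag leads to a parsing error instead of a result.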
Case 4: Sort the books based on author and price
//book => sort((), function($b){ $b/author, $b/price })
Result(Nodelist) :
<book>
<title>t3</title>
<author>X</author>
<price>33</price>
</book>
<book>
<title>t1</title>
<author>X</author>
<price>88</price>
</book>
<book>
<title>t2</title>
<author>Y</author>
<price>22</price>
</book>
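One caveat is worth a sketch. If the payload is not schema-validated (an assumption about the typical CPI message), element values are untyped and the sort keys are compared as strings, so a price of 100 would sort before 22. Wrapping the price in number() makes the second key numeric.

```xpath
//book => sort((), function($b){ $b/author, number($b/price) })
```

For the sample payload the order should be the same as above (t3, t1, t2), since all prices have two digits. To get descending order, you could append => reverse() to the expression.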
Conclusion
The idea of this blog is to showcase some of the new capabilities available with XPath, such as function chaining, variable declaration, inline function declaration and iteration over a sequence, along with a rich library of built-in functions. In terms of use cases, they could be leveraged in CPI for lightweight transformation, sorting and aggregation, apart from filtering. You should consider this a complementary option and not a replacement for the other popular mapping options available.
Thanks Sunil for sharing these use cases and examples.
I wonder where/how to use above techniques in CPI iFlows.
Is it still in Groovy script or in XSLT or other type of iFlow steps?
Thanks.
Kind regards,
Nick
Thank you Nick Yang.
You can easily try the examples with Filter component. Additionally you can try using Content Modifier and Write variables with variable type as XPath.
Regards,
Sunil Chandra
Hi Sunil,
Thanks for your hint and sharing and I have tried few from your examples.
It's great to know that we could put these complex XPath in filter step and content modifier.
Kind regards,
Nick
Thanks for sharing Sunil...