Streaming Techniques for XML Processing – Part 6
STX Best Practices
After my SDN blog about STX was linked from http://stx.sourceforge.net and even from http://www.xml.org for a while, I got lots of questions: How can I do this? What is wrong with this program? I decided not to redirect them to the Joost mailing list and tried my best to answer the questions and to find the bugs in the transformations. Then I decided to write down the most common mistakes and some best practices in this blog.
Grouping and Scope of Templates
If we write programs in ABAP or Java we try to modularize them; in STX we have to use stx:group for this task. XML documents contain hierarchical structures, i.e. nested XML elements that correspond to sometimes complex data types. If we want to write a complex XML transformation it makes sense to define a group for every data type: first we collect the data we need in variables and buffers, and then we build up the target data structure.
This kind of modularization is very important for maintainable STX programs. Just imagine the source data structure changes and you have to modify your transformation: what is the impact of a change? In my experience the effects of changes in STX programs are much harder to predict than in XSLT, because STX lets us modify variables, which is not possible in a functional language like XSLT. Grouping of templates is one way to limit the impact of local changes in STX programs. In the following I will tell you how it works.
An STX program consists of template rules that are evaluated when their match attribute matches an XML event. Every template belongs to a certain group. If we don't define a group, all templates are top-level templates and form a single group: the default group. The group containing the current template is referred to as the current group. The base group is either the current group or the group specified by the group attribute of a process instruction such as stx:process-children or stx:process-self.
If an XML element or attribute is encountered in the XML input stream, an event is raised and the node is processed by the matching template. The matching template is found by applying the following rule set, which defines three precedence categories (listed in order of decreasing precedence):
- templates from the base group and public templates (public="yes") from groups that are children of the base group
- group templates (visibility="group") and global templates (visibility="global") from all groups that are ancestors of the base group
- all global templates (visibility="global")
In the following I will discuss the differences between the public and visibility attributes.
By default all templates of the default group are public and all others are private. So at the beginning of an STX program the templates of the default group are considered first and then the public templates of the child groups. A public template is therefore the entry point to a child group. In terms of object orientation these are "public methods". If you use the group attribute in your stx:process command instead, this corresponds to a "friend" relationship in object orientation: we can access non-public templates.
OO programmers sometimes hesitate to use "friend" relationships between classes (and in fact there are good reasons for that), but I recommend using group attributes in STX programs. This way of programming is very explicit because it sets the base group directly. You can achieve the same by defining public templates, and which way you choose is a matter of taste.
The visibility attribute defines the visibility of a template in a much wider range. By default all templates have local visibility, so they are only visible within their own group, or from the parent group if they are public. Group templates (visibility="group") can also be used from child groups of their group, and global templates (visibility="global") are visible within the whole program.
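As a minimal sketch of the two ways to enter a group, consider the following stylesheet. The element names ORDER, PARTNERS, PARTNER, POSITION and QUANTITY, the group names and the output elements are made up for illustration; only the STX instructions and attributes come from the discussion above.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <!-- Default group: top-level templates are public by default -->
  <stx:template match="ORDER">
    <order>
      <!-- No group attribute: the base group is the default group, so
           child groups can only be entered through public templates -->
      <stx:process-children/>
    </order>
  </stx:template>

  <stx:template match="PARTNERS">
    <partners>
      <!-- Explicit base group: the private templates of the group
           "partners" are considered for the children of PARTNERS -->
      <stx:process-children group="partners"/>
    </partners>
  </stx:template>

  <stx:group name="positions">
    <!-- Public entry point of the group -->
    <stx:template match="POSITION" public="yes">
      <position>
        <!-- The current group, and therefore the base group,
             is now "positions" -->
        <stx:process-children/>
      </position>
    </stx:template>

    <!-- Private (the default inside a named group) -->
    <stx:template match="QUANTITY">
      <quantity/>
    </stx:template>
  </stx:group>

  <stx:group name="partners">
    <!-- Private: reached only because PARTNERS names this group -->
    <stx:template match="PARTNER">
      <partner/>
    </stx:template>
  </stx:group>

</stx:transform>
```

In this sketch the group "positions" is entered implicitly through its public template, while the group "partners" has no public template at all and is entered only through the group attribute.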
A consequence is that we can use group templates as a kind of “return” statement. Let me give an example: Just imagine that we process a nested structure of HEADER and POSITION elements:
<HEADER>
  <POSITION>1</POSITION>
  <POSITION>2</POSITION>
  <ADDITIONAL_HEADER_INFORMATION/>
</HEADER>
Let's suppose that all templates for the POSITION elements (and perhaps their child elements) are in one group, with a template for POSITION as the entry point. If the element ADDITIONAL_HEADER_INFORMATION occurs we have to leave the group and return to the parent group. If the template for this element has visibility="group" it will be chosen automatically when this element occurs.
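The following stylesheet is one possible reading of this scenario, not the only one. It assumes that the HEADER template names the child group explicitly; the group name "positions" and the output element names are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <stx:template match="HEADER">
    <header>
      <!-- All children of HEADER are looked up with "positions"
           as the base group -->
      <stx:process-children group="positions"/>
    </header>
  </stx:template>

  <!-- Defined in the parent (default) group. Because of
       visibility="group" it is also found while the base group is
       the child group "positions" (second precedence category) -->
  <stx:template match="ADDITIONAL_HEADER_INFORMATION" visibility="group">
    <additional-info/>
  </stx:template>

  <stx:group name="positions">
    <stx:template match="POSITION">
      <position>
        <stx:process-children/>
      </position>
    </stx:template>

    <!-- Copies the text content of a POSITION element -->
    <stx:template match="POSITION/text()">
      <stx:value-of select="."/>
    </stx:template>
  </stx:group>

</stx:transform>
```

No template in the group "positions" matches ADDITIONAL_HEADER_INFORMATION, so the lookup falls through to the group templates of the ancestor group, and processing continues there.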
Last but not least let me mention that we can use procedures, which are equivalent to named templates in XSLT. Here is the syntax:
<stx:procedure
  visibility = "local"|"group"|"global"
  public = "yes"|"no"
  new-scope = "yes"|"no"
  name = qname>
  <!-- Content: template -->
</stx:procedure>
We call procedures using the following command:
<!-- Category: template -->
<stx:call-procedure
  name = qname
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:call-procedure>
The optional group attribute lets us use the specified group instead of the current group as the base group for calling the procedure. In my opinion we should use this feature with care.
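A small sketch of a procedure and its call could look like this. The procedure name write-line, its parameters and the id attribute of POSITION are hypothetical and only serve as an illustration.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <!-- A procedure is selected by name, not by a match pattern -->
  <stx:procedure name="write-line">
    <stx:param name="label"/>
    <stx:param name="value"/>
    <line>
      <stx:value-of select="concat($label, ': ', $value)"/>
    </line>
  </stx:procedure>

  <stx:template match="POSITION">
    <!-- No group attribute: the procedure is looked up starting
         from the current group -->
    <stx:call-procedure name="write-line">
      <stx:with-param name="label" select="'position'"/>
      <!-- @id is an assumed attribute of POSITION -->
      <stx:with-param name="value" select="@id"/>
    </stx:call-procedure>
  </stx:template>

</stx:transform>
```

Just like a named template in XSLT, the procedure keeps the formatting logic in one place, so a change to the output format affects only one spot.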
Scope of Variables and Shadowing
The things mentioned above have an impact on the scope of variables. We can define variables in the default group, within groups and within templates.
You can think of group variables as static variables: there is only one instance, which is initialized at the start of the STX program. These variables are visible from the templates in their group and (recursively) in its child groups. As a consequence, variables in the default group are global variables and are visible everywhere. In my opinion we should handle global variables with care (as in any other programming language) and try to use group variables instead.
Local variables are defined in templates and are initialized at run time when the template is executed, so you should prefer them wherever possible.
If we work with group variables we have to take care because of their static character. If we assign the content of an XML element to a group variable within a template, the variable keeps its old value whenever this element is optional and the template isn't executed. So we should take care to (re)initialize variables. We can do this by .
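One way to do such a (re)initialization is the stx:assign instruction. The following sketch assumes an optional NOTE child element of POSITION; the element NOTE, the variable name and the output format are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <stx:group name="positions">
    <!-- Group variable: a single instance, initialized once -->
    <stx:variable name="note" select="''"/>

    <stx:template match="POSITION" public="yes">
      <!-- Reset first, so that the value of a previous POSITION
           cannot leak into this one when NOTE is missing -->
      <stx:assign name="note" select="''"/>
      <stx:process-children/>
      <!-- Build the target structure after the data is collected -->
      <position note="{$note}"/>
    </stx:template>

    <!-- NOTE is optional: these templates only run if it is present -->
    <stx:template match="NOTE">
      <stx:process-children/>
    </stx:template>

    <stx:template match="NOTE/text()">
      <stx:assign name="note" select="."/>
    </stx:template>
  </stx:group>

</stx:transform>
```

Without the reset at the beginning of the POSITION template, a POSITION without a NOTE would silently inherit the note of the POSITION before it.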
In fact it is possible to create "shadow variables": the optional new-scope attribute of stx:template specifies whether the template creates new instances of group variables; the default value is no. A new set of group variables is created for each instantiated template with new-scope="yes". These variables shadow their former values and exist as long as the template is being processed. We can even define exceptions to this rule: stx:variable has an optional keep-value attribute that specifies whether a new instance of the variable is created by instantiating a template that has its new-scope attribute set to yes.
I have never used shadowing up to now. In my opinion it's a perfect way to make your programs so difficult that nobody understands them. But perhaps someone can write best practices for this feature.
Do you want to help the STX Community?
The STX community needs your feedback. We are interested in your experiences. What kind of features do you use? What can be done better?
Up to now there are only two open source STX implementations, in Java and in Perl. Joost is very stable, and if there are errors they will be corrected within a short time. But most XSLT processors are highly optimized: if you look at the benchmarks at http://yquem.inria.fr/~frisch/xstream/bench.html#results, I think we should start to make the implementation faster.
Please note that up to now Joost supports only parts of the specification. Unfortunately the text-processing elements stx:analyze-text, stx:match and stx:no-match, which are really useful in practice, are not implemented. Do you want to implement them?
I have read the first 3 blogs and they are really good and useful articles. Thanks !!!
I have searched for Part 4 and Part 5 of this series and I wasn't able to find them. Could you please let me know whether you skipped Part 4 and Part 5 on purpose (maybe they are not your favorite numbers 😉 ) or whether I have to refine my search further?
Part 4 and 5 are still on SCN and can be found in my blog list:
Streaming Techniques for XML Processing - Part 4
Streaming Techniques for XML Processing - Part 5