In this weblog I present some techniques for XML processing, starting with the basics of Simple Transformations.
Part 1 – Expressiveness of Simple Transformations
Simple Transformations (ST) are an SAP-proprietary programming language that is integrated into ABAP by CALL TRANSFORMATION in kernel release 6.40. The concept differs from other transformation languages such as JiBX, which is used for XML-Java mapping, and from languages suitable for both queries and transformations such as Lore, XCerpt, XDuce or even XSLT, which is also supported in ABAP by CALL TRANSFORMATION. The examples in the ABAP package SST_DEMO quickly show how ST works. In contrast to XSLT, ST is fast, memory-efficient and symmetric, but it lacks expressiveness.
Unfortunately I can’t describe the power of ST in a mathematical way because, as usual, conditions cause most of the technical problems, and <tt:cond> seems to be difficult to handle. In fact, despite their name, Simple Transformations have very powerful and sometimes complicated commands.
So when I’m asked about expressiveness, I usually explain how many times an ABAP or XML node can be accessed, and I mention the linear order of processing, the lookahead of 1 and so on. Once I start to talk about the possibilities of using parameters and of breaking the symmetry between serialization and deserialization, clever people start to wonder how far they can go. A typical question is:
Can I create relational data models from nested structures with an unbounded number of occurrences during deserialization?
This is a natural question when you exchange data with non-SAP systems using standardized XML frameworks and want to save your data in transparent tables with additional foreign keys. If you are working with small datasets, you won’t have problems copying nested internal tables into the target data structure in ABAP. But let’s ask whether we can avoid heavy postprocessing. Of course I would accept a pragmatic solution: simple postprocessing in ABAP without copying data from one internal table into another would be acceptable.
The answer to the question above is: “No chance. You have to do heavy postprocessing.” The reason is very simple and has to do with the <tt:loop> statement. Let’s go into detail. In the following we consider an XML document with a simple nested structure:
A typical example of an ST is to transform this XML document into an internal table whose table line contains a component ‘nummer’ and another internal table:
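To make the copying step concrete, here is a minimal sketch of such postprocessing, written in Python rather than ABAP purely for illustration. It walks a nested structure and fills two flat tables linked by a generated foreign key. The sample data, the field names `name` and `z`, and the function `flatten` are assumptions made for this sketch.

```python
def flatten(nested):
    """Split nested records into two flat tables linked by a generated key."""
    table_a = []  # one row per <A> element: (a_id, name)
    table_z = []  # one row per <Z> element: (a_id, value)
    for a_id, record in enumerate(nested, start=1):
        table_a.append((a_id, record["name"]))
        for value in record["z"]:
            # a_id acts as the foreign key pointing back to table_a
            table_z.append((a_id, value))
    return table_a, table_z


# Hypothetical data shaped like the nested result of a deserialization.
nested = [
    {"name": "A1", "z": ["01", "02"]},
    {"name": "A2", "z": ["03"]},
]
table_a, table_z = flatten(nested)
print(table_a)
print(table_z)
```

Every row of the nested structure is touched and copied once, which is harmless for small datasets but is exactly the kind of extra pass one would like to avoid for large ones.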
Note that the transformation above is symmetric and can be used in both directions to serialize and deserialize.
If we want to transform the XML stream above into a relational data model, we first have to break up the nested structure: the values of the <A> elements have to be put into one internal table and the <Z> elements into another. Afterwards we have to solve the problem of how to link these internal tables. Without ABAP calls from ST we will have trouble generating the foreign keys we need. But perhaps this could be done using a clever mapping: whenever a new <A> element starts, we write an additional piece of information to its <Z> elements, for example the marks ‘+’ and ‘-’ in alternating order, switching each time a new <A> occurs. Later we could calculate the foreign keys in a postprocessing step in ABAP. The result after the transformation would be as follows:
internal table for <A>-elements:
| Counter calculated afterwards | Value of element <name> |
internal table for <Z>-elements:
| Additional mark | Value of element <Z> |
Now we can do two loops in ABAP: first we increment the counter for our <A> elements, and then we introduce a counter for our <Z> elements that is incremented each time the mark changes from ‘+’ to ‘-’ or vice versa. This yields the following result, which is exactly what we are looking for.
internal table for <A>-elements:
| Counter during postprocessing | Value of element <name> |
internal table for <Z>-elements:
| Additional mark after postprocessing | Value of element <Z> |
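The second of these loops can be sketched as follows. This is an illustration in Python, not ABAP, and the function name `assign_foreign_keys` as well as the sample rows are assumptions: the counter is incremented whenever the mark differs from the mark of the previous row, and it replaces the mark as the foreign key.

```python
def assign_foreign_keys(marked_rows):
    """Turn alternating '+'/'-' marks into a running parent counter."""
    result = []
    counter = 0
    previous_mark = None
    for mark, value in marked_rows:
        if mark != previous_mark:
            # the mark flipped, so a new <A> element has started
            counter += 1
            previous_mark = mark
        result.append((counter, value))
    return result


# Hypothetical <Z> rows as they might look after the transformation.
rows_z = [("+", "01"), ("+", "02"), ("-", "03")]
print(assign_foreign_keys(rows_z))
```

Note that this scheme only works if every <A> element has at least one <Z> element. If one had none, the groups before and after it would carry the same mark and would be merged.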
So let’s start to code. The following transformation does this job without calculating alternating marks. Just have a look at the inner loop, where we change the root from ROOT1 to ROOT2:
Then we test it with the following quick hack:
The result is annoying: the internal table t_a contains the data of two <A> elements, but t_z contains only one entry, with value “03”.
The explanation is simple: to deal with internal tables you have to use <tt:loop> statements. But every time the inner <tt:loop> statement is entered during deserialization, the internal table is cleared, so in the example above you are losing information. The example may look strange at first: just run the transformation in the other direction and look at the result. It will differ from the XML document above! So we get asymmetric behaviour although we didn’t use any asymmetric commands. Initially I considered this a bug, but switching the root makes sense. Why should the inventors of ST forbid such mechanisms and reduce the expressiveness of the language?
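The effect can be reproduced with a toy model. The following Python sketch is not ST and does not use any SAP API. It only imitates the behaviour described above, under the assumption that the target table is reset each time the inner loop is entered. The function name and the sample document are made up for illustration.

```python
def deserialize_with_clearing(document):
    """Toy model: the inner target table is cleared on every loop entry."""
    t_a = []
    t_z = []
    for a in document:
        t_a.append(a["name"])
        # modelled behaviour: entering the inner loop resets the table
        t_z.clear()
        for value in a["z"]:
            t_z.append(value)
    return t_a, t_z


# Hypothetical document with two <A> elements and three <Z> elements.
document = [
    {"name": "A1", "z": ["01", "02"]},
    {"name": "A2", "z": ["03"]},
]
t_a, t_z = deserialize_with_clearing(document)
print(t_a)  # both <A> elements survive
print(t_z)  # only the <Z> values of the last <A> element survive
```

In this model the outer table is filled once and keeps all rows, while the inner table only ever holds the rows of the most recent outer iteration. Serializing the two tables again cannot restore the lost entries, which is where the asymmetry comes from.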
In the following I want to mention a second aspect that seems to be confusing when you are confronted with ST for the first time:
You can assign the value of an element only once during deserialization!
In the rest of this part we try to solve the following task: we want to deserialize the content of the <name> elements into one table, and the nested <Z> elements into another table if the content of the associated <name> element is “A1”. Perhaps you would start by defining a variable <tt:variable name="N"/> that stores the value of the current <A> element. I guess your code might look something like the following:
If you run this transformation you won’t get the expected result. Why? In contrast to XSLT, variables in ST behave like variables in any procedural language, and you can assign to them more than once. But the following two lines don’t work:
First you want to read the value into an ABAP structure and then bind it to a variable. Even changing the order of the two commands wouldn’t help. Let’s test it with a simpler example:
Here is the ABAP code for running this ST:
You can verify that ROOT2 is empty afterwards, so this doesn’t work. Therefore we have to find another way to implement our transformation, perhaps using the condition construct. But here we have another problem: the template content of the conditional is either a template (in which case you can’t do assignments to variables, for example) or it is evaluated unconditionally during deserialization. So in fact I can’t tell you whether this will lead to success.
So let’s summarize what we have learned: programming ST is not difficult, and ST does a great job of transforming an XML document into nested ABAP structures and back, because it is designed for this task. So, to be honest, it’s not surprising that a transformation to a relational model is impossible without heavy postprocessing, because for this task we have to create a set of internal tables from an XML tree and add foreign keys. In our small example above those keys are not part of the model, so we would have to generate them and thereby, of course, break the symmetry of the transformation. The impossibility is caused by the fact that we can’t increment variables and can’t append deserialized data to an internal table.
If we want to describe the (lack of) power of Simple Transformations, I suggest collecting some simple examples of transformations we can’t realize with ST and reducing other problems to them.
But does the lack of expressiveness really bother you? I don’t believe in solutions that solve every problem without causing new ones. Perhaps some new features that could help us will be added to ST in post-6.40 development, but the designers will have to be careful not to make the language too complex.
On the other hand, a software developer should try to choose the right technology (iXML, XSLT, XSLT with ABAP calls or Java enhancement, ST or even JAXB) and should not misuse a technology to create programs that solve a problem in an unexpected way but are hard to understand and possibly difficult to maintain.