In the first part of this weblog series I introduced STX and mentioned validation beyond W3C XML Schema as one of its applications. Now I will explain this in detail. For this I used more advanced STX techniques, and now I want to put it all together for an application in data exchange.
We exchange data to link electronic business processes by making the data of one system available to another system. Usually we don't want to accept arbitrary data; we only accept valid XML documents. Using a schema language like W3C XML Schema has several advantages:
But validation against a W3C XML Schema also has disadvantages:
Validation languages like Schematron are sometimes better suited because we can code our own rules and assertions. Unfortunately, most Schematron implementations rely on XSLT, and XSLT processors typically build the whole document tree in memory, so you can't check huge XML documents with them. In this weblog I will present a self-made prototype of a validation language, STV (Streaming Validation for XML), that is based on STX, so I expect good performance. On the other hand, it is less expressive than Schematron. But combined with W3C XML Schema it is a powerful tool.
In the first part of this blog series I transformed an XML document that contains a list of business partners and their contract information. Each contract has a number, a start date and an end date. Unfortunately, W3C XML Schema can't check whether the start date is less than or equal to the end date, or whether the contracts of a business partner overlap. That means we want to output error messages if either of the following two cases occurs:
and
The latter case is wrong because a missing attribute @ende means that the contract has an open end. Here is an XML document that contains these errors:
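Stated formally (my own notation, not STV syntax): let contract $i$ of one business partner have the start date $b_i$ from @beginn and the end date $e_i$ from @ende, where a missing @ende means $e_i = +\infty$. Assuming that a contract covers both its first and its last day, the two checks are:

```latex
\forall i:\quad b_i \le e_i
\qquad\qquad
\forall i \ne j:\quad e_i < b_j \;\lor\; e_j < b_i
```

The second condition says that the closed ranges $[b_i, e_i]$ and $[b_j, e_j]$ share no day. Once the first condition holds, it is enough to sort the contracts of one business partner by start date and compare each start with the latest end seen before it, so only the dates of one business partner have to be kept at a time.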
An STV transformation defines a set of rules, each consisting of assertions. An assertion can use variables, which have to be assigned first. Within a rule we can also initialize buffers, append to them and process them. I created a schema for this language, which you can download here:
I want to code the checks mentioned above in a formal language:
Let's look at how it works. All STV commands belong to the namespace urn://svx/001, and the root element is svx:schema. We define rules with the command svx:rules. Each rule has two attributes, context and location. The first one defines an element that is evaluated together with its children to process the rule. The attribute location defines the place where the assertions are executed, i.e. the element that triggers them.
Here is the part of the STV transformation that compares the start and end dates:
The rule contracts is defined within the element status and is executed when the element nr occurs. At each occurrence of the element begin_end_tmr, the command stv:let assigns the value of the attribute @beginn to the variable start. The variable end is treated the same way, except that it is assigned a default value if @ende is missing. At the end of the element the assertion is evaluated, and we output an error message if it does not hold.
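To make the mechanics of this rule concrete, here is a minimal sketch with Python's standard SAX parser. It is neither STV nor STX; it only mirrors the idea: the context status, the location nr, an stv:let-style assignment from begin_end_tmr, and an assertion evaluated at the end tag of nr. The nesting (status containing nr containing begin_end_tmr), ISO 8601 dates and the default 9999-12-31 for a missing @ende are my assumptions, and the class name ContractDateRule is hypothetical.

```python
import xml.sax
from typing import Dict, List, Optional

# Assumed default for a missing @ende (open end); not taken from STV.
OPEN_END = "9999-12-31"


class ContractDateRule(xml.sax.ContentHandler):
    """Hypothetical sketch of the rule 'contracts': context status,
    location nr. Assumes <status> contains <nr> elements, each with a
    <begin_end_tmr> child carrying ISO 8601 dates in @beginn and @ende."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0        # number of currently open <status> elements
        self.nr_count = 0     # <nr> elements seen so far
        self.vars: Optional[Dict[str, Optional[str]]] = None
        self.errors: List[str] = []

    def startElement(self, name, attrs):
        if name == "status":
            self.depth += 1
        elif name == "nr" and self.depth > 0:
            self.nr_count += 1
            self.vars = {}    # fresh variables for this location
        elif name == "begin_end_tmr" and self.vars is not None:
            # counterpart of stv:let: assign variables from attributes,
            # using the default when @ende is missing
            self.vars["start"] = attrs.get("beginn")
            self.vars["end"] = attrs.get("ende", OPEN_END)

    def endElement(self, name):
        if name == "nr" and self.vars is not None:
            start, end = self.vars.get("start"), self.vars.get("end")
            # the assertion: ISO dates (YYYY-MM-DD) compare correctly as strings
            if start is not None and end is not None and start > end:
                self.errors.append(
                    f"nr element #{self.nr_count}: start {start} is after end {end}")
            self.vars = None  # assertion done, drop the variables
        elif name == "status":
            self.depth -= 1


doc = b"""<status>
  <nr><begin_end_tmr beginn="2008-01-01" ende="2008-12-31"/></nr>
  <nr><begin_end_tmr beginn="2009-06-01" ende="2009-01-01"/></nr>
  <nr><begin_end_tmr beginn="2010-01-01"/></nr>
</status>"""

handler = ContractDateRule()
xml.sax.parseString(doc, handler)
print(handler.errors)
# ['nr element #2: start 2009-06-01 is after end 2009-01-01']
```

The third contract has no @ende, so it gets the open-end default and passes the check. Like the STX program, the handler keeps only the variables of the current nr element in memory.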
If we want to check whether the dates of different nr elements overlap, we need to introduce buffering. We declare a buffer with:
We append the date information to the buffer using the following commands:
We check whether these buffered dates overlap:
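For readers who want to experiment outside STX, here is a Python SAX sketch of the buffering idea. OverlapRule is a hypothetical name, and I assume that one status element holds the contracts of one business partner, that dates are ISO 8601, that a missing @ende defaults to 9999-12-31, and that a range includes both its start and end day. The buffer is reset at each status, receives one entry per begin_end_tmr, and is checked at the end tag of status.

```python
import xml.sax
from typing import List, Optional, Tuple

# Assumed stand-in for a missing @ende (open end); not defined by STV.
OPEN_END = "9999-12-31"


class OverlapRule(xml.sax.ContentHandler):
    """Hypothetical sketch: buffer the date ranges of all <nr> elements in
    one <status> (assumed: one business partner) and report overlaps at
    </status>. Assumes ISO 8601 dates and ranges that include both days."""

    def __init__(self) -> None:
        super().__init__()
        self.buffer: List[Tuple[str, str, int]] = []  # (start, end, nr position)
        self.partner = 0    # number of <status> elements seen
        self.position = 0   # number of <nr> elements in the current <status>
        self.errors: List[str] = []

    def startElement(self, name, attrs):
        if name == "status":
            self.partner += 1
            self.position = 0
            self.buffer = []  # initialise the buffer for this context
        elif name == "nr":
            self.position += 1
        elif name == "begin_end_tmr":
            # append the date information of the current contract
            self.buffer.append(
                (attrs.get("beginn"), attrs.get("ende", OPEN_END), self.position))

    def endElement(self, name):
        if name != "status":
            return
        # After sorting by start date, a range overlaps an earlier one exactly
        # when it starts on or before the latest end date seen so far.
        latest_end: Optional[str] = None
        latest_pos = 0
        for start, end, pos in sorted(self.buffer):
            if latest_end is not None and start <= latest_end:
                self.errors.append(
                    f"status #{self.partner}: nr #{pos} overlaps nr #{latest_pos}")
            if latest_end is None or end > latest_end:
                latest_end, latest_pos = end, pos
        self.buffer = []


doc = b"""<partners>
  <status>
    <nr><begin_end_tmr beginn="2008-01-01" ende="2008-12-31"/></nr>
    <nr><begin_end_tmr beginn="2008-06-01"/></nr>
    <nr><begin_end_tmr beginn="2009-01-01" ende="2009-03-31"/></nr>
  </status>
  <status>
    <nr><begin_end_tmr beginn="2008-01-01" ende="2008-12-31"/></nr>
    <nr><begin_end_tmr beginn="2009-01-01"/></nr>
  </status>
</partners>"""

handler = OverlapRule()
xml.sax.parseString(doc, handler)
print(handler.errors)
# ['status #1: nr #2 overlaps nr #1', 'status #1: nr #3 overlaps nr #2']
```

Only the contracts of one business partner are held in memory at a time, so memory use is bounded by the largest business partner rather than by the whole document. The second contract of the first partner has an open end, which is exactly the open-end case described earlier.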
If we apply an XSLT 2.0 transformation to the schema above, we get the following STX program, which performs the specified checks:
It is in fact longer than the short STV schema above. I suggest you analyse this program to learn how STV works; it is also a chance to improve your STX skills.
We can use STV to code checks that are performed on a given XML document. Using an XSLT 2.0 transformation, we generate an STX program that performs those checks. In the example above we assume that the document is valid according to a certain schema; otherwise the generated program won't produce correct results.
STV is a self-made tool, and I think a lot of things could be done better:
I hope I will have time to work on this. Any help would be appreciated.
But there is one more thing to mention: I think STX, and XML streaming techniques in general, are not well known in the XML community. The STX community is very small, and I think any help improving Joost and its counterpart in Perl would be appreciated, too.
Dealing with mass XML data is still a challenge. We have powerful streaming techniques in ABAP (I suggest you read this publication; the English version is coming soon) and in Java. I hope this weblog series helps you apply these Java techniques.