Why is it that a transformation fails to produce valid JSON? To answer this question, I wrote a little JSON-XML validator.

The Problem

Since my current main business is to develop web applications with Business Server Pages, I thoroughly studied Horst Keller’s series of blog posts on JSON in ABAP. In the meantime, most of the JSON topics are available, well-structured and extensively documented, in the ABAP help.

In a current large-scope development project, I had the opportunity to apply many of these ideas. Building JSON from ABAP data was always a topic for BSP elements, in the commonly used “framework” part of the application, when it came to providing UI components like jQuery data tables with the requested data. In most cases, the best solutions were transformations, converting the data directly on the ABAP side into the desired JSON xstring, which was then passed to the client via response->set_data( ). The necessary transformations were designed once and then addressed through the BSP elements “behind the scenes”. In other words: the main part of the ABAP development was still ABAP development 🙂, not JavaScript or JSON or whatever.

When I designed my transformations, they did not run right away, of course. They crashed. And what bothered me about these crashes was that it was hard to find out from the dump which error I had made while constructing the JSON-XML target. Of course, well-formedness of the result is by far not sufficient. There are rules for JSON-XML, and the dump didn’t tell me which rule I had violated. I had to find that out myself by inspecting the result XML.

A Solution with Schematron

So I thought it would be a good idea to have the machine take over that work for me. In other words: to have a validator for JSON-XML. To see what I am talking about, see the rules of JSON-XML, directly from the ABAP documentation. For those who have never seen an example, here is one:

<object>
  <str name="question">Is JSON-XML cool?</str>
  <bool name="answer">true</bool>
</object>

This little XML document should transform into the JSON string

{ "question":"Is JSON-XML cool?", "answer":true }

But is it really valid JSON-XML? Check it out yourself – with my validator. Here it goes:

http://bsp.mits.ch/jsonxml/

I knew that Schematron was a validation language for XML, apt for writing and asserting rules on the structure of an XML document. So I thought this would be the chance to give Schematron a try for this task. I don’t regret the choice. (Alternatively, I could have used XML Schema and worked with a schema validator, the approach that I had used in an earlier post for another issue. But, to admit it frankly: I was curious to see the Schematron technology in action. And it works!)

Basically, boiling all its syntax, document structures, and namespaces down to one point, a Schematron document is a collection of XPath expressions which are applied successively to a set of specified context nodes of the document. The XPath expressions are expected to evaluate to true. If one of them evaluates to false, a failure node with a message is generated in the result document.

  • A document may contain arbitrarily many patterns. Patterns can be used to group the validation rules according to the nodes’ different functions.
  • A rule selects the context nodes to inspect.
  • An XPath expression which is supposed to evaluate to true is called an assertion. A rule contains arbitrarily many assertions. An assertion usually contains a diagnostic message as inner text, which will be issued if the assertion fails.
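The nesting described in this list can be sketched as a minimal Schematron document. This is only an illustration under assumptions: the pattern id and the title are made up, and the single assertion (a <null> element must be empty) is picked arbitrarily from the checks discussed further down. The namespace is the one defined by ISO Schematron.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Minimal skeleton (illustrative only)</title>

  <!-- A pattern groups related rules; the id is a made-up name -->
  <pattern id="example">

    <!-- A rule selects the context nodes: here, every <null> element -->
    <rule context="null">

      <!-- An assertion: the XPath in @test is expected to be true
           for each context node; otherwise the message is reported -->
      <assert test="count(*) = 0 and not(text())">
        A 'null' element must be empty
      </assert>

    </rule>
  </pattern>
</schema>
```

The <name/> element, which appears in some of the messages below, is replaced by the name of the context node when the message is issued.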

For JSON-XML, I saw three different groups of assertions:

  • General assertions, which have to be satisfied by each element of the document.
  • Assertions on simple elements – corresponding to simple datatypes – like <str>, <bool>, and <num>.
  • Assertions on complex elements, having sub-elements (other than <null>), i.e., <array>s and <object>s.

What are the checks in detail?

General Assertions

The general assertions are enclosed by a rule element of type

<rule context="*">
</rule>

This means that, while the XSLT processor steps through the document, the enclosed assertions are applied to every element it encounters.

Now – what are the general assertions?

Allowed element names

In a JSON-XML document, only <object>, <member>, <array>, <str>, <num>, <bool> and <null> elements may occur. In XPath 1.0, this assertion gets a bit clumsy:

<assert test="contains('|object|array|str|num|bool|member|null|',concat('|',name(),'|')) ">
  Undefined element '<name/>'
</assert>

In XPath 2.0, I could have written it in a more elegant manner, but as I wanted my validator to be runnable in a browser, I stuck to XPath 1.0.

The only allowed attribute is ‘name’

The name attribute is used to declare object members. There are no other attributes allowed. Here is the XPath expression to check it:

<assert test="count(@*) &lt; 2 and (count(@*) = 0 or @*[name()='name'])">
  The only allowed attribute is 'name'
</assert>

With count(@*) I assert that the currently inspected element has at most one attribute: either no attributes at all (count(@*)=0), or exactly one attribute, whose name is 'name' ( @*[name()='name'] ).

A member of an object must have a name

In order to express this condition in XPath, I make use of the parent:: axis:

<assert test="not(parent::object) or @name">
  A member of an object must have a name
</assert>

In other words: either the current node is not a direct child of an <object>, or, if it is, it must have a ‘name’ attribute.

The name attribute is only allowed for object members

The converse is also true: elements other than object members must not have a name attribute (logically, the expressions parent::object and @name are equivalent).

<assert test="parent::object or not(@name)">
  The name attribute is only allowed for object members
</assert>

A member must be direct child of an object

This places no restriction on elements other than <member>. For a <member>, however, the parent must be an <object>:

<assert test="name() != 'member' or parent::object">
  A member must be direct child of an object
</assert>
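To try out the general assertions, a deliberately invalid test document can help. The following is a hypothetical example (element contents and attribute values are made up); each commented line is meant to violate exactly one of the five assertions above.

```xml
<object>
  <!-- no name attribute: "A member of an object must have a name" -->
  <str>Is JSON-XML cool?</str>

  <array name="list">
    <!-- name attribute inside an array:
         "The name attribute is only allowed for object members" -->
    <num name="n">1</num>

    <!-- unknown element name: "Undefined element 'string'" -->
    <string>x</string>

    <!-- member outside of an object:
         "A member must be direct child of an object" -->
    <member><str>x</str></member>
  </array>

  <!-- two attributes: "The only allowed attribute is 'name'" -->
  <bool name="answer" id="1">true</bool>
</object>
```

Note that '|string|' is not a substring of the list '|object|array|str|num|bool|member|null|', even though 'str' is. This is the reason for wrapping the element name in the '|' delimiters before calling contains().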

Assertions on Simple Elements

All assertions on simple elements are contained in this rule:

<rule context="str|bool|num|null">
</rule>

A simple element can only have text content – or <null>

This is the corresponding assertion:

<assert test="count(*)=0 or ( count(null)=1 and not(text()[normalize-space()]) )">
  A '<name/>' element can have only text content or 'null'
</assert>

So if the simple element contains any child elements at all (and not only text nodes), then it must be exactly one child element named ‘null’, and if, in the presence of <null>, there are text nodes, they must contain only whitespace. In text()[normalize-space()], text() selects all text nodes which are children of the current node. The predicate in square brackets, normalize-space(), is applied to each of these text nodes and returns the empty string '' if and only if the node contains only whitespace. The empty string is the only result of normalize-space() which evaluates to false in boolean context, so text()[normalize-space()] selects exactly the non-whitespace text node children of the current element. Thus, with the expression

not(text()[normalize-space()])

I express the condition that this node set is “empty”, i.e. the element doesn’t contain any non-whitespace text.

A <bool> element can only have ‘true’ or ‘false’ as text content

Again, we use the XPath 1.0 workaround with the string functions contains() and concat(), whereas in XPath 2.0, we would have a more readable alternative.

<assert test="name()!='bool' or contains('|true|false|',concat('|',text(),'|'))">
  A 'bool' element can only have 'true' or 'false' as text
</assert>

The content of a <num> element must be a number

If a string represents a number, then it should be possible to convert it into a number with XPath’s built-in number() function. The function returns “NaN” (Not a Number) if it fails to parse the string into a number:

<assert test="name()!='num' or string(number(.)) != 'NaN'">
  The content of a 'num' element must be a number
</assert>

A <null> element must be empty

… meaning that a <null> element contains neither child elements nor text:

<assert test="name()!='null' or (count(*) = 0 and not(text()) )">
  A 'null' element must be empty
</assert>
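Again, a small invalid document can serve as a test case for these four assertions. This is a hypothetical example; each element is meant to fail exactly one of the assertions on simple elements while passing the general ones.

```xml
<array>
  <!-- child element other than null:
       "A 'str' element can have only text content or 'null'" -->
  <str>outer<str>inner</str></str>

  <!-- "A 'bool' element can only have 'true' or 'false' as text" -->
  <bool>yes</bool>

  <!-- number('twelve') is NaN:
       "The content of a 'num' element must be a number" -->
  <num>twelve</num>

  <!-- "A 'null' element must be empty" -->
  <null>0</null>
</array>
```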

Assertions on Composed Elements

Composed elements are <object>s and <array>s, which gives this restriction on the context node:

<rule context="object|array">
</rule>

A complex element must not contain text nodes as children

This was the rule that I had violated most frequently: When writing an XSLT transformation for a larger structured object, and having no explicit template for some of its components, the built-in template will be applied, with the result that all the element’s text content will be copied into the target tree fragment. For complex objects, this is fatal.

<assert test="not(text()[normalize-space()])">
  A '<name/>' element must not contain text nodes as children
</assert>
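The failure mode can be reproduced with a small stylesheet. The following is a sketch under assumptions: the source document <order><id>4711</id><city>Walldorf</city></order> and all its element names are hypothetical. There is no template for <city>, so without the last template the built-in rules would copy the text "Walldorf" directly into the <object>. Overriding the built-in rule for text nodes with an empty template is one common way to guard against this; it is not necessarily what was done in the project described here.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical source element <order> with children <id> and <city> -->
  <xsl:template match="order">
    <object>
      <str name="id"><xsl:value-of select="id"/></str>
      <!-- No template matches <city>: the built-in element rule
           descends to its text node -->
      <xsl:apply-templates select="city"/>
    </object>
  </xsl:template>

  <!-- Overrides the built-in rule that copies text nodes to the output.
       xsl:value-of above is not affected by this template. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Without the empty text() template, the result would be <object><str name="id">4711</str>Walldorf</object>, which violates the assertion above.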

Putting it Together

Putting the assertions into one document, and enriching them with some syntax noise required by the Schematron document structure, I arrive at this

Schematron ruleset
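A sketch of how the assembled ruleset might look, reconstructed from the assertions discussed above; it is not necessarily identical to the linked file. The title and the pattern ids are assumptions. The three groups go into three separate patterns because, within one pattern, Schematron checks a node only against the first rule whose context matches it; with context="*" in the same pattern, the more specific rules would never fire.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>JSON-XML validation (sketch)</title>

  <!-- Assertions that every element has to satisfy -->
  <pattern id="general">
    <rule context="*">
      <assert test="contains('|object|array|str|num|bool|member|null|',concat('|',name(),'|'))">
        Undefined element '<name/>'
      </assert>
      <assert test="count(@*) &lt; 2 and (count(@*) = 0 or @*[name()='name'])">
        The only allowed attribute is 'name'
      </assert>
      <assert test="not(parent::object) or @name">
        A member of an object must have a name
      </assert>
      <assert test="parent::object or not(@name)">
        The name attribute is only allowed for object members
      </assert>
      <assert test="name() != 'member' or parent::object">
        A member must be direct child of an object
      </assert>
    </rule>
  </pattern>

  <!-- Assertions on simple elements -->
  <pattern id="simple">
    <rule context="str|bool|num|null">
      <assert test="count(*)=0 or ( count(null)=1 and not(text()[normalize-space()]) )">
        A '<name/>' element can have only text content or 'null'
      </assert>
      <assert test="name()!='bool' or contains('|true|false|',concat('|',text(),'|'))">
        A 'bool' element can only have 'true' or 'false' as text
      </assert>
      <assert test="name()!='num' or string(number(.)) != 'NaN'">
        The content of a 'num' element must be a number
      </assert>
      <assert test="name()!='null' or (count(*) = 0 and not(text()) )">
        A 'null' element must be empty
      </assert>
    </rule>
  </pattern>

  <!-- Assertions on composed elements -->
  <pattern id="composed">
    <rule context="object|array">
      <assert test="not(text()[normalize-space()])">
        A '<name/>' element must not contain text nodes as children
      </assert>
    </rule>
  </pattern>
</schema>
```

Since the ruleset is itself an XML document, the < operator in the attribute check has to be written as &lt; inside the test attribute.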

With a desktop tool like Probatron4J, I could now apply the ruleset from the command line to my test document. But the classical approach is to convert the Schematron document into an XSLT transformation and to apply this transformation to the test instance with a common XSLT processor. This is the way I have chosen, using the transformations iso-schematron-xslt1.zip from the Schematron presence on code.google.com. Sticking to XSLT 1.0, I could run the transformation in the browser, so that I can check a JSON-XML candidate easily via copy/paste.
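To give an idea of the result document: if the generated transformation produces SVRL (Schematron Validation Report Language), one of the output flavors of the ISO Schematron stylesheets, a failed assertion shows up roughly as follows. This is an abbreviated, hand-written illustration, not captured output. It assumes the test instance <object><str>Is JSON-XML cool?</str></object> and a pattern with the made-up id "general"; the exact attributes and the syntax of the location path depend on the implementation, and the validator described here may well render its messages differently.

```xml
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="general"/>

  <!-- One fired-rule entry per context node matched by a rule -->
  <svrl:fired-rule context="*"/>
  <svrl:fired-rule context="*"/>

  <!-- The <str> element has no name attribute although its parent
       is an <object>; the location value is illustrative -->
  <svrl:failed-assert test="not(parent::object) or @name"
                      location="/object/str">
    <svrl:text>A member of an object must have a name</svrl:text>
  </svrl:failed-assert>

  <!-- Entries for further patterns omitted -->
</svrl:schematron-output>
```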

Maybe you will find the resulting validator useful.
