An Open Source ABAP JSON Library – ZCL_MDP_JSON_*
Hi ABAP developers,
Update 2021-07-11: After 5 years, to my surprise, this library is still useful for some edge cases. I couldn't keep fixing the regex bugs (because they are impossible to fix without introducing new ones). There is now a new library with the same methods that doesn't use regex for parsing JSON. It is a bit slower but doesn't contain parsing bugs. You can find it here: https://github.com/fatihpense/abap-tasty-json
I would like to introduce a new open-source ABAP JSON library we have developed. Why does the world need a new JSON library? I will explain our rationale for developing this library along with its features. In the end it is about having more choices and knowing the trade-offs. I would like to thank MDP IT Consulting for letting this work become open source under the MIT License.
Table of Contents:
- Summary
- Alternatives
- Reasoning and features
- Examples
- Performance
- Links
- Warning
- Conclusion
Summary
Unlike the alternatives, this library lets you generate any custom JSON. Thus you can easily achieve API compatibility with a JSON server written in another language. Besides providing a serializer and a deserializer, this library defines an intermediate JSON node class in ABAP. Further development may enable more JSON utilities based on this representation.
Alternatives
CL_TREX_JSON_*
Standard transformation using JSON-XML: https://scn.sap.com/community/abap/blog/2013/01/07/abap-and-json (a minimal sketch of this route follows the list)
Manual string manipulation: While it provides flexibility, it is tedious and error-prone work. Sometimes it is used together with CL_TREX_JSON_*
These libraries also aim for automatic mapping:
https://github.com/se38/zJSON/wiki/Usage-zJSON
https://wiki.scn.sap.com/wiki/display/Snippets/One+more+ABAP+to+JSON+Serializer+and+Deserializer
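For reference, the standard transformation route from the first alternative looks roughly like this. It is only a sketch under assumptions: ls_book is a hypothetical flat structure, and the JSON writer returns UTF-8 that is converted back to a character string.

DATA: lo_writer TYPE REF TO cl_sxml_string_writer,
      lv_json   TYPE string.

" Create a JSON writer and let the identity transformation render the data as JSON
lo_writer = cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ).
CALL TRANSFORMATION id SOURCE book = ls_book RESULT XML lo_writer.

" The writer delivers an xstring; convert it for display
lv_json = cl_abap_codepage=>convert_from( lo_writer->get_output( ) ).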
Reasoning and features
It is intriguing to me that there was no JSON node representation in ABAP. Let me give examples from other languages:
Working with JSON in dynamic or loosely typed languages is easier, since easily modifiable representations for JSON objects and arrays already exist in the language:
- Python: dict and list do the job: https://docs.python.org/2/library/json.html#json-to-py-table
- Ruby: hash and array.
- PHP: associative array and array.
- JavaScript is just lucky in this context (JavaScript Object Notation).
In strongly typed languages like ABAP, Java, and Go, there are two approaches:
- Intermediary objects
- Java: org.json.JSONObject comes to mind.
- Any class or type with annotations that define the object and variable names in JSON.
- Java: com.google.code.gson annotation @SerializedName http://static.javadoc.io/com.google.code.gson/gson/2.7/com/google/gson/annotations/package-summary.html
- Go: field tag https://golang.org/pkg/encoding/json/#Marshal
Our library takes the intermediary representation approach, defining the class ZCL_MDP_JSON_NODE.
Features:
- It provides flexibility down to the JSON spec. This is important because you get the same flexibility as manual string manipulation, without the errors. So compatibility of your ABAP service or client with any other JSON API becomes possible without string manipulation.
- You can deserialize any JSON string.
- You know exactly what the deserializer will produce when you see a JSON string.
- You don’t need to define intermediary data types just for JSON input/output.
Future ideas for development:
- The intermediary ZCL_MDP_JSON_NODE class enables the development of methods such as a JSON equality checker, beautification of JSON output, and spec-validity checks for string and number values.
- The library uses regexes for parsing. Most of the time regex can be a quick solution. However, I think finite-state machines are better suited for parsers in general.
- We will work on this library based on our needs and your suggestions. For example, we can work towards 100% compliance with the JSON specification by running edge-case tests.
Examples
The examples here are in their shortest form to show how easy JSON manipulation can become. There will be more examples in the project repo using other features of the class. The JSON node class is easy to understand once you study its attributes and methods.
Deserialization Example:
DATA: l_json_string TYPE STRING.

CONCATENATE
  '{'
  ' "books": ['
  '  {'
  '   "title_original": "Kürk Mantolu Madonna",'
  '   "title_english": "Madonna in a Fur Coat",'
  '   "author": "Sabahattin Ali",'
  '   "quote_english": "It is, perhaps, easier to dismiss a man whose face gives no indication of an inner life. And what a pity that is: a dash of curiosity is all it takes to stumble upon treasures we never expected.",'
  '   "original_language": "tr"'
  '  },'
  '  {'
  '   "title_original": "Записки из подполья",'
  '   "title_english": "Notes from Underground",'
  '   "author": "Fyodor Dostoyevsky",'
  '   "quote_english": "I am alone, I thought, and they are everybody.",'
  '   "original_language": "ru"'
  '  },'
  '  {'
  '   "title_original": "Die Leiden des jungen Werthers",'
  '   "title_english": "The Sorrows of Young Werther",'
  '   "author": "Johann Wolfgang von Goethe",'
  '   "quote_english": "The human race is a monotonous affair. Most people spend the greatest part of their time working in order to live, and what little freedom remains so fills them with fear that they seek out any and every means to be rid of it.",'
  '   "original_language": "de"'
  '  },'
  '  {'
  '   "title_original": "The Call of the Wild",'
  '   "title_english": "The Call of the Wild",'
  '   "author": "Jack London",'
  '   "quote_english": "A man with a club is a law-maker, a man to be obeyed, but not necessarily conciliated.",'
  '   "original_language": "en"'
  '  }'
  ' ]'
  '}'
  INTO l_json_string
  SEPARATED BY cl_abap_char_utilities=>cr_lf.

DATA: l_json_root_object TYPE REF TO zcl_mdp_json_node.
l_json_root_object = zcl_mdp_json_node=>deserialize( json = l_json_string ).

DATA: l_string TYPE STRING.
l_string = l_json_root_object->object_get_child_node( KEY = 'books'
  )->array_get_child_node( INDEX = 1
  )->object_get_child_node( KEY = 'quote_english' )->VALUE.

START-OF-SELECTION.
WRITE: 'Quote from the first book: ', l_string.
Serialization Example:
DATA: l_string_1 TYPE STRING.
DATA: l_root_object_node      TYPE REF TO zcl_mdp_json_node
    , l_books_array_node      TYPE REF TO zcl_mdp_json_node
    , l_book_object_node      TYPE REF TO zcl_mdp_json_node
    , l_book_attr_string_node TYPE REF TO zcl_mdp_json_node.

*Create root object
l_root_object_node = zcl_mdp_json_node=>create_object_node( ).

*Create books array
l_books_array_node = zcl_mdp_json_node=>create_array_node( ).

*Add books array to root object with key "books"
l_root_object_node->object_add_child_node( child_key = 'books' child_node = l_books_array_node ).

*You would probably want to do this in a loop.
*Create book object node
l_book_object_node = zcl_mdp_json_node=>create_object_node( ).

*Add book object to books array
l_books_array_node->array_add_child_node( l_book_object_node ).

l_book_attr_string_node = zcl_mdp_json_node=>create_string_node( ).
l_book_attr_string_node->VALUE = 'Kürk Mantolu Madonna'.

*Add string to book object with key "title_original"
l_book_object_node->object_add_child_node( child_key = 'title_original' child_node = l_book_attr_string_node ).

l_string_1 = l_root_object_node->serialize( ).

*ALTERNATIVE:
DATA: l_string_2 TYPE STRING.
*DATA: l_root_object_node_2 TYPE REF TO zcl_mdp_json_node.
*Create the same JSON object with one dot (.) and without data definitions, using chaining.
l_string_2 = zcl_mdp_json_node=>create_object_node(
  )->object_add_child_node( child_key = 'books' child_node = zcl_mdp_json_node=>create_array_node(
  )->array_add_child_node( child_node = zcl_mdp_json_node=>create_object_node(
  )->object_add_child_node( child_key = 'title_original' child_node = zcl_mdp_json_node=>create_string_node(
  )->string_set_value( VALUE = 'Kürk Mantolu Madonna' )
  )
  )
  )->serialize( ).

START-OF-SELECTION.
WRITE: / 'string 1: ', l_string_1.
WRITE: / 'string 2: ', l_string_2.
Challenge: Try doing these examples with CL_TREX_JSON_*
For more examples, please visit the GitHub repo.
Performance
On a test machine, using the JSON string example above (l_json_string), deserializing and serializing it again 10,000 times takes 2.1 seconds on average. It shouldn't cause any performance problems in general usage. The complete benchmark code will be in the project repo.
DATA: l_jsonnode TYPE REF TO zcl_mdp_json_node.

DO 10000 TIMES.
  zcl_mdp_json_deserializer=>deserialize(
    EXPORTING json = l_json_string
    IMPORTING node = l_jsonnode ).
  zcl_mdp_json_serializer=>serialize(
    EXPORTING node = l_jsonnode
    IMPORTING json = l_json_string ).
ENDDO.
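The timing itself can be taken with GET RUN TIME around that loop. A minimal sketch (GET RUN TIME delivers microseconds; the complete benchmark in the repo may differ):

DATA: l_t0      TYPE i,
      l_t1      TYPE i,
      l_seconds TYPE p LENGTH 8 DECIMALS 2.

GET RUN TIME FIELD l_t0.
" ... the DO 10000 TIMES loop above ...
GET RUN TIME FIELD l_t1.

" convert microseconds to seconds
l_seconds = ( l_t1 - l_t0 ) / 1000000.
WRITE: / '10,000 round trips took', l_seconds, 'seconds'.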
Links
Here is a presentation about this JSON library:
Medepia ABAP JSON Library ZCL_MDP_JSON
Project code repository:
GitHub – fatihpense/zcl_mdp_json: MDP ABAP JSON library that can generate and parse any JSON string.
Warning
The library isn't extensively battle-tested as of now. Testing your use case before using it in production is strongly advised. Please report any bugs you encounter.
Conclusion
If you are just exposing a table as JSON without much modification, it is easier and probably better to use CL_TREX_JSON_*.
If you are developing an extensive application and want to design your API beautifully, this library is a pleasant option for you.
Thanks for reading.
Best wishes,
Fatih Pense
Is there any advantage compared to using direct calls to the sXML library (in JSON mode)? Cf. ABAP and JSON in the ABAP documentation.
By the way, CL_TREX_JSON* is not supported by SAP (cf. https://service.sap.com/sap/support/notes/2141584).
Thank you for the SAP note. So, CL_TREX_JSON* classes are intended for internal use only.
As for the differences between this library and the SAP transformation solution: the features of the SAP transformation for JSON and my personal thoughts are below.
In that solution they have chosen XML as the intermediary object. We have chosen an ABAP class. I think this class is better suited for complex and custom JSON scenarios.
Best wishes,
Fatih
Thank you for this detailed feedback. For point 2, maybe Horst Keller can give us insights into whether the JSON support is based on the XML engine. Anyway, I doubt that an ABAP (interpreted byte code) solution is faster than a C (compiled) program, but you're right, that's just a guess; this requires a performance comparison.
As described in ABAP News for Release 7.40 - ABAP and JSON | SCN and in the documentation ABAP and JSON, the native support of JSON in ABAP is indeed based on the XML engine used by the sXML library, where JSON-XML serves as an intermediate format. The idea was to reuse an existing, performant, and robust infrastructure instead of creating another one alongside it.
That in fact means that you cannot deal with JSON alone, but that for parsing and rendering you have to know the JSON-XML format. For serializing and deserializing ABAP data you have to know the asJSON format (which can be mapped to asJSON-XML).
Regarding functionality, I'd say that JSON writers and JSON readers based on the sXML library should allow you to perform all the rendering and parsing tasks you need.
Regarding performance, one should really test. One could take the examples given above, transfer them to standard parsing and rendering, and compare the runtime.
Since the sXML library is implemented by kernel modules, I tend to agree with Sandra Rossi that the native support should be faster than an ABAP-only implementation. The question is whether ZCL_MDP_JSON_NODE is doing it from scratch or whether it is a wrapper for ABAP's native JSON support that simply hides the JSON-XML format.
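For illustration, rendering a small JSON object with such a JSON writer looks roughly like this. This is a sketch based on the documented JSON-XML element names; the variable names are assumptions.

DATA: lo_string_writer TYPE REF TO cl_sxml_string_writer,
      lo_writer        TYPE REF TO if_sxml_writer,
      lv_json          TYPE string.

" JSON writer based on the sXML library
lo_string_writer = cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ).
lo_writer = lo_string_writer.

" JSON-XML for {"title_original":"Kürk Mantolu Madonna"}
lo_writer->open_element( name = 'object' ).
lo_writer->open_element( name = 'str' ).
lo_writer->write_attribute( name = 'name' value = 'title_original' ).
lo_writer->write_value( 'Kürk Mantolu Madonna' ).
lo_writer->close_element( ).
lo_writer->close_element( ).

" The writer returns UTF-8; convert it to a character string
lv_json = cl_abap_codepage=>convert_from( lo_string_writer->get_output( ) ).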
Thanks Horst. When I posted, I didn't imagine that sXML JSON needed such "tricks" (JSON-XML) for generating JSON from scratch. ZCL_MDP_JSON_NODE does it from scratch (string concatenation). I did a little performance comparison; on my computer I get a ratio of 1 for JSON-sXML against 1.7 for ZCL_MDP_JSON*. Test code at https://wiki.scn.sap.com/wiki/display/Snippets/Performance+of+JSON+from+scratch+-+sXML+versus+ZCL_MDP_JSON
Horst Keller Thank you for the clarification. I think utilizing an existing infrastructure is a good decision both from a technical and from a business perspective.
Sandra Rossi Thank you for the benchmark. I was going to write it. Another result of the benchmark: you are faster than me 😀
So in the light of new information, the advantages of this library can be:
Sandra, you have used the library for the benchmark. I have learned a few nice tricks from your code. If you have any suggestions for improvement regarding ease of use, methods, etc., I will be glad to listen.
In the future, I want to try a finite-state machine parser instead of regex. Also, maybe I will try adding an optional JSON-sXML serializer/renderer backend (maybe I will even write my own kernel module for fun 😳).
Thank you both for your interest and fruitful conversation.
Best wishes,
Fatih
Hi Fatih,
thanks for contributing to this blog community and sharing your JSON library.
A while ago I tried out all the existing JSON libraries available (all the ones that you mention above) and found that most of them perform relatively slowly when dealing with large volumes of data. I suspect this is due to the internal string handling (parsing and concatenating strings) in the ABAP layer. After some research I found that by using simple transformations (in particular, identity transformations) the performance improved tremendously, even with large data volumes. I believe this is the case because such transformations are executed inside the XML engine.
This is true, because the transformation is part of the Kernel (C/C++ is always faster than ABAP 😉 ).
The additional libraries are only necessary if you have special needs (like lower-case names, special date formats, etc.).
Hi Uwe,
thanks for the clarification.
I always wondered why, when using transformations, we have to stick to upper-case JSON keys. What is the cause of this restriction?
Thanks
Marco
I guess that's only an arbitrary choice, but I think it's best to say either everything lower case or everything upper case, and not to allow exceptions ("except in deserialization, any case is allowed"), especially when it's about writing transformations for JSON, and especially for Simple Transformations, which are symmetrical.
See CALL TRANSFORMATION, where the facts are described:
Serialization
Deserialization
We had a discussion about that already in ABAP and JSON. Rüdiger Plantiko pointed it out: how should ABAP know whether to convert upper case to mixed or lower case?
Therefore, use your own transformations to adjust your data.
Hi Marco,
Thank you for your interest and support. I also think that the slowness is caused by string operations. I searched for alternative options, but ABAP has a limited set of functions on this subject. If there were a string or byte-stream interface between the kernel and ABAP, an ABAP-only solution could be nearly as fast as a C/C++ solution. (Interpreted languages can also be very fast!) The fact that slowness increases with size also suggests that the underlying operations for string concatenation and substrings are not suitable for this kind of algorithm.
Performance by size can be an interesting benchmark subject. I wonder at which size (KB, length) the performance of ABAP-only libraries starts to become unacceptable.
On a side note, I learned that kernel modules are for internal usage. Even for research, getting the source is difficult; otherwise I would have been eager to implement a module 😏
Finally, I think in many scenarios JSON data is not that big, if you are using the service as an API for a user interface (citation needed 😳). And if you want flexibility, this library provides another option with nice manipulation methods.
If you have any questions about this library, I will be glad to answer.
Best wishes,
Fatih
Hi Fatih,
I've added your project to the ABAP Open Source projects list. (you need a fancy name for your project 😆 )
Hi Uwe, do you have a suggestion for a name? What about "ABAP Flexible JSON"? 😳
AFJ, yes, sounds good. But wait a week or so. Maybe one day you wake up in the night with the new name in mind 🙂
Hi Fatih,
this is the second time I have used your library in non-performance-critical, non-mass-processing objects. I can say this is the most convenient library to use so far for my requirements. I do not need to fight with the pain, and I can make injections really easily within 2 or 3 lines of code. I can easily access nodes where I already obtained a reference, and nested loops are not needed. To manage the JSON, I only need to use the node keys that actually exist in the JSON document. Many thanks for your great job again.
Attila
Hi Attila, I'm glad this library made your work easier! I think you are perfectly in the audience I had in my mind when I was writing the library. Maybe I should use your comment as "user testimonial" 🙂
Thanks for your kind words, support & bug reports!
Best wishes,
Fatih
Hi Fatih,
I think in the "manuel mapping" part, we can use the class cl_trex_json*, the model will be
change, internal data object -->JSON--->JSON_NODE--->JSON.
Hi Jack,
I think cl_trex_json* converts directly to a JSON string. If you accept the overhead of encoding/decoding twice, it can be used as a clever technique. In that case, zcl_mdp_json* will only give you the ability to edit your JSON structure in a flexible way.
I have used the term "manual mapping" because with zcl_mdp_json* you have to decide which field in the JSON string represents which field in your data object. There are alternative libraries that try to automate the mapping. However, I believe mandatory automatic mapping takes away expressiveness and power in the current solutions. There is a trade-off, and you can use different libraries for different tasks. I haven't found an elegant way to connect parsing and automatic mapping yet. I'm open to suggestions.
Also, cl_trex_json* is not supported by SAP as Sandra noted in the comment above. So it is for internal use and it shouldn't be a base for future work.
Thanks for your comment,
Fatih
Hello Fatih,
I have implemented the solution that you have developed and it works fine. However, when I have a variable with the value null (null rather than ""), it gives a short dump as it jumps to the next ". The problem is situated in method DESERIALIZE_NODE of class ZCL_MDP_JSON_DESERIALIZER, where the statement
FIND REGEX '\{|\[|"|\d|t|f' IN SECTION OFFSET l_offset OF json takes us to the next variable of the JSON string and ignores the null value. I have solved this by looking for 'n' in parallel. If the offset of the new regex is less than that of the original regex, I continue with the new offset.
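In sketch form, the idea looks roughly like this (an illustration of the approach only; the additional variable names are assumptions):

DATA: l_offset_token TYPE i,
      l_offset_null  TYPE i.

" original search for the next token
FIND REGEX '\{|\[|"|\d|t|f' IN SECTION OFFSET l_offset OF json
     MATCH OFFSET l_offset_token.

" additional search for a possible null literal
FIND REGEX 'null' IN SECTION OFFSET l_offset OF json
     MATCH OFFSET l_offset_null.

" if the null literal comes first, continue parsing from its offset
IF sy-subrc = 0 AND l_offset_null < l_offset_token.
  l_offset = l_offset_null.
ELSE.
  l_offset = l_offset_token.
ENDIF.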
Here you can find an extract of the JSON file I have used.
Here are the code lines I have added to solve the issue
Hope this is helps you to (further) improve this nice piece of work
Kind regards
Yves Kemps
Hi Yves,
Thank you for your report and nice words!
I have opened an issue here and I will add your fix: https://github.com/fatihpense/zcl_mdp_json/issues
(I also edited your comment as a mod, because it contained CSS, and used the code sample feature instead.)
Kind regards,
Fatih
Hello Fatih,
I'm trying the classes and, both pending bugs aside (which are documented on GitHub and which I solved manually), I think they are working well. However, it seems I found an error in this method (it doesn't seem to have an impact, although with large data it could affect performance):
Shouldn't the method insert only into OBJECT_CHILDREN (and not also append to ARRAY_CHILDREN, which is what ARRAY_ADD_CHILD_NODE does…)?
Thank you!
EDIT: I just found another bug which generates a dump: escaped double quotes. This JSON is valid; it can be checked online at https://jsonformatter.curiousconcept.com/#
However, it causes a dump in DESERIALIZE_OBJECT (deserialization operation).
Also, when trying to serialize a string node which has double quotes, they are not escaped.
In both operations, the backslash for escaping double quotes should be considered.
Fatih Pense, could you please check these issues and maybe open a bug? (I'm not a GitHub user, at least not yet.)
Thank you again!
Hello Alejandro, thank you for your comment & findings! I will create issues for them so they will be more visible & documented.
EDIT: I have created issue#10 and issue#11.
It has been 4 years since I shared this library, so I can write a recap of the situation.
Thanks & best regards,
Fatih
Hello Fatih Pense and thanks again for your reply!
In my case, the interest in this library is because A) I'm working at a customer with a really old release (7.0), so there's no JSON support at all, neither with CALL TRANSFORMATION nor with recent classes such as /UI2/CL_JSON, and B) it's the only one I've found so far which allows me to control case-sensitive names in the JSON, both for serializing and deserializing. That is lost with automated mapping.
For serializing I did a quick fix; although it doesn't consider other characters which could need to be escaped apart from double quotes and backslash, I'm not sure other libraries do either (see https://stackoverflow.com/questions/3020094/how-should-i-escape-strings-in-json):
(EDITED 2020.11.30)
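The gist is a pair of replacements on a local copy of the value before writing it out. A rough sketch, assuming the node's VALUE attribute and local names, and handling only backslash and double quote:

DATA: l_value TYPE string.

" work on a copy so repeated SERIALIZE calls don't accumulate replacements
l_value = node->value.

" escape backslashes first, then double quotes
REPLACE ALL OCCURRENCES OF '\' IN l_value WITH '\\'.
REPLACE ALL OCCURRENCES OF '"' IN l_value WITH '\"'.

CONCATENATE json '"' l_value '"' INTO json.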
The bigger problem is in deserializing (DESERIALIZE_NODE), since this RegEx interprets an escaped double quote as the string ending:
Learning RegEx is on my pending list, so I'm not sure how to fix it. But I think the logic should be: consider a double quote as the ending ONLY if it is not escaped.
So:
"This has an escaped double quote: \"" (the string is really "This has an escaped double quote: "")
"This has an escaped backslash: \\" (the string is really "This has an escaped backslash: \")
"This has an escaped backslash AND an escaped double quote: \\\"" (the string is really "This has an escaped backslash AND an escaped double quote: \"")
Then the unescaping of the backslash itself should be taken care of.
https://jsonformatter.curiousconcept.com is great for checking these cases out…
Maybe some RegEx expert can help us to fix it!
Thanks again
EDIT: syntax error corrected.
Thank you Sandra, but your proposal throws an error: Regular expression '"(?:\\.|[^"])*)"' is invalid in character position 15. I've checked it in program DEMO_REGEX_TOY and it throws an error as well. Could you please check it?
Thanks!
Thx! Corrected, please check (corrections: start parenthesis to capture the value between quotes was missing, and there was an error with \)
Thanks again Sandra, it seems to be working good now!
I made the following modifications to method DESERIALIZE_NODE (a change in the RegEx plus REPLACE statements for the escaped characters, the inverse of the SERIALIZE_NODE fix above):
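Along these lines, as a sketch of the idea rather than the exact modification (the pattern is my reconstruction of the corrected regex from the comments above, only \" and \\ are unescaped, and the local names are assumptions):

DATA: l_value        TYPE string,
      l_match_offset TYPE i,
      l_match_length TYPE i.

" find the complete string value, allowing any escaped character inside
FIND REGEX '"((?:\\.|[^"])*)"' IN SECTION OFFSET l_offset OF json
     MATCH OFFSET l_match_offset
     MATCH LENGTH l_match_length
     SUBMATCHES l_value.

" unescape on the extracted value (inverse order of the serialize fix)
REPLACE ALL OCCURRENCES OF '\"' IN l_value WITH '"'.
REPLACE ALL OCCURRENCES OF '\\' IN l_value WITH '\'.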
Fatih Pense I think this also solves this issue which I didn’t relate before: https://github.com/fatihpense/zcl_mdp_json/issues/9
I’ll let you know if I find something more.
Thanks again to you both!
EDIT: Updated the credits in the comments, and also updated the SERIALIZE_NODE fix part above (the older version made the replacements on the node instance itself, which accumulated replacements on repeated SERIALIZE calls).
Another minor fix: methods SERIALIZE_OBJECT and SERIALIZE_ARRAY can be safely removed from class ZCL_MDP_JSON_SERIALIZER (they are empty and not used anywhere).
Dear Sandra Rossi and Alejandro Bindi,
Thank you for your interest in the library and for the code discussion.
I thought about these regexes in the past, but I figured that when I fix something, something else breaks! It can be a case study of why regex is not the best option for parsing things 🙂 As a quick fix, you can find a regex that works for you, test it, and continue using it.
Recently, I have made some performance tests for reading a JSON string character by character. Since everything is stored in memory, it seems pretty fast. I can also trade some performance for correctness. I will give it a try. The only problem is time. I will keep you updated on new developments.
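The core loop of such a scanner is roughly this; a minimal sketch with assumed names, just to show dispatching on the current character instead of a regex:

DATA: l_off  TYPE i,
      l_len  TYPE i,
      l_char TYPE c LENGTH 1.

l_len = strlen( json ).
WHILE l_off < l_len.
  l_char = json+l_off(1).
  CASE l_char.
    WHEN '{'.     " start of an object
    WHEN '['.     " start of an array
    WHEN '"'.     " start of a string value
    WHEN OTHERS.  " number, true, false or null
  ENDCASE.
  l_off = l_off + 1.
ENDWHILE.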
Best regards,
Fatih Pense
Hello again, Fatih Pense another issue and quick fix for method ZCL_MDP_JSON_DESERIALIZER=>DESERIALIZE (you could document it on github):
On trying to deserialize an empty JSON string, you get the following short dump:
STRING_OFFSET_TOO_LARGE
CX_SY_RANGE_OUT_OF_BOUNDS
Illegal access to a string (offset too large)
I solved it by adding the following correction to raise the same exception used for other errors:
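Something along these lines at the start of the method; a sketch where the exception class name is only a placeholder for the one the library actually raises on invalid JSON:

" guard against an empty input before any offset access on the string
IF json IS INITIAL.
  " placeholder exception class name
  RAISE EXCEPTION TYPE zcx_mdp_json_invalid.
ENDIF.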
Regards
Hello Alejandro,
Thank you for the report and fix! I have created issue#12 It will be useful for the documentation.
Regards,
Fatih Pense
How can I deserialize all the nodes without an index? For example, I need the information from every "quote_english" node, not only one. Thank you.
Hi Maria, good question. Maybe I should make `array_children` returnable by a public method.
I will continue new development in another repo, because regex was causing too many bugs: https://github.com/fatihpense/abap-tasty-json
I created an issue for this use case:
https://github.com/fatihpense/abap-tasty-json/issues/1
Regards,
Fatih
Hello again Fatih Pense, Sandra Rossi and all. After a while I have found two other bugs in the deserializer code. For one of them I have a quick fix, but the other one involves a regex as before, so I'm not finding the solution so easily... maybe you can lend a hand.
After using this REGEX:
It seems the decimal separator is not contemplated among the possible values. I've been trying for a while but I'm struggling to fix it.
So my quick fix is to do this simple replace on the whole JSON string before calling deserialize:
...but there's probably a better solution, and it also doesn't handle escaped double quotes, which would be the case for a text containing this sequence.
Would appreciate any help.
Thank you!
It would be better to report the issue to the GitHub project. I'd say that the regex is strange because of ^:, which I think should be ^" (just a feeling):
Hello Sandra, I'm not a GitHub user yet, and since we were already discussing all these kinds of bugs here, I thought it would be better to continue like this; then Fatih can upload it to GitHub as before.
I've been trying to solve this on my own; I took a regex tutorial and reviewed the class code. I'm still far from even rookie level on the subject, but I now think the problem is actually again in the DESERIALIZE_NODE method, where, according to the first character, a number-type node is determined ("WHEN OTHERS. FIND REGEX '\d+' IN SECTION OFFSET l_offset OF json", near the end):
I think this FIND REGEX '\d+' should be replaced considering that number nodes can have a dot as decimal separator (as in my case), an optional negative sign in front, and an "e" for exponential notation, as this link for example states: https://json-schema.org/understanding-json-schema/reference/numeric.html
I've been trying to build something like '-?(\d+\.\d+|\d+)' but I'm not sure it's OK... and the "e" would also be missing...
Comments or corrections appreciated...
Thank you!
https://stackoverflow.com/questions/13340717/json-numbers-regular-expression
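A pattern that matches the full JSON number grammar (optional minus, integer part, optional fraction, optional exponent), plugged into the FIND statement, would look roughly like this. A sketch, not tested inside the library; the match variables are assumptions:

DATA: l_match_offset TYPE i,
      l_match_length TYPE i.

" JSON number: -?, integer part, optional .fraction, optional exponent
FIND REGEX '-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?'
     IN SECTION OFFSET l_offset OF json
     MATCH OFFSET l_match_offset
     MATCH LENGTH l_match_length.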
Hello Sandra Rossi,
Thanks for responding in the comments for years and being a valuable contributor to the community.
I have updated the library under another name; the new version doesn't use regex. So at least bugs are solvable in case they occur. I also plan to play with ABAP Unit testing.
If you wonder how it is done without regex, here is the deserializer class: https://github.com/fatihpense/abap-tasty-json/blob/main/src/zcl_tasty_json_deserializer.clas.abap
Hopefully, we won't have to solve regex problems and see more interesting issues to tinker with.
I might publish a blog post starting with the phrase "Use the standard if you can!"
I'm open to any advice/ideas.
Best regards,
Fatih
Just wanted to drop in a comment to say that I absolutely love this library. It makes traversing a JSON string so logical and so much easier!!! Thanks!