SAP Data Intelligence: Insert JSON Data into SAP H...

tim_huse · ‎07-08-2021

Insert JSON Data into SAP HANA or SAP Data Warehouse Cloud (Image Source: SAP)

Introduction

JSON is a widely spread format for exchanging and storing structured data. It enjoys great popularity in the age of microservices, e.g. for exchanging data between servers and clients via RESTful APIs. In this blog post, I'll show you how to persist data in JSON format to an SAP HANA table or an SAP Data Warehouse Cloud table using SAP Data Intelligence.

For this purpose, the "SAP HANA Client" operator in SAP Data Intelligence is used. In addition to other options, such as executing SQL commands and inserting csv. data, it offers the possibility to persist JSON files in SAP HANA and SAP Data Warehouse Cloud. It is crucial that the file has the correct format. You can see examples of the correct implementation of the JSON file in this blog post. The documentation of the operator can be accessed here.

The remainder of this post is organized as follows: First, two suitable JSON formats are presented in which the source data can be transformed to be utilized in the "SAP HANA Client" operator. Next, an example scenario for inserting data with these two formats is illustrated. Finally, a summary is provided.

Prerequisite

Your SAP HANA database or SAP Data Warehouse Cloud tenant must be connected to your SAP Data Intelligence system via the connection management of Data Intelligence. A detailed guide on connecting to SAP HANA can be found here. A comprehensive description of the connection to SAP Data Warehouse Cloud can be found in this blog post by Yulia Reich.

Option 1 – JSON object with keys

To use the "SAP HANA Client" operator in this scenario, the input data is sent as a JSON array to the "data" port of the "SAP HANA Client" operator. In Option 1, each element that is written into the database as a record is a JSON object whose keys match the column names of the target table.

Below, you find an example of the JSON file. The table has the three columns "ID", "COUNTRY" and "CAPITAL":

 [

    {

        "COUNTRY": "France",

        "ID": 1,

        "CAPITAL": "Paris"

    },

    {

        "ID": 2,

        "CAPITAL": "Berlin",

        "COUNTRY": "Germany",

        "SparseKey": "Example"

    }

]

As you can see in this example, the JSON object can also contain other keys that are not supposed to be inserted into the database table. However, none of the columns from the SAP HANA table can be missing within the JSON file. Furthermore, you can see that the keys can be contained in a different order.

Option 2 – JSON array representing a table row

To use the "SAP HANA Client" operator in this scenario, the input data is sent as a JSON array to the "data" port of the "SAP HANA Client" operator. In Option 2, each element that is written into the database as a record is a JSON array whose elements match the column names of the target table in the correct order.

An example JSON could look like this. The table has the three columns "ID", "COUNTRY" and "CAPITAL":

 [

    [

        3,

        "Austria",

        "Vienna"

    ],

    [

        4,

        "Greece",

        "Athens"

    ]

]

If the JSON array contains fewer elements than the number of columns in the target table, "NULL" values are inserted for the remaining fields.

Example:

In the example illustrated here, the JSON files shown in Option 1 and Option 2 are written into a table (table with three columns "ID", "COUNTRY" and "CAPITAL") in SAP Data Warehouse Cloud.

The example graph looks as follows. It consists of (1) a Python 3.6 operator ("Send JSON data") which sends the JSON file, (2) an "SAP HANA Client" operator which is used for persistence and (3) a "Graph Terminator" operator which is used to terminate the pipeline.

Example Graph (Image Source: Own Image)

The picture below shows how the "SAP HANA Client" operator must be configured. In the "Connection" configuration, the connection to the SAP Data Warehouse Cloud is set up. In the "Table Name" configuration, the table in which the data from the JSON file is to be inserted is specified (this table can also be created at runtime by the operator, if the configuration of "Table Initialization" is set to "Create"). In the "Table Columns" configuration, the table columns are defined (see also the following figure with the columns of the example table). The configuration of the field "Input Format" is especially important. Here, "JSON" must be selected for the scenario to work.

Configuration SAP HANA Client (Image Source: Own Image)

Example of Option 1:

Below is the code of the "Send JSON data" operator, which can be used to send the JSON file from Option 1 to the "SAP HANA Client" operator.

import json



attributes = {}

example_json = [{"COUNTRY": "France", "ID": 1, "CAPITAL": "Paris"}, {"ID": 2, "CAPITAL": "Berlin", "COUNTRY": "Germany", "SparseKey": "Example"}]



body = json.dumps(example_json)



api.send("output", api.Message(body, attributes))

After successfully running the graph, the table can be viewed in SAP Data Warehouse Cloud.

Result in SAP Data Warehouse Cloud (Image Source: Own Image)

Et voilà! The table was created and the two records were successfully inserted into the table.

Example of Option 2:

Below is the code of the "Send JSON data" operator, which can be used to send the JSON file from option 2 to the "SAP HANA Client" operator.

import json



attributes = {}

example_json = [[3, "Austria", "Vienna"], [4, "Greece", "Athens"]]



body = json.dumps(example_json)



api.send("output", api.Message(body, attributes))

After successfully executing the graph, the table can be viewed in SAP Data Warehouse Cloud.

Result in SAP Data Warehouse Cloud (Image Source: Own Image)

Great! The two data records have been successfully persisted.

As another example, the following JSON in the format from option 2 is sent to the "SAP HANA Client" operator via the "Send JSON data" operator.

[[5, "China"]]

Result in SAP Data Warehouse Cloud (Image Source: Own Image)

Since the JSON array contained only 2 values, a "NULL" value was inserted in the "CAPITAL" column.

Another reference to the operator documentation to keep in mind: "When using JSON, each column value will be treated in its JSON-native type, unconverted." If the data type of the value within your JSON file differs from the data type of the target table column in SAP HANA/ SAP Data Warehouse Cloud, your pipeline will crash (for both, Option 1 and Option 2).

Conclusion

In this blog post I have illustrated two ways how the JSON format of files must be transformed so that the "SAP HANA Client" operator in SAP Data Intelligence can persist the data in SAP HANA tables or Data Warehouse Cloud tables. If the JSON file has a different format, it must first be reformatted, for example, a file in Python can be flattened using the "json_normalize" method of the "pandas" library.

The use cases presented are particularly suitable for simple scenarios. For more sophisticated requirements, where this option is not sufficient, the use of custom operators in SAP Data Intelligence can be helpful. In Python, for example, the hdbcli client can be used to persist data in an SAP HANA table or in the SAP Data Warehouse Cloud.

Thanks for reading! I hope you find this post helpful. For any questions or feedback just leave a comment below this post.

Best wishes,

Tim

Find more information and related blog posts on the topic page for SAP Data Intelligence.

If you have questions about SAP Data Intelligence you can submit them in the Q&A area for SAP Data Intelligence in the SAP Community.