The Small JSON Document Store Cookbook

mkemeter · ‎08-04-2021

Have you ever worked with native JSON documents in SAP HANA Cloud? If the answer is 'No', you can find some resources to get started here:

Official JSON Document Store Guide (the single source of truth!)

Getting Started Blog by laura.nevin

However, if you have worked with JSON documents before, you may have stumbled upon a few questions, which I will try to cover in this blog entry. So, consider this an FAQ, cookbook, cheatsheet or simply a collection of best practices. In any case, I will not try to explain you the (JSON) world, but will try to focus on specific aspects.

Let me know if you found this helpful or have suggestions for enhancements!

Interacting with Collections

Instead of storing data in tables, HANA's Document Store uses so-called collections. Collections support similar CRUD operations as tables.

-- creating a collection

CREATE COLLECTION Customers;



-- insert record using the Object Expression interface

INSERT INTO Customers VALUES(

    {

        "name": 'Paul',

        "address": {

            "street": 'Hauptstraße 10',

            "city": 'Heidelberg'

        }

    });



-- insert record using the string interface

-- use this if you have a JSON string already in your application; notice the different quotes

INSERT INTO Customers VALUES('

    {

        "name": "Paul",

        "address": {

            "street": "Hauptstraße 10",

            "city": "Heidelberg"

        }

    }');



-- Query with filter

SELECT * FROM Customers WHERE "address"."city" = 'Heidelberg';



-- Query with projection and aggregation

SELECT "address"."city", COUNT(*) FROM Customers GROUP BY "address"."city";

Object Expressions vs. String Interface

As you may have seen above, there are two insert statements: one is using the so-called 'object expression' whereas the other is using the 'string interface' for JSON documents.

Note that for one statement we simply pass a string for inserting a JSON document:

INSERT INTO mycollection VALUES ('<some json doc in form of a single-quoted string>')

For the other variant, we pass an object, that is properly embedded into HANA's SQL interface:

INSERT INTO mycollection VALUES ({"<some json-like object, that is understood by the SQL layer>"})

The latter is not a JSON document in the strict sense, because the JSON standard foresees using double quotes to identify strings. However, in HANA's SQL interface strings are identified by single quotes, which is why a JSON object expression uses slightly different quotes:

INSERT INTO mycollection VALUES ({"name": 'henry'})

If you are ever struggling with quotes, it's probably related to the fact that you are trying to use the wrong interface. When you have a proper JSON document, you should always use the string interface and pass the document as a single-quoted string.

Why do we need the object expression representation anyway, if it's not even proper JSON? Thanks for the question! This is actually a feature - and not meant to bully you with complex stuff. The object expression is required to seamlessly integrate JSON document handling into the SQL interface. You can do cool stuff like creating documents in projection without using cumbersome string concatenations:

SELECT {"surname": "name", "justthestreet": "address"."street"} FROM Customers

That's neat!

Joins between Tables and Collections

SAP HANA is a true multi-model database! It's combines the best of the relational and NoSQL world into one unified data management platform. Well, that means you should be able to join JSON collections with relational tables?

Yes, you can definitely do this! To do that, you first need to get to a common data model for joining your collection with your table. Technically, this means that you have to create an SQL view on top of your collection and transfer the required attributes into the relational model. On SAP HANA, you can define views implicitly at query time using the WITH clause.

Here is how a join would look like:

CREATE COLUMN TABLE SomeColumnStoreTable ("name" nvarchar(20), "age" int);



INSERT INTO SomeColumnStoreTable VALUES('Paul', 34);



WITH myView AS (SELECT "name", "address"."city" FROM Customers)  -- implicit view!

SELECT * FROM myView INNER JOIN SomeColumnStoreTable AS cst ON myView."name" = cst."name";

Please note that depending on your join criteria, you may have to apply datatype casting in the implicit view definition. Values returned by the JSON Document Store are strings per default!

Joins between Collections

A JSON collection can be joined with a JSON collection by using a similar construct as above:

create collection clubs;

create collection players;



insert into clubs values({"clubName": 'Heidelberg', "league": 'X'});

insert into players values({"playerName": 'Paul', "club": 'Heidelberg'});



with c as (select "clubName", "league" from clubs),

     p as (select "playerName" , "club" from players)

select * from c inner join p

     on c."clubName" = p."club";

Using Parameters in WHERE-Conditions

Yes, you can use parameters in WHERE conditions on the Document Store! There is just a tiny thing with regards to parameter typing, that you have to consider as opposed the similar query based on a table. Please find an example using function BIND_AS_VALUE here:

CREATE COLLECTION mycol;

INSERT INTO mycol VALUES('{"a":1, "b":1}');

INSERT INTO mycol VALUES('{"a":1, "b":2}');

INSERT INTO mycol VALUES('{"a":2, "b":1}');

INSERT INTO mycol VALUES('{"a":2, "b":2}');

INSERT INTO mycol VALUES('{"a":3, "b":1}');

INSERT INTO mycol VALUES('{"a":3, "b":2}');



DO BEGIN

	DECLARE v_int int;

	select max("a") into v_int from mycol;

	select * from mycol where "a" = bind_as_value(:v_int);

end;

Working with Arrays

Starting in 2021 the support of JSON arrays has been tremendously improved. Arrays are now a first-class citizen and can be properly queried and manipulated.

Access array elements by their index

-- direct index access

SELECT "array"[2] FROM MYCOLLECTION;



-- access via prepared statement

SELECT "array"[?] FROM MYCOLLECTION;



-- access via SQL variable

DO BEGIN    

    DECLARE n INT;    DECLARE i INT;

    SELECT CARDINALITY("array") INTO n FROM COLLECTION_WITH_ARRAY WHERE "id" = 0;

    FOR i IN 1 .. :n DO

        UPDATE COLLECTION_WITH_ARRAY SET "array"[:i] = 'visited' WHERE "id" = 0;

    END FOR;

    SELECT * FROM COLLECTION_WITH_ARRAY;

END;

Array Length

SELECT CARDINALITY("array") FROM MYCOLLECTION;

Filtering based on array elements

--Filter a document if any of the element of an array satisfies a predicate

SELECT "id", "array" 

FROM COLLECTION_WITH_ARRAY

WHERE 

FOR ANY "item" IN "array" SATISFIES

  "item"."id" < "id"

END;



--For any Filter can be nested:

SELECT "id", "array" 

FROM COLLECTION_WITH_ARRAY

WHERE 

FOR ANY "item" IN "array" SATISFIES

    FOR ANY "item2"  IN "item" SATISFIES

        "item2"."id" < "id"

    END

END;

Unnesting of arrays

Unnesting of an array can be described as the transpose operation: From a horizontal vector we obtain a column vector.

-- Independent UNNESTs

SELECT "m", "n" 

FROM COLLECTION_WITH_ARRAY

    UNNEST "array" as "m"

    UNNEST "array2" as "n"

WHERE

    "m" = 3 AND "n"[1] = 1;



-- Dependent UNNESTs

SELECT "m", "n", "n2" 

FROM COLLECTION_WITH_ARRAY

    UNNEST "array"  as "m"

    UNNEST "array2" as "n"

    UNNEST "n"      as "n2"

WHERE 

    "m" = 3 AND "n2" = 6;

Python Integration

The latest version of the hana-ml Python Client comes with enhanced support for multi-model data (i.e. JSON, graph & spatial). When working with SAP HANA Cloud in Python it is recommendable to make use of the built-in dataframe concept, which enables you to work with data residing in HANA without the need to transfer it to the (Python) client.

Specifically, for HANA's Document Store there is a package called hana_ml.docstore that enables you to easily upload datasets to collections. The underlying mechanism uses efficient bulk inserts to enable large data volumes.

from hana_ml.docstore import create_collection_from_elements

  

with open("./path/to/document.json") as json_file:

    data = json.load(json_file)

 

create_collection_from_elements(

    connection_context=connection_context,

    collection_name="test_collection",

    elements=data,

    drop_exist_coll=True,

)

Import/Export with Cloud Bucket

You can export JSON collection to the file system and import the same format again using the following statements:

CREATE COLLECTION mycol;



IMPORT FROM JSON FILE 's3-<AWS S3 region>://<AWS S3 Access Key ID>:<AWS S3 Secret Access Key>@<AWS S3 bucket>/<aws-object-id>' 

INTO mycol WITH BATCH 10000 THREADS 10 FAIL ON INVALID DATA;



EXPORT INTO JSON FILE 's3-<AWS S3 region>://<AWS S3 Access Key ID>:<AWS S3 Secret Access Key>@<AWS S3 bucket>/<aws-object-id>' 

FROM mycol WITH THREADS 10;

Bulk Inserts

HANA's JSON Document Store supports bulk inserts. Let me give you two examples for Java and Python.

Bulk Inserts with Java

Get the latest ngdbc driver (e.g. via maven) and connect to the database. Then create a collection and insert documents by using `autoclosable` connection and prepared statement.

try(final Connection connection = ds.getConnection(); final Statement pstmt = connection.prepareStatement("INSERT INTO collectionName VALUES(?)")) {

    connection.setAutoCommit(true);

 

    pstmt.setString(1, "{"productid": 6060, "price": 735.8228858436846}");

    pstmt.addBatch();

 

 

    pstmt.setString(1, "{"productid": 7090, "price": 735.8228858436846}");

    pstmt.addBatch();

 

 

    pstmt.setString(1, "{"productid": 9087, "price": 735.8228858436846}");

    pstmt.addBatch();

     

    pstmt.executeBatch();

}

Bulk Inserts with Python

For inserting JSON documents in Python it is recommended to use the hana-ml Python Client, which delivers this functionality out of the box. However, you can also take the native approach:

data = []

data.append(['{"key": 15, "key2": 100}'])

data.append(['{"key": 15, "key2": 101}'])

data.append(['{"key": 15, "key2": 102}'])

cursor.executemany("""insert into MyCollection values(?)""", data)

Document IDs

Unlike other document stores, HANA's Document Store does not automatically assign id fields to newly inserted documents. If you require such a field, you can generate it with a sequence in HANA.

CREATE COLLECTION c1;

CREATE SEQUENCE s1;

INSERT INTO c1 VALUES('{"KEY":' || s1.nextval || '}');

SELECT * FROM c1;

This blog is based on an SAP-internal write-up by Christian Bensberg, which many colleagues found helpful when interacting with JSON documents. Thanks, Christian!