The Small JSON Document Store Cookbook
- Official JSON Document Store Guide (the single source of truth!)
- Getting Started Blog by Laura Nevin
However, if you have worked with JSON documents before, you may have stumbled upon a few questions, which I will try to cover in this blog entry. So, consider this an FAQ, cookbook, cheat sheet or simply a collection of best practices. In any case, I will not try to explain the (JSON) world to you, but will focus on specific aspects.
Let me know if you found this helpful or have suggestions for enhancements!
Interacting with Collections
Instead of storing data in tables, HANA's Document Store uses so-called collections. Collections support CRUD operations similar to those on tables.
-- creating a collection
CREATE COLLECTION Customers;
-- insert record using the Object Expression interface
INSERT INTO Customers VALUES(
{
"name": 'Paul',
"address": {
"street": 'Hauptstraße 10',
"city": 'Heidelberg'
}
});
-- insert record using the string interface
-- use this if you have a JSON string already in your application; notice the different quotes
INSERT INTO Customers VALUES('
{
"name": "Paul",
"address": {
"street": "Hauptstraße 10",
"city": "Heidelberg"
}
}');
-- Query with filter
SELECT * FROM Customers WHERE "address"."city" = 'Heidelberg';
-- Query with projection and aggregation
SELECT "address"."city", COUNT(*) FROM Customers GROUP BY "address"."city";
Object Expressions vs. String Interface
As you may have seen above, there are two insert statements: one uses the so-called ‘object expression’, whereas the other uses the ‘string interface’ for JSON documents.
Note that for one statement we simply pass a string for inserting a JSON document:
INSERT INTO mycollection VALUES ('<some json doc in form of a single-quoted string>')
For the other variant, we pass an object that is properly embedded into HANA’s SQL interface:
INSERT INTO mycollection VALUES ({"<some json-like object, that is understood by the SQL layer>"})
The latter is not a JSON document in the strict sense, because the JSON standard mandates double quotes for strings. In HANA’s SQL interface, however, strings are identified by single quotes, which is why a JSON object expression uses slightly different quoting:
INSERT INTO mycollection VALUES ({"name": 'henry'})
If you are ever struggling with quotes, you are probably using the wrong interface. When you have a proper JSON document, always use the string interface and pass the document as a single-quoted string.
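To make the difference concrete, here are the two equivalent inserts from above side by side:
-- string interface: one single-quoted string; double quotes inside the JSON
INSERT INTO mycollection VALUES ('{"name": "henry"}');
-- object expression: double quotes only around key names; single quotes for string values
INSERT INTO mycollection VALUES ({"name": 'henry'});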
Why do we need the object expression representation anyway, if it’s not even proper JSON? Thanks for the question! It is actually a feature and not meant to bully you with complex stuff. The object expression is required to seamlessly integrate JSON document handling into the SQL interface. You can do cool stuff like creating documents in a projection without cumbersome string concatenation:
SELECT {"surname": "name", "justthestreet": "address"."street"} FROM Customers
That’s neat!
Joins between Tables and Collections
SAP HANA is a true multi-model database! It combines the best of the relational and NoSQL worlds into one unified data management platform. So you should be able to join JSON collections with relational tables, right?
Yes, you can definitely do this! First, you need a common data model for joining your collection with your table. Technically, this means creating an SQL view on top of your collection that transfers the required attributes into the relational model. On SAP HANA, you can define such views implicitly at query time using the WITH clause.
Here is what such a join looks like:
CREATE COLUMN TABLE SomeColumnStoreTable ("name" nvarchar(20), "age" int);
INSERT INTO SomeColumnStoreTable VALUES('Paul', 34);
WITH myView AS (SELECT "name", "address"."city" FROM Customers) -- implicit view!
SELECT * FROM myView INNER JOIN SomeColumnStoreTable AS cst ON myView."name" = cst."name";
Please note that depending on your join criteria, you may have to apply datatype casting in the implicit view definition. Values returned by the JSON Document Store are strings by default!
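For illustration, a hypothetical variant: if the documents also carried a numeric "age" attribute, you would cast the string value in the implicit view before comparing it with the INT column:
-- "age" from the collection is returned as a string, so cast it explicitly
WITH myView AS (SELECT "name", TO_INT("age") AS "age" FROM Customers)
SELECT * FROM myView INNER JOIN SomeColumnStoreTable AS cst
ON myView."name" = cst."name" AND myView."age" = cst."age";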
Joins between Collections
Two JSON collections can be joined using a similar construct:
create collection clubs;
create collection players;
insert into clubs values({"clubName": 'Heidelberg', "league": 'X'});
insert into players values({"playerName": 'Paul', "club": 'Heidelberg'});
with c as (select "clubName", "league" from clubs),
p as (select "playerName", "club" from players)
select * from c inner join p
on c."clubName" = p."club";
Using Parameters in WHERE-Conditions
Yes, you can use parameters in WHERE conditions on the Document Store! There is just one small detail regarding parameter typing that you have to consider, as opposed to the equivalent query on a table. Here is an example using the function BIND_AS_VALUE:
CREATE COLLECTION mycol;
INSERT INTO mycol VALUES('{"a":1, "b":1}');
INSERT INTO mycol VALUES('{"a":1, "b":2}');
INSERT INTO mycol VALUES('{"a":2, "b":1}');
INSERT INTO mycol VALUES('{"a":2, "b":2}');
INSERT INTO mycol VALUES('{"a":3, "b":1}');
INSERT INTO mycol VALUES('{"a":3, "b":2}');
DO BEGIN
DECLARE v_int INT;
SELECT MAX("a") INTO v_int FROM mycol;
SELECT * FROM mycol WHERE "a" = BIND_AS_VALUE(:v_int);
END;
Working with Arrays
Starting in 2021, support for JSON arrays has been tremendously improved. Arrays are now first-class citizens and can be properly queried and manipulated.
Access array elements by their index (note that array indexes are 1-based)
-- direct index access
SELECT "array"[2] FROM MYCOLLECTION;
-- access via prepared statement
SELECT "array"[?] FROM MYCOLLECTION;
-- access via SQL variable
DO BEGIN
DECLARE n INT; DECLARE i INT;
SELECT CARDINALITY("array") INTO n FROM COLLECTION_WITH_ARRAY WHERE "id" = 0;
FOR i IN 1 .. :n DO
UPDATE COLLECTION_WITH_ARRAY SET "array"[:i] = 'visited' WHERE "id" = 0;
END FOR;
SELECT * FROM COLLECTION_WITH_ARRAY;
END;
Array Length
SELECT CARDINALITY("array") FROM MYCOLLECTION;
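CARDINALITY should also be usable for filtering, e.g. to find documents whose array holds more than two elements (a sketch assuming the same collection as above):
SELECT * FROM MYCOLLECTION WHERE CARDINALITY("array") > 2;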
Filtering based on array elements
-- Filter a document if any element of its array satisfies a predicate
SELECT "id", "array"
FROM COLLECTION_WITH_ARRAY
WHERE
FOR ANY "item" IN "array" SATISFIES
"item"."id" < "id"
END;
-- FOR ANY filters can be nested:
SELECT "id", "array"
FROM COLLECTION_WITH_ARRAY
WHERE
FOR ANY "item" IN "array" SATISFIES
FOR ANY "item2" IN "item" SATISFIES
"item2"."id" < "id"
END
END;
Unnesting of arrays
Unnesting an array can be described as a transpose operation: from a horizontal vector we obtain a column vector.
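For intuition, a minimal sketch: given a hypothetical document {"id": 1, "array": [7, 8, 9]}, the following query returns one row per array element, i.e. the three rows 7, 8 and 9:
SELECT "elem"
FROM COLLECTION_WITH_ARRAY
UNNEST "array" AS "elem";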
-- Independent UNNESTs
SELECT "m", "n"
FROM COLLECTION_WITH_ARRAY
UNNEST "array" as "m"
UNNEST "array2" as "n"
WHERE
"m" = 3 AND "n"[1] = 1;
-- Dependent UNNESTs
SELECT "m", "n", "n2"
FROM COLLECTION_WITH_ARRAY
UNNEST "array" as "m"
UNNEST "array2" as "n"
UNNEST "n" as "n2"
WHERE
"m" = 3 AND "n2" = 6;
Python Integration
The latest version of the hana-ml Python Client comes with enhanced support for multi-model data (i.e. JSON, graph & spatial). When working with SAP HANA Cloud in Python, it is advisable to use the built-in dataframe concept, which lets you work with data residing in HANA without transferring it to the (Python) client.
Specifically, for HANA’s Document Store there is a package called hana_ml.docstore that enables you to easily upload datasets to collections. The underlying mechanism uses efficient bulk inserts to enable large data volumes.
import json

from hana_ml.docstore import create_collection_from_elements

# "connection_context" is an open hana_ml ConnectionContext pointing to your HANA instance
with open("./path/to/document.json") as json_file:
    data = json.load(json_file)
create_collection_from_elements(
    connection_context=connection_context,
    collection_name="test_collection",
    elements=data,
    drop_exist_coll=True,
)
Import/Export with Cloud Bucket
You can export a JSON collection to a cloud bucket (here: AWS S3) and import the same format again using the following statements:
CREATE COLLECTION mycol;
IMPORT FROM JSON FILE 's3-<AWS S3 region>://<AWS S3 Access Key ID>:<AWS S3 Secret Access Key>@<AWS S3 bucket>/<aws-object-id>'
INTO mycol WITH BATCH 10000 THREADS 10 FAIL ON INVALID DATA;
EXPORT INTO JSON FILE 's3-<AWS S3 region>://<AWS S3 Access Key ID>:<AWS S3 Secret Access Key>@<AWS S3 bucket>/<aws-object-id>'
FROM mycol WITH THREADS 10;
Bulk Inserts
HANA’s JSON Document Store supports bulk inserts. Let me give you two examples for Java and Python.
Bulk Inserts with Java
Get the latest ngdbc driver (e.g. via Maven) and connect to the database. Then create a collection and insert documents using a try-with-resources (auto-closeable) connection and a prepared statement.
// "ds" is a javax.sql.DataSource configured for your HANA instance
try (final Connection connection = ds.getConnection();
     final PreparedStatement pstmt = connection.prepareStatement("INSERT INTO collectionName VALUES(?)")) {
    connection.setAutoCommit(true);
    // queue the JSON documents and send them to the server as one batch
    pstmt.setString(1, "{\"productid\": 6060, \"price\": 735.8228858436846}");
    pstmt.addBatch();
    pstmt.setString(1, "{\"productid\": 7090, \"price\": 735.8228858436846}");
    pstmt.addBatch();
    pstmt.setString(1, "{\"productid\": 9087, \"price\": 735.8228858436846}");
    pstmt.addBatch();
    pstmt.executeBatch();
}
Bulk Inserts with Python
For inserting JSON documents in Python, it is recommended to use the hana-ml Python Client, which delivers this functionality out of the box. However, you can also take the native approach via the hdbcli driver:
# "cursor" is a cursor on an open hdbcli (SAP HANA Python driver) connection
data = []
data.append(['{"key": 15, "key2": 100}'])
data.append(['{"key": 15, "key2": 101}'])
data.append(['{"key": 15, "key2": 102}'])
cursor.executemany("""insert into MyCollection values(?)""", data)
Document IDs
Unlike some other document stores, HANA’s Document Store does not automatically assign an id field to newly inserted documents. If you require such a field, you can generate it with a sequence in HANA.
CREATE COLLECTION c1;
CREATE SEQUENCE s1;
INSERT INTO c1 VALUES('{"KEY":' || s1.nextval || '}');
SELECT * FROM c1;
This blog is based on an SAP-internal write-up by Christian Bensberg, which many colleagues found helpful when interacting with JSON documents. Thanks, Christian!


Thanks Mathias, very helpful (especially the joins with regular tables). Hope you will update this blog post whenever you find some new interesting functions.
Cheers, Uwe
Hi Mathias,
Do you have any best practices for working with the JSON Document Store in a CAP application?
Best,
Marco
Hi Marco,
as of now there is no native support for DocStore collections in CAP. However, collections can be deployed as part of a CAP application’s HDI container with the corresponding hdbcollection file. You can then use a custom CAP handler to access the collections at runtime.
Regards,
Mathias
Mathias Kemeter, what is the native approach for bulk inserts with Python? I'm receiving bulk messages from Kafka (topic), i.e. JSON strings, and they need to be converted into a HANA DB table. Will a Pandas dataframe in a Python 3 operator and the HANA write operator suffice?
When working with Pandas in Python, I would recommend taking a look at the hana-ml Python Client. There is a specific function for creating or appending to a table from a given Pandas dataframe, which also handles bulk inserts. If you are looking at Document Store collections rather than tables, there is also useful functionality.
Mathias Kemeter
Hi Rajesh, I recommend checking out a blog by my colleague Cameron Swift, which targets the question of purpose. Usually the reason to use a Document Store is not a technical one, but rather the modelling paradigm.
https://blogs.sap.com/2023/02/02/sap-hana-cloud-document-store-when-would-i-use-a-document-store/