Skip to Content
Technical Articles

Storing SAP Ariba data in the SAP HANA JSON Document Store

In previous blog posts, I’ve covered how to retrieve SAP Ariba data using the APIs available. Here, I will explain how this data can be stored in SAP HANA JSON Document Store in its raw format. For this, I will be using Python and the hdbcli package.

The other day, I heard of a customer who was unsure of which data to extract from the SAP Ariba Analytical Reporting APIs and end up deciding to extract everything. I thought, what an arduous task to generate all the tables to store all that data. Not just that but also converting all those JSON payloads/structures to rows and how will they deal with changes in the structure, new fields, relationships, etc. Then it clicked…. all those problems go away if you store it as a JSON.

Why would we want to store the data as a JSON document instead of storing it in a table?
In the data hungry world we live today, we generally want to keep as much data as possible. Why? storage is cheap and we don’t know when we might need to query a specific field. Also, it can be expensive and time consuming to reprocess data just to add a couple of fields in our reporting.

Generally we have little control on the data returned by the APIs and additional data (fields) might be included in an API response in a new release. It can be incredibly time consuming to fix our data pipelines to just include a new field or two that are now included in the response. To avoid this, we can store the data as it is sent by SAP Ariba. If we do so, there will be no changes required in our pipelines, no need to alter tables to add new fields and the data will be available for querying as soon as the fields are in the response. Also, the documents stored will be a copy of what the API returns.

If you need to setup SAP HANA Express edition locally, I recommend checking out the amazing tutorials available at https://developers.sap.com on how to get started with SAP HANA Express edition. Personally, I set it up as a virtual machine and followed this tutorial group -> https://developers.sap.com/group.hxe-install-vm.html

Once we have SAP HANA Express edition running, we will need to enable the document store. For that, we will need to connect to SYSTEMDB and alter the database to add the document store capability.

To learn how to connect to SAP HANA Express edition, check out the Use Clients to Query an SAP HANA Database tutorial mission -> https://developers.sap.com/mission.hana-cloud-clients.html.

# Connect to SYSTEMDB, this is exposed in port 39013
$ hdbsql -n 192.168.1.10:39013 -u SYSTEM -p MyStrongPassw0rd1

# Check the database available to connect
hdbsql SYSTEMDB=> SELECT DATABASE_NAME, HOST, SQL_PORT FROM SYS_DATABASES.M_SERVICES where SQL_PORT != 0;

# We will get a response similar to the one below
# DATABASE_NAME,HOST,SQL_PORT
# "SYSTEMDB","hxehost",39013
# "HXE","hxehost",39015

# Enable docstore in HXE
hdbsql SYSTEMDB=> ALTER DATABASE HXE add 'docstore';

# Exit SYSTEMDB
hdbsql SYSTEMDB=> \q

 

Now we will create a separate user, which will be used to create the collection where we will store the data and to connect from our Python script. For more information on user creation -> https://developers.sap.com/tutorials/hana-clients-hdbsql.html.

# Login to HXE to create separate user. This user will be used in our script
$ hdbsql -n 192.168.1.10:39015 -u SYSTEM -p MyStrongPassw0rd1

# Create the user
hdbsql HXE=> CREATE USER DEMO PASSWORD MySecondStrongPassw0rd1 no force_first_password_change;

# Connect as our demo user and create a collection
hdbsql HXE=> CONNECT USER1 PASSWORD Password1;
hdbsql HXE=> CREATE COLLECTION SourcingProjectSourcingSystemView;

# Exit HXE
hdbsql HXE=> \q

 

Once we have our user created, we can store the connection details in secure user store (hdbuserstore). The user key created will then be used by the Python script to connect to SAP HANA Express edition.

# We are setting the port to 39015 to connect to HXE
$ hdbuserstore Set DEMOUserKey 192.168.1.10:39015 DEMO MySecondStrongPassw0rd1

# Check that the key has been created
$ hdbuserstore List

 

We now have connection details in our secure store. Let’s test our connection.

# Connect to HANA
hdbsql -attemptencrypt -U DEMOUserKey

 

Now that we are able to connect to the database, let’s write some code to extract the data from SAP Ariba, do some minor “tweaks” to it and store it in SAP HANA JSON Document Store.

import json

from hdbcli import dbapi
import requests

# SAP Ariba Analytical Reporting API details
OAUTH_URL = 'https://api.ariba.com/v2/oauth/token'
AR_URL = 'https://openapi.ariba.com/api/analytics-reporting-view/v1/prod/views/SourcingProjectSourcingSystemView?realm=[YOUR_REALM]&filters=%7B"createdDateFrom":"2020-06-01T00:00:00Z","createdDateTo":"2020-11-22T00:00:00Z"%7D'
BASE64_AUTHSTRING = '[YOUR_BASE64_AUTHSTRING]'
API_KEY = '[YOUR_API_KEY]'

# HANA Express details
HDB_USER_STORE_KEY = 'DemoUserKey'
COLLECTION_NAME = 'SourcingProjectSourcingSystemView'

def get_access_token():
    """Get an acces_token from the SAP Ariba OAuth Server"""

    payload='grant_type=client_credentials'
    headers = {
        'Authorization': f'Basic {BASE64_AUTHSTRING}',
        'Content-Type': 'application/x-www-form-urlencoded'
    }

    response = requests.request('POST', OAUTH_URL, headers=headers, data=payload)

    return response.json()['access_token']


def documents_in_collection(cursor, coll):
    """Finds out the number of documents in the collection.
    
    Parameters
    ----------
    cursor : Cursor
        SAP HANA database cursor
    coll : str
        The name of the collection to query

    Returns
    -------
    int
        The count of documents in the collection
    """

    cursor.execute(f'SELECT COUNT(1) AS COUNT FROM {coll}')
    # print(rows)
    for row in cursor:
        for col in row:
            return col


#Initialize your connection
conn = dbapi.connect(
    key=HDB_USER_STORE_KEY, # address, port, user and password are retreived from the hdbuserstore
    encrypt=True, # must be set to True when connecting to HANA Cloud
    sslValidateCertificate=False # True for HANA Cloud, False for HANA Express.
)

cursor = conn.cursor()

#If no errors, print connected
print('Connected to HANA')

payload={}
headers = {
  'apiKey': API_KEY,
  'Authorization': f'Bearer {get_access_token()}',
}

response = requests.request("GET", AR_URL, headers=headers, data=payload)
records = response.json()['Records']

print(f'Documents in collection before inserting data: {documents_in_collection(cursor, COLLECTION_NAME)}')

for record in records:

    # Converting dict to str before replacing the key single quote with double quotes
    # The conversion needs to take place because of the format that HANA expects 
    # JSON documents to be inserted -> {"key": 'value', "key2": 'value2'}
    record_str = str(record).replace("{'", "{\"") \
        .replace("':", "\":") \
        .replace(", '", ", \"") \
        .replace('None', 'null')

    sql = f'INSERT INTO {COLLECTION_NAME} values({record_str});'
    cursor.execute(sql)

print(f'Documents in collection after insert: {documents_in_collection(cursor, COLLECTION_NAME)}')

cursor.close()
conn.close()

 

The data will now be in the SAP HANA JSON Document Store. One of the cool aspects of having the data in the Document Store, is that we can use SQL to insert/query the collection(s). As our JSON data will now be ready to query, we can extract only the fields that we require from it. To find out more on how to insert / query data from collections -> https://help.sap.com/viewer/4505d0bdaf4948449b7f7379d24d0f0d/2.0.04/en-US/66a8d33fe06d47acbfafc0782c513d7b.html.

Be the first to leave a comment
You must be Logged on to comment or reply to a post.