By Wei Han

SAP BTP Data & Analytics Showcase – Load data into SAP HANA Cloud using SAP Data Intelligence

Introduction

This is the first of five blog posts in the “SAP BTP Data & Analytics Showcase” series. We recommend reading our overview blog post first to gain a better understanding of the end-to-end story and context, which involves multiple SAP HANA Database & Analytics solutions in the cloud.

In this blog post, we will demonstrate how to create pipelines via the graphical modelling feature of SAP Data Intelligence and easily load data into SAP HANA Cloud. Two pipelines are designed and implemented in our case:

  • Pipeline 1: Load stations data in CSV files from the DI internal data lake into SAP HANA Cloud

Pipeline to load station CSV files

  • Pipeline 2: Consume the REST APIs provided by the open website “Tankerkönig“, retrieve real-time prices for the stations already stored in the HANA table (done in pipeline 1), and load the prices data into SAP HANA Cloud

Pipeline to consume REST APIs and load real-time prices

Set up the connection to SAP HANA Cloud (HDI container)

Before creating the pipelines, the connection to SAP HANA Cloud must be configured. In our case, we created an HDI container in SAP Business Technology Platform. You can follow the steps below to connect to the HDI container and later load data into it:

Create a new connection of type “HANA_DB” and provide the details of your HANA database (in our case, the HDI container), e.g., host, port, user, and password.

Scenario 1: Load stations CSV files into SAP HANA Cloud (HDI container)

The first scenario we are going to show is how to create a database table in the HDI container and load the stations data from CSV files into this newly created table.

Step 1: Upload “stations.csv” into DI internal data lake

We consider the gasoline station data in Germany as a kind of master data, which does not change frequently over time. So we downloaded the “stations.csv” file from the open GitHub repository and uploaded it into the DI internal data lake via the “Metadata Explorer”. You could also choose other connection types, e.g., an Amazon S3 bucket, to upload your CSV files.

Within “Metadata Explorer”, go to “Browse Connections”, select the “DI_DATA_LAKE” connection, and upload the CSV file.

After uploading the data, you can explore and understand datasets better using powerful features such as data profiling, preparation, and publishing in SAP Data Intelligence.
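If you prefer to inspect the CSV file locally before uploading it, the kind of checks the profiling feature performs can be sketched with pandas. The sample rows and column names below are illustrative assumptions; the real “stations.csv” comes from the Tankerkönig GitHub repository.

```python
import io
import pandas as pd

# Illustrative sample shaped like the stations file (columns are assumptions)
csv_data = """uuid,name,brand,city,latitude,longitude
0001,Station A,BrandX,Berlin,52.52,13.405
0002,Station B,BrandY,Munich,48.137,11.575
"""

df = pd.read_csv(io.StringIO(csv_data))
print(df.shape)         # number of rows and columns
print(df.dtypes)        # inferred data type per column
print(df.isna().sum())  # missing values per column
```

A quick look at the inferred data types here also helps you pick the column types for the HANA table in the next step.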

Step 2: Configure pipeline to create table and load stations data into HDI container

Now we would like to show you how to create graphs containing a set of operators that create an initial table and load data into HANA Cloud. You can refer to this document in the SAP Help Portal for more details.

First, we choose the “List Files” and “Read File” operators to connect to and read the content of “stations.csv”. In the configuration of “List Files”, set the connection to the DI internal data lake and select the file path pointing to “stations.csv”.

Second, we use the “SAP HANA Client” operator to create a new table called “STATIONS_MASTER” and at the same time write the content of stations.csv into HANA Cloud. For this purpose, the input port “data” of the SAP HANA Client operator is connected to the output port “file” of the “Read File” operator. An operator that converts the file to a string is required when connecting these ports. Please make sure you choose the proper type of converter operator.

Finally, let’s have a look at the configuration of the “SAP HANA Client” operator step by step.

  • Select the configured connection to the HDI container
  • Give the database schema and table name in the format of “YourSchemaName”.”STATIONS_MASTER”
  • Define table columns and data types for the station table
  • Choose “Create” in the table initialization tab, as the stations table doesn’t exist at this moment
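The steps above boil down to the operator issuing a CREATE TABLE statement against the HDI container. The sketch below generates such a statement locally so you can see the shape of the DDL; the schema name, column names, and types are assumptions based on the stations file, not the operator’s exact internal SQL.

```python
# Hypothetical sketch of the DDL behind the "Create" table initialization.
def build_create_table(schema: str, table: str, columns: dict) -> str:
    """Assemble a CREATE TABLE statement from a column-name -> type mapping."""
    cols = ", ".join(f'"{name}" {dtype}' for name, dtype in columns.items())
    return f'CREATE TABLE "{schema}"."{table}" ({cols})'

ddl = build_create_table(
    "YOUR_SCHEMA",
    "STATIONS_MASTER",
    {
        "uuid": "NVARCHAR(36) PRIMARY KEY",
        "name": "NVARCHAR(255)",
        "brand": "NVARCHAR(255)",
        "city": "NVARCHAR(255)",
        "latitude": "DOUBLE",
        "longitude": "DOUBLE",
    },
)
print(ddl)
```

In the operator itself you enter the same information through the configuration panel instead of writing SQL by hand.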

Now we have completed the configuration for this simple pipeline. Let’s check the newly created table and preview the stations data via the “Metadata Explorer“.

In summary, when you’re trying to load CSV files into SAP HANA Cloud, we’d like to draw your attention to the following aspect:

Defining appropriate data types while creating tables and loading data will help simplify your modelling and visualisation work later. For instance, we chose the data type “Double” for the columns “latitude” and “longitude” when loading the “stations.csv” file into the HANA table. The reason is that “Double” values for “latitude” and “longitude” can easily be transformed into a geometry format (ST_GEOMETRY) with the help of the geo-enhanced modelling feature in SAP Data Warehouse Cloud, and visualised later on a geo map in SAP Analytics Cloud.
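To make the geometry step concrete: point geometries such as ST_GEOMETRY are commonly constructed from a well-known-text (WKT) representation, which is a simple string built from the longitude/latitude doubles. The sketch below is illustrative only; the actual conversion is performed by the geo-enhanced modelling feature in SAP Data Warehouse Cloud, not by your pipeline.

```python
# Illustrative: how DOUBLE latitude/longitude pairs map to the WKT "POINT"
# notation that geometry types are typically built from.
def to_wkt_point(latitude: float, longitude: float) -> str:
    # WKT uses (x y) order, i.e. longitude comes first
    return f"POINT ({longitude} {latitude})"

print(to_wkt_point(52.52, 13.405))  # a point in Berlin
```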

Scenario 2: Consume REST APIs and load real-time prices data into SAP HANA Cloud (HDI container)

The second scenario we would like to demonstrate is how to retrieve real-time prices for the stations stored in the “STATIONS_MASTER” table and create a table called “PRICES_RESTAPI” in the HDI container to store the prices data.

Step 1: Read stations data from HANA table

We recommend using the “HANA Table Consumer” operator for this step. The table “STATIONS_MASTER” contains a key field called “uuid”, which is the parameter required by the REST API endpoint.

Step 2: Consume REST APIs and pre-process JSON payloads using the Python operator

In our case, we decided to use the Python library “Requests” directly to consume the REST APIs instead of utilizing the “OpenAPI Client” operator. The reason is that we still need a Python script to pre-process the JSON payloads returned by “API-Method 2” (listed on the Tankerkönig website) before we save the data into the HDI container.

First, we need to adapt the output of “HANA Table Consumer” into a CSV-based format accepted by the Python operator. For this purpose, we use the “Flowagent CSV Producer” operator.

Second, let’s look into the configuration of the “Python3 Operator”, which is the most important part of this pipeline. Instead of creating our own custom operator, we chose the “Python3 Operator”, as it saves us the effort of creating our own Docker file in SAP Data Intelligence and is easy to use.

  • Add an input port named “station“ with data type “string“ to your Python3 Operator
  • Add an output port named “price“ with data type “message“
  • Reference the input and output ports in the Python script

In addition, we have prepared the following Python script (incomplete code), which should help you understand the structure and how to call REST APIs from Python. The input port “station” and output port “price” are reflected in the program.

import requests
import pandas as pd
import io
from datetime import datetime
from decimal import Decimal

'''
Retrieve latest prices from the public REST API of Tankerkönig
'''
def on_input(data):

    dataframe = pd.read_csv(io.StringIO(data))
    station_uuids = dataframe[['uuid']]

    priceArray = []
    for index, row in station_uuids.iterrows():
        # ... (code omitted: build "stationIds" from the station uuids)
        # About the open REST API methods and the API key needed to access the
        # open Tankerkönig-Spritpreis-APIs, please refer to the website
        # https://creativecommons.tankerkoenig.de
        url = 'https://creativecommons.tankerkoenig.de/json/prices.php?ids=' + stationIds + '&apikey=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
        r = requests.get(url, timeout=50)
        # ... (code omitted: parse the JSON response into "val")
        if r.status_code == 200:
            priceArray.append(val)

    df = pd.DataFrame(priceArray)
    df = df[["date", "station_uuid", "diesel", "e5", "e10", "dieselchange", "e5change", "e10change"]]
    api.send("price", api.Message(body=df.to_csv(index=False, header=True)))

api.set_port_callback('station', on_input)
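The omitted pre-processing in the script above flattens the JSON payload into table-ready rows. The standalone sketch below shows one way this could look; the payload shape is an assumption based on the documentation at creativecommons.tankerkoenig.de, and the additional “change” columns used in the full script come from logic not shown here.

```python
from datetime import datetime, timezone

# Sample payload shaped like the documented Tankerkönig "prices" response
# (fields are assumptions based on the public API documentation).
sample_payload = {
    "ok": True,
    "prices": {
        "0001-aaaa": {"status": "open", "e5": 1.899, "e10": 1.839, "diesel": 1.759},
        "0002-bbbb": {"status": "closed", "e5": None, "e10": None, "diesel": None},
    },
}

def payload_to_rows(payload):
    """Flatten the per-station price dict into table-ready rows."""
    timestamp = datetime.now(timezone.utc).isoformat()
    rows = []
    for uuid, price in payload.get("prices", {}).items():
        if price.get("status") != "open":
            continue  # skip stations that report no current prices
        rows.append({
            "date": timestamp,
            "station_uuid": uuid,
            "diesel": price["diesel"],
            "e5": price["e5"],
            "e10": price["e10"],
        })
    return rows

rows = payload_to_rows(sample_payload)
print(rows)
```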

Step 3: Create prices table and write data into HDI container

In this step, we reuse the “SAP HANA Client” operator to create the table structure and write the real-time prices data into the newly created table. We will not repeat the configuration steps of the SAP HANA Client operator here. However, we would like to mention that with the Python operator we have the opportunity to adapt the data types and store the proper data formats directly in the HDI container, which simplifies the work of business users (no converters are required in the data modelling part).
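The kind of data-type adaptation meant here can be sketched with pandas as follows. The column names follow the script above; the exact casts (string prices to float, string timestamps to datetime) are illustrative assumptions.

```python
import pandas as pd

# Values often arrive as strings from JSON/CSV; cast them before writing.
df = pd.DataFrame({
    "date": ["2021-07-01T10:00:00"],
    "station_uuid": ["0001-aaaa"],
    "diesel": ["1.759"],
    "e5": ["1.899"],
    "e10": ["1.839"],
})

df["date"] = pd.to_datetime(df["date"])
for col in ("diesel", "e5", "e10"):
    df[col] = df[col].astype(float)

print(df.dtypes)
```

With the types adapted here, the HANA table can use proper TIMESTAMP and numeric columns instead of plain strings.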

Finally, we have finished all the steps for our second pipeline. Let’s have a look at the end result using the powerful Data Profiling feature in SAP Data Intelligence.

The following video will give you a more detailed impression of what these implementation steps look like in SAP Data Intelligence:

Summary

Congratulations! You have finished this session. We hope you are able to create your own pipelines in SAP Data Intelligence to connect non-SAP data sources and start your data story with the first step – Data integration.

We highly appreciate your feedback and comments! Enjoy!

6 Comments
Luiz Souza

Thank you, well explained!
Awesome!

Marcelo Berger

Great article, Wei. Can you please elaborate on how Data Intelligence compares to Integration Suite? When would I use one vs. the other?

      Much appreciated!

      Marcelo

Wei Han (Blog Post Author)

      Hi Marcelo,

      First of all, sorry for my delay. Thank you for your great questions.

      I contacted the product manager of SAP Data Intelligence regarding your questions and hope the below answer could help you.

      The SAP BTP Integration Suite has a focus on application integration, hence transactional (message oriented) and API integration, while SAP Data Intelligence Cloud rather concentrates on the data and technology integration plus the orchestration of data processes.

      SAP Data Intelligence provides streaming enabled ETL functionality via native interfaces, workflows for the design and control of end-to-end data processes across multiple environments, and the overarching monitoring to ensure execution safety.

      Best regards,

      Wei

Marcelo Berger

      Hi Wei,

      Thank you for your replies.

      I found a much simpler way to integrate Qualtrics into HANA Cloud:

1. Set up a table in HANA Cloud
2. Expose the table as OData
3. Enable the survey flow in Qualtrics and, within it, call the API that calls the OData URL and pushes into the table the moment the survey is submitted

      Tested this out and it works like a charm.

      I appreciate your time in responding as well as the additional feedback.

      All the best,

      Marcelo

Wei Han (Blog Post Author)

      Hi Marcelo,

Thank you for reaching out. Unfortunately, I am not an expert on this topic, so I cannot comment on your approach. What you could do is ask your question in the Q&A area of the SAP Community. I hope this helps.

      Thank you and kind regards,

      Wei

Ateet Agrawal

The Flowagent producer is giving a heap dump error:

      Group messages: Group: default; Messages: Graph failure: Flowagent encountered an OutOfMemoryError. To enable heap dump, please enable the flag via setting export ENABLE_FA_HEAP_DUMP=true in /vrep/flowagent/tp-clients-env.sh and re-run the graph. Flowagent heap will be stored on /vrep/flowagent/heapdump/ Process(es) terminated with error(s). restartOnFailure==false