robertosinohara

1.    Introduction


In this blog post I will share the experience of extracting data from a third-party API into SAP Data Warehouse Cloud. The solution is based on a micro-service concept that cctop88 explained in the blog post “Enrich Data in SAP Data Warehouse Cloud with Data from RESTful APIs | SAP Blogs”.

Surely, some adjustments might be needed, as different APIs or API providers may have different authentication requirements and/or procedures.

As a reference, the REST API concept used in this example is further explained at the link below:

What Is a REST API? Examples, Uses, and Challenges | Postman Blog

Further information is widely available on the internet.

2.    Understanding the supplier's basic API specifications


The supplier's API in this case is a REST API that uses the OAuth2 protocol for authorization. The authorization is done using two HTTP POST requests to the API.

The first POST sends the client_id and client_secret (provided to the client user by the supplier) to authenticate against the API, and the reply contains a Bearer token that can be used to request data from the API.

The second POST requests the data from the API; to do so, the token returned in the first reply must be included in this second request for authentication. The API replies to this second POST with a JSON document containing the required data, which will then be placed in SAP Data Warehouse Cloud.
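
For illustration only, a typical exchange of this kind looks roughly as follows (the endpoint paths and the expires_in field are assumptions based on common OAuth2 conventions; the supplier's actual requests and replies may differ):

POST https://<supplier-api>/oauth/token              (body carries the client_id and client_secret)
  reply: { "access_token": "<token>", "token_type": "Bearer", "expires_in": 3600 }

POST https://<supplier-api>/<report-endpoint>        (header: Authorization: Bearer <token>)
  reply: { ...report data as JSON... }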

3.    Configuring SAP Data Warehouse Cloud


The first step is to create a Space to work in on the SAP Data Warehouse Cloud tenant that will be used. To do so, navigate to Space Management in SAP Data Warehouse Cloud and create a new Space. The Space can be named according to its purpose, but make sure to take notes and adjust configurations wherever needed.

In order to communicate with the API, a user should be added as a member (or even as an administrator) of this new Space. That can be done in the Members assignment section by clicking “Add” and selecting the needed users.


Fig. 3.1 Add Member


The next step is to set up the database access. In the Database Access / Database Users section, you should create a new database user.


Fig. 3.2 Create Database User


This user can be named accordingly and must be configured with Read Access (SQL) enabled and Write Access (SQL, DDL & DML) enabled. Read and write access are needed because this is the user that SAP BTP will use to access data in SAP Data Warehouse Cloud. The picture below shows how it should look with the name USERNAME in a COLOMBOTEST Space.


Fig. 3.3 Create Database User form


After that, the Space should be deployed, and the database user details ("HOSTNAME", "DATABASE_USERNAME", "PASSWORD" and “OPEN SQL SCHEMA”, later referred to as "INTERNAL SCHEMA NAME") copied, as they will be needed in the SAP BTP service configuration. This can be done by clicking on the “i” icon shown below:


Fig. 3.4 Finding Database User Details


You should “Request a new Password” in order to copy the password value.

After that, the tables can be created by accessing the Database Explorer:


Fig. 3.5 Opening Database explorer


SAP HANA Database Explorer should open in a new tab and the database can be added by clicking the + symbol in the upper-left corner. The database type should be kept as SAP HANA Database, and the Host, Port, User and Password filled in as configured in SAP Data Warehouse Cloud. The Display Name can be shortened for convenience.

The SQL console (icon in the top-left corner) is used to manage the tables. We are using tables for two different purposes:

  1. Configuration tables that store configuration or setup information that adjusts the connection behaviour – SAP BTP will read from these tables.

  2. Data tables that store the data – SAP BTP will write to these tables.


In both cases, the tables were designed according to the purpose and needs of each specific solution.

For this purpose and API supplier, it was convenient to create a single configuration table to manage all the reports the API could provide. This table stores the report numbers, the column names in SAP Data Warehouse Cloud, the column names in the API file and the data type. For each report, a data table was created to store the data. The column names and data types must match the information brought by the API, otherwise the data could be misplaced or errors could occur in the SQL statements.
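
As an illustration, a minimal sketch of the two kinds of tables could look like the following (table names, columns and data types are examples only, not the actual project definitions):

-- Configuration table read by SAP BTP: one row per report column mapping
CREATE TABLE WS_REPORT_CONFIG (
    REPORT_NUMBER NVARCHAR(10),
    DWC_COLUMN    NVARCHAR(100),  -- column name in SAP Data Warehouse Cloud
    API_COLUMN    NVARCHAR(100),  -- column name in the API reply
    DATA_TYPE     NVARCHAR(30)
);

-- Data table written by SAP BTP: one table per report, e.g. for report 001
CREATE TABLE WS_REPORT_001 (
    ENTRY_ID   NVARCHAR(50) PRIMARY KEY,
    ENTRY_DATE DATE,
    AMOUNT     DECIMAL(15, 2)
);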

Be aware that:

  • Misspelling or changes to column names could lead to data misplacement in the data tables.

  • Tables that will be read from the SAP BTP side (configuration tables and the tables that feed the API) need to be deployed and exposed for consumption.


4.    Configuring SAP BTP Services


For security purposes, it is not recommended to expose credentials in the code. In order to enable SAP BTP to access both SAP Data Warehouse Cloud and the API, two services were configured to store those credentials in Cloud Foundry environment variables (i.e., VCAP Services).

  1. In order to create the services, enter your SAP BTP account and open the subaccount that will be used:



Fig. 4.1 Creating Services step 1


Then, in the Cloud Foundry Environment section, click the number shown under Service Instances (0, if no instances exist yet):


Fig. 4.2 Creating Services step 2


Next, in the top-right corner, click the arrow next to "Create Instance" and then "User-Provided Service Instance":


Fig. 4.3 Creating Services step 3




  2. The first service, used to provide access to the SAP Data Warehouse Cloud Space, has been named "SAP Data Warehouse Cloud-Colombo Test-data-Space" and needs to be configured with the following JSON to perform its task:


{
  "host_and_port": "HOSTNAME:443",
  "user": "DATABASE_USERNAME",
  "password": "PASSWORD",
  "schema": "INTERNAL SCHEMA NAME"
}

Note that "HOSTNAME", "DATABASE_USERNAME", "PASSWORD" and "INTERNAL SCHEMA NAME" should be replaced with the correct values that were configured on the SAP Data Warehouse Cloud.

  3. The second service, used to provide access to the API connection, has been named "WS-API" and refers to the supplier API. More or less configuration may need to be passed through depending on the API requirements. The original Christoph Christophersen case was created to connect to the OpenWeather API and, as such, used the following JSON configuration:


{
  "url": "http://api.openweathermap.org/data/2.5/weather?q=%s,%c&appid=%k&units=metric",
  "keys": {
    "crawler": [
      "PASTE API KEY HERE"
    ]
  },
  "limit": false,
  "timeout": 1000,
  "secret": "",
  "cleanOutdated": true,
  "keepLatest": 7
}

In our case, some adjustments were needed. First, the url was changed to the right API endpoint. Then, instead of using an API key, the API supplier uses a client_id and a client_secret to authenticate. This could be done in several ways, but we chose to create a second item inside "keys" and rename both. As a result, we came to the JSON below:
{
  "url": "https://api-sandbox.supplier.com/oauth/token",
  "cleanOutdated": true,
  "keepLatest": 7,
  "keys": {
    "client_id": [
      "CLIENT_ID_CREDENTIAL_PROVIDED_BY_SUPPLIER"
    ],
    "client_secret": [
      "CLIENT_SECRET_CREDENTIAL_PROVIDED_BY_SUPPLIER"
    ]
  },
  "limit": false,
  "secret": "",
  "timeout": 1000
}

Note that both CLIENT_ID_CREDENTIAL_PROVIDED_BY_SUPPLIER and CLIENT_SECRET_CREDENTIAL_PROVIDED_BY_SUPPLIER must be replaced with the correct credentials for the API connection.

For basic functionality, these two services are enough: they enable SAP BTP to access SAP Data Warehouse Cloud and provide the micro-service with the information needed to connect to the API.
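
As a rough sketch of how the micro-service can read these two bindings at runtime, assuming a Node.js application bound to both user-provided services (the service names must match those created above):

// Read the two user-provided services from the Cloud Foundry environment
const vcap = JSON.parse(process.env.VCAP_SERVICES);
const userProvided = vcap['user-provided'];

// Credentials of the SAP Data Warehouse Cloud database user (section 3)
const dwcCredentials = userProvided.find(
    (service) => service.name === 'SAP Data Warehouse Cloud-Colombo Test-data-Space'
).credentials;

// Configuration for the supplier API connection (the "WS-API" service)
const apiConfig = userProvided.find((service) => service.name === 'WS-API').credentials;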

5.    Open Weather Mini service explanation


The mini service provided in Christoph's blog through GitHub can be divided into 5 main tasks:

  1. Get API credentials from VCAP Services

  2. Send a GET request to the API

  3. Get the data from the API

  4. Access the Data Warehouse Cloud using the provided credentials

  5. Load the data into SAP Data Warehouse Cloud using SQL statements composed in the code.


As mentioned, in order to access our supplier API, a different approach is needed:

  1. Get API Credentials from VCAP Services

  2. Send a POST request to the API in order to get an access token

  3. Extract the access token from the body of the API reply

  4. Send a POST request to the API with the access token to request the report data

  5. Access the Data Warehouse Cloud using the provided credentials

  6. Get the column headers from the existing tables of the report

  7. Find out the data related to each column header to compose the dataset in the correct order

  8. Compose the SQL statements to write the data into the tables

  9. Send the SQLs created to the SAP Data Warehouse Cloud


Comments:

  • The mini service will load several lines at the same time into SAP Data Warehouse Cloud

  • Each line is composed of a whole dataset entry with a large number of columns


6.    Adjustments in Mini Service


As seen above, some adjustments were needed in order to establish the connection between our supplier's API and SAP Data Warehouse Cloud.

  • The first step for our supplier API is mostly the same as the first one provided by Christoph.

  • The second step just needs some adjustments so that the request is sent as a POST request with the correct headers and body. The whole function is below:
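
A minimal sketch of what this token-request function can look like, assuming the axios HTTP client and a standard OAuth2 client-credentials grant (the exact headers and body fields depend on the supplier):

const axios = require('axios');

// Step 2: request an access token using the credentials stored in the "WS-API" service
async function getAccessToken(apiConfig) {
    const response = await axios.post(
        apiConfig.url, // e.g. https://api-sandbox.supplier.com/oauth/token
        new URLSearchParams({
            grant_type: 'client_credentials',
            client_id: apiConfig.keys.client_id[0],
            client_secret: apiConfig.keys.client_secret[0]
        }).toString(),
        {
            headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
            timeout: apiConfig.timeout
        }
    );
    // Step 3: the token is extracted from the JSON body of the reply
    return response.data.access_token;
}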

  • The third step is already built into the code above, as the function returns the data.access_token string. This string is extracted from the body of the API reply after it is converted to JSON.

  • The fourth step sends the POST request for the report and then converts the reply from the API to JSON:
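
A sketch of that report request, reusing the token function above (the report endpoint and request body are hypothetical, since they depend on the supplier's API):

// Step 4: request the report data with the Bearer token
async function getReportData(apiConfig, reportNumber) {
    // First, call the get-token function reviewed in the previous topic
    const token = await getAccessToken(apiConfig);

    const response = await axios.post(
        `https://api-sandbox.supplier.com/reports/${reportNumber}`, // hypothetical endpoint
        { report: reportNumber },                                   // hypothetical body
        {
            headers: {
                Authorization: `Bearer ${token}`,
                'Content-Type': 'application/json'
            },
            timeout: apiConfig.timeout
        }
    );
    // axios already parses the reply into JSON
    return response.data;
}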


Please note that the first step of this code is calling the get token function we reviewed in the previous topic.

  • The fifth step is exactly the same as in the original code, so no changes were needed here.

  • For the sixth step, a method similar to the one in the original code (which reads the cities from a table in SAP Data Warehouse Cloud) was used; again, minimal changes were needed so that the SQL returns the column headers. The final code is described below:
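
A sketch of that column-header query, assuming the @sap/hana-client driver and the database user created in section 3 (the WS_REPORT_ table naming convention is explained further below):

const hana = require('@sap/hana-client');

// Step 6: read the column headers of the report table from SAP Data Warehouse Cloud
function getColumnHeaders(dwcCredentials, reportNumber) {
    return new Promise((resolve, reject) => {
        const conn = hana.createConnection();
        conn.connect({
            serverNode: dwcCredentials.host_and_port,
            uid: dwcCredentials.user,
            pwd: dwcCredentials.password
        }, (connectError) => {
            if (connectError) return reject(connectError);
            const sql =
                'SELECT COLUMN_NAME AS "Column_Name" FROM SYS.TABLE_COLUMNS ' +
                `WHERE SCHEMA_NAME = '${dwcCredentials.schema}' ` +
                `AND TABLE_NAME = 'WS_REPORT_${reportNumber}' ORDER BY POSITION`;
            conn.exec(sql, (execError, rows) => {
                conn.disconnect();
                if (execError) return reject(execError);
                resolve(rows); // e.g. [{ Column_Name: 'DATE' }, { Column_Name: 'AMOUNT' }, ...]
            });
        });
    });
}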


This code will return an array of objects with a single key named Column_Name. This format is not very friendly, so it was converted into a simpler array of strings containing the column names. This was done by the code below:
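
A one-line sketch of that conversion, applied to the array of objects returned above:

// Turn [{ Column_Name: 'DATE' }, ...] into a plain array of strings ['DATE', ...]
const columnHeaders = rows.map((row) => row.Column_Name);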



  • The seventh step consists of identifying whether the data for each column header is present in the JSON returned by the API. As simple as it seems, there are some tricky parts, such as what to do when there is no data in the API for a specific column header, or how to deal with quotes inside a string value (the API returns strings with quotes inside, which SQL would understand as a closing quote instead of an inner quote). The final code is shown below:
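
A sketch of that value-composition step (the function and parameter names are illustrative; the flattened API record is assumed to be a plain object keyed by the column headers):

// Step 7: for each column header, look up the value in the flattened API record,
// escape inner single quotes and map missing values to NULL
function composeValues(columnHeaders, record) {
    return columnHeaders.map((column) => {
        const value = record[column];
        if (value === undefined || value === null) {
            // No data in the API for this column header
            return 'NULL';
        }
        // Double any single quote so SQL treats it as an escaped inner quote
        const escaped = String(value).replace(/'/g, "''");
        return `'${escaped}'`;
    });
}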


To solve the inner-quotes problem, each single quote is replaced by two single quotes as an escape sequence.

To solve the problem of undefined data from the API, the value is replaced by NULL; for normal data, quotes are simply inserted before and after the value to compose the SQL statement.

  • The eighth step just assembles the result from the seventh step into a valid SQL statement that writes the data to SAP Data Warehouse Cloud. For this purpose, an UPSERT statement is composed, as the data could be new or just an update of data already in the dataset. This is done by the code below:
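
A sketch of that statement assembly, following the WS_REPORT_ naming convention described below:

// Step 8: compose an UPSERT statement for one line of data
function composeUpsert(schema, reportNumber, columnHeaders, values) {
    const table = `"${schema}"."WS_REPORT_${reportNumber}"`;
    const columns = columnHeaders.map((column) => `"${column}"`).join(', ');
    return `UPSERT ${table} (${columns}) VALUES (${values.join(', ')}) WITH PRIMARY KEY`;
}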


Note that for this to work, the SAP Data Warehouse Cloud tables should have a standardized name, in this case “WS_REPORT_” followed by the report number. This also applies to the column-header reading and is key to having the connection work for more than one report.

  • Finally, the ninth and last step is to run this SQL code on SAP Data Warehouse Cloud. This is similar to the original code, but it is done for several lines of data, so it has been included inside a for loop, as follows:
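
A sketch of that loop, reusing the helper functions above and an open connection to the database:

// Step 9: run one UPSERT per line of data on SAP Data Warehouse Cloud
async function loadReport(conn, schema, reportNumber, columnHeaders, records) {
    for (const record of records) {
        const values = composeValues(columnHeaders, record);
        const sql = composeUpsert(schema, reportNumber, columnHeaders, values);
        await new Promise((resolve, reject) => {
            conn.exec(sql, (execError) => (execError ? reject(execError) : resolve()));
        });
    }
}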


7.    Other features


The main functionality described above enables a basic connection. Yet, some more specific treatment was needed for the project, implemented with a few additional small pieces of code. Below follows a list of the most significant ones:

  • The connection was running for more than one report at a time, so a for loop was created to run the task for each report.

  • To make it more user-friendly, the list of reports was placed in a SAP Data Warehouse Cloud table and is read from there by the code, similarly to the column-header reading. This way, if report numbers change or new reports are added, the solution can be adjusted simply by adding them to the report list and creating the table with the right column headers.

  • The structure of the data we got from the API was more complex than expected, with arrays and objects nested inside each other. To ensure the functionality, the JSON returned by the API was flattened and the column names adjusted to match these conditions, as sketched below.
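
A sketch of such a flattening helper (the underscore-joined key naming is an assumption; the actual convention has to match the column names in the tables):

// Flatten nested objects and arrays so each value can be matched to a column header
function flatten(record, prefix = '', out = {}) {
    for (const [key, value] of Object.entries(record)) {
        const name = prefix ? `${prefix}_${key}` : key;
        if (value !== null && typeof value === 'object') {
            flatten(value, name, out);
        } else {
            out[name] = value;
        }
    }
    return out;
}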


8.    Limitations


As described, this mini service solution has some versatility but also has its limitations. Some of them are noted below:

  • As it runs on SAP BTP, some errors in the connection give no feedback to the user. As such, it could be running as scheduled but failing to update the data, without any alert to the final user or application.

  • Some names and parameters are hard-coded; it could be beneficial to keep them in a SAP Data Warehouse Cloud table to make the connection more flexible, such as:

    • Report name

    • Headers parameters for the token request

    • Token request address

    • Headers parameters for the report request

    • Report request address

    • The table names, which are fixed to WS_REPORT_ followed by the report number and could also be made configurable



  • When an error occurs in one report, no other report is run, as the whole connection halts

  • The data formats can eventually cause problems, as the SAP Data Warehouse Cloud database has predefined data types for each column in the table and the API may not have the same limitations. An example is when NVARCHAR fields are planned to be no longer than a specific size but the API replies with values larger than that size. This causes an error in the connection and, as mentioned, halts the whole process, including other reports and data entries.

  • The column headers should match the data names coming from the API, otherwise a NULL value will be placed in the data. This makes the solution versatile and delegates the remaining ETL process to the SAP Data Warehouse Cloud data flow, but if there are errors in this step, they would be noticed only during data analysis.


9.    Improvements to be done


This entire procedure can be improved by reducing or eliminating the limitations mentioned above, or by adding new features to the solution as required for each case.

Some of these limitations are related to the environment the solution runs in, but some workarounds might be useful.

10.   Outlook


In this tutorial we have implemented an example of extracting data from a third-party API into SAP Data Warehouse Cloud using a micro-service on SAP Business Technology Platform. This enables analytics to be extended to external data in synergy with other internal data. Unfortunately, the data flow is not the fastest, but if this functionality is eventually integrated into SAP Data Warehouse Cloud, it would probably be enhanced in that respect too.

As mentioned, this blog post was based on cctop88’s post and surely could not have been done without that input. Yet, besides Christoph, I would like to thank:

  • Moisés Pires for the idea for this solution and the guidance to write a blog post about it.

  • Renato Antunes for the JavaScript coding foundation that enabled me to build on the original code from Christoph.

  • Miguel Sequeira, Kevin Gomes and David Henriques for their help with the debugging.
