Skip to Content
Technical Articles
Author's profile photo Sangeetha Krishnamoorthy

Federating Queries in SAP Data Warehouse Cloud from Amazon Athena to Derive Insights

This blog is Part-I of a  2-part series written to demonstrate the various ways data stored in AWS can be made available inside SAP solutions for analysis and deriving useful insights.

Part 2: Federating Queries in  SAP HANA Cloud from Amazon Athena using Athena API Adapter to Derive Insights

In this blog, we will focus on federating queries across SAP Data Warehouse Cloud and Amazon Athena without requiring any data movement out of SAP business systems.

Amazon Athena is an interactive query service that helps analyze data in Amazon S3 using standard SQL.  By integrating Amazon Athena with SAP Data Warehouse Cloud, we can now combine data from SAP Data Warehouse Cloud with data in Amazon S3 through federated Amazon Athena queries thereby leveraging SAP Data Warehouse Cloud’s  rich analytical capabilities to derive useful insights .

Goal:  External data referred in this blog is stored in a S3 bucket as CSV file(s). We will join the data from this S3 source with data from SAP Data Warehouse Cloud’s local table to create a story in SAP  Data Warehouse Cloud.

Here are the overview of sections involved in this integration:

1. S3 : Prepare input data set

2. Athena : Create and  Configure Amazon Athena service

3. SAP Data Warehouse Cloud  : Create the connection to Athena service

4. Prepare  data models in SAP Data Warehouse Cloud Data Builder (virtual tables for Athena data and local table for SAP HANA Data)  and create story

1. S3 : Prepare input data set 

1.1  Log into the AWS Management Console with an AWS IAM user account , Go to the S3 service, add a bucket , create a folder and upload the source CSV file into it.

1.2 Once the CSV is uploaded  go to the properties tab of the object and copy the S3 folder path

 

2. Athena : Create and  configure Amazon Athena service

2.1 Go to Services –> Athena and click  on the Get Started button

 

2.2 Now let’s start by creating a new “Workgroup” (By default it is set to “primary”.) A workgroup in Athena is used to isolate query list  and query history and groups queries for easy cost constraint enforcements.

 

 

Now switch to this newly created workgroup.

 

2.3  Lets create a table definition in Athena, pointing it to the S3 data source.

If you are comfortable using SQL DDL statements for creating tables  , feel free to use Athena Query editor to do the same. For eg., :

CREATE OR REPLACE EXTERNAL TABLE `salesdata`(
  `region` string, 
  `country` string, 
  `iitemtype` string, 
  `saleschannel` string, 
  `orderpriority` string, 
  `orderdata` string, 
  `orderid` string, 
  `shipdate` string, 
  `unitssold` int, 
  `unitprice` float, 
  `unitcost` float, 
  `totalrevenue` double, 
  `totalcost` double, 
  `totalprofit` double)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://<>/dataset/sales_data'
TBLPROPERTIES (
  'has_encrypted_data'='false')

If you want to proceed using  the creation template workflow, here are the steps:

2.3.1 Click on Connect data source.

 

 

2.3.2  In the next screens, continue with the default selections. In the Glue Catalog screen, you can choose to let the Glue crawler automatically scan and create table definitions for S3 data in Athena or you can choose to define the tables manually.

 

2.3.3  Point the input data set to the S3 bucket folder . Choose input data format (csv) in Step 2 .

2.3.4  In Step3, you can choose to bulk add the columns by defining the columns in comma separated <columnname>  <datatype>, format.

Clicking Add should add finish adding the table to the schema.

2.4 Now that the table is created, Schema and table definitions are reflected in Athena and a query editor is made available to query on the source data from S3 using SQL.

 

2.5 Now that the Athena tables are defined, let’s prepare for the SAP Data Warehouse Cloud integration .

       2.5.1  Since the corporate user for our AWS account has multi-factor authentication enabled,  let’s create a new user(as mentioned in below steps on your AWS Account Security Management screen) and give it all permissions needed to access Athena and S3 .

  1. Go to AWS Access Management console and click on Add users.
  2. For Credential Type select Access key – Programmatic access
  3. For permission select all permissions mentioned below. Save the Access and Secret Key generated for this user.

 

      2.5.2   Download AWS root CA as .pem file from here .

 

      2.5.3.  Download S3 bucket’s root CA (Baltimore CyberTrust Root) by bringing up Object URL of the S3 data store in browser and downloading the certificate.

3. SAP Data Warehouse Cloud : Create the connection to the Athena service

 

3.1  Log in to the SAP Data Warehouse Cloud tenant instance and click on Configuration –> Security. Then click ‘+’ to add the two AWS certificates we downloaded above. This helps AWS trust the integration source system that calls the Athena APIs.

Configuration_Security

Configuration_Security

3.2 . In the SAP Data Warehouse Cloud space management screen, go to the Connections tab and add a connection.  Click on Amazon Athena

 

3.3 Configure the connection by entering the required details:

For Region enter the region of Athena service. For workgroup enter the name of the Workgroup we created in step 1.7 above . For Access and Secret Key , enter the Access and Secret Key that we created for the IAM user in step 1.9.1 above.

Save and ‘validate’ the connection by clicking the validate button in the connections list

4. Prepare  data models in SAP Data Warehouse Cloud Data Builder and Create Story

4 1 In Data Builder, create a new SQL view by selecting  Athena table as the remote source. Drag and drop the table to create a virtual table in SAP Data Warehouse Cloud. Save and deploy the view.

4.2  Create a local/SAP HANA table that can be joined with the virtual table in Athena.

4.3  Create an analytical view by doing a JOIN between tables with ETL . Deploy this model.

 

 

Every time the federated query is run (i.e., the virtual table is accessed) in SAP Data Warehouse Cloud,  it creates an entry in the Query History in Athena and is trackable.

 

 

In summary, by federating data from external sources like Amazon Athena , it is possible to derive more valuable insights by combining it with data inside SAP without the need to replicate data. This helps eliminate any ETL, additional storage costs to house any duplicated data and lowers total cost of doing Analytics combining SAP and non-SAP data.

Thanks for reading.

Update: You can now try this integration step by step using the Discovery center mission for free at : https://discovery-center.cloud.sap/missiondetail/3401/3441 

If you have any questions, please reach out to ci_sce@sap.com.

Assigned Tags

      1 Comment
      You must be Logged on to comment or reply to a post.
      Author's profile photo Jie Xiong
      Jie Xiong

      awesome!