Sangeetha_K
Product and Topic Expert
This blog is Part 1 of a 2-part series written to demonstrate the various ways data stored in AWS can be made available inside SAP solutions for analysis and for deriving useful insights.

Part 2: Federating Queries in SAP HANA Cloud from Amazon Athena using Athena API Adapter to Derive ...

In this blog, we will focus on federating queries from SAP Datasphere to Amazon Athena without requiring any data movement out of SAP business systems.

Amazon Athena is an interactive query service that helps analyze data in Amazon S3 using standard SQL. By integrating Amazon Athena with SAP Datasphere, we can now combine data from SAP Datasphere with data in Amazon S3 through federated Amazon Athena queries, thereby leveraging SAP Datasphere's rich analytical capabilities to derive useful insights.

Goal: The external data referred to in this blog is stored in an S3 bucket as CSV file(s). We will join the data from this S3 source with data from a local table in SAP Datasphere to create a story in SAP Datasphere.


Here is an overview of the sections involved in this integration:


1. S3: Prepare the input data set

2. Athena: Create and configure the Amazon Athena service

3. SAP Datasphere: Create the connection to the Athena service

4. Prepare data models in the SAP Datasphere Data Builder (virtual tables for the Athena data and a local table for the SAP HANA data) and create a story

1. S3: Prepare the input data set

1.1 Log in to the AWS Management Console with an AWS IAM user account, go to the S3 service, add a bucket, create a folder, and upload the source CSV file into it.


1.2 Once the CSV is uploaded, go to the Properties tab of the object and copy the S3 folder path.


 

2. Athena: Create and configure the Amazon Athena service

2.1 Go to Services --> Athena and click on the Get Started button.


 

2.2 Now let's start by creating a new workgroup (by default it is set to "primary"). A workgroup in Athena is used to isolate queries and their query history, and to group queries so that cost constraints can be enforced easily.

 


 

Now switch to this newly created workgroup.


 

2.3 Let's create a table definition in Athena, pointing it to the S3 data source.

If you are comfortable using SQL DDL statements for creating tables, feel free to use the Athena query editor to do the same. For example:
CREATE EXTERNAL TABLE IF NOT EXISTS `salesdata`(
`region` string,
`country` string,
`itemtype` string,
`saleschannel` string,
`orderpriority` string,
`orderdate` string,
`orderid` string,
`shipdate` string,
`unitssold` int,
`unitprice` float,
`unitcost` float,
`totalrevenue` double,
`totalcost` double,
`totalprofit` double)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://<>/dataset/sales_data'
TBLPROPERTIES (
'has_encrypted_data'='false');

If you would rather proceed using the guided table creation workflow, here are the steps:

2.3.1 Click on Connect data source.


 

 

2.3.2 In the next screens, continue with the default selections. In the Glue Catalog screen, you can either let the Glue crawler automatically scan and create table definitions for the S3 data in Athena, or you can define the tables manually.


 

2.3.3 Point the input data set to the S3 bucket folder. Choose the input data format (CSV) in Step 2.


2.3.4 In Step 3, you can choose to bulk add the columns by defining them in a comma-separated <columnname> <datatype> format, as shown in the example below.
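For example, for the sales data set used here, the bulk column definition (assuming the same column names and types as in the DDL above) could be entered as:

region string, country string, itemtype string, saleschannel string, orderpriority string, orderdate string, orderid string, shipdate string, unitssold int, unitprice float, unitcost float, totalrevenue double, totalcost double, totalprofit double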


Clicking Add finishes adding the table to the schema.

2.4 Now that the table is created, the schema and table definitions are reflected in Athena, and the query editor can be used to query the source data in S3 using SQL.
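For instance, a quick sanity check in the Athena query editor could look like the following (the table and column names assume the DDL shown earlier):

-- preview a few rows to confirm the CSV is parsed correctly
SELECT region, country, itemtype, totalrevenue
FROM salesdata
LIMIT 10;

-- aggregate example: total profit per region
SELECT region, SUM(totalprofit) AS total_profit
FROM salesdata
GROUP BY region
ORDER BY total_profit DESC;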



 

2.5 Now that the Athena tables are defined, let's prepare for the SAP Datasphere integration.


       2.5.1 Since the corporate user for our AWS account has multi-factor authentication enabled, let's create a new IAM user (as described in the steps below, on the AWS account's security management screen) and give it all the permissions needed to access Athena and S3.




  1. Go to the AWS IAM (Identity and Access Management) console and click on Add users.

  2. For Credential Type, select Access key – Programmatic access.

  3. For permissions, select all the permissions mentioned below. Save the access key and secret key generated for this user.



 

      2.5.2 Download the AWS root CA as a .pem file from here.


 

      2.5.3 Download the S3 bucket's root CA (Baltimore CyberTrust Root) by bringing up the Object URL of the S3 data store in a browser and downloading the certificate.




3. SAP Datasphere: Create the connection to the Athena service


 

3.1 Log in to the SAP Datasphere tenant instance and click on Configuration --> Security. Then click '+' to add the two AWS certificates we downloaded above. This allows SAP Datasphere to trust the AWS endpoints it calls when executing the Athena APIs.





3.2 In the SAP Datasphere space management screen, go to the Connections tab and add a connection. Click on Amazon Athena.



 

3.3 Configure the connection by entering the required details:



For Region, enter the region of the Athena service. For Workgroup, enter the name of the workgroup we created in step 2.2 above. For Access Key and Secret Key, enter the access key and secret key that we created for the IAM user in step 2.5.1 above.


Save the connection and validate it by clicking the Validate button in the connections list.


4. Prepare data models in the SAP Datasphere Data Builder and create a story

4.1 In the Data Builder, create a new SQL view, selecting the Athena table as the remote source. Drag and drop the table to create a virtual table in SAP Datasphere (see the sketch below). Save and deploy the view.
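As a rough sketch, the SQL view on top of the remote Athena table could look like the following; the remote table name SALESDATA_ATHENA is an assumption and will match whatever name is generated when the table is dragged into the view:

-- SQL view projecting the virtual (remote) Athena table
SELECT
  region,
  country,
  itemtype,
  orderdate,
  unitssold,
  totalrevenue,
  totalprofit
FROM SALESDATA_ATHENA;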



4.2 Create a local/SAP HANA table that can be joined with the virtual Athena table.


4.3 Create an analytical view by doing a JOIN between the virtual Athena table and the local table (see the sketch below). Deploy this model.
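A minimal sketch of the join logic, assuming a hypothetical local table LOCAL_SALES_TARGETS with REGION and TARGET_REVENUE columns (both names are illustrative only and not part of the original example):

-- join the virtual Athena data with a local SAP Datasphere table on region
SELECT
  a.region,
  a.country,
  a.totalrevenue,
  a.totalprofit,
  t.target_revenue
FROM SALESDATA_ATHENA AS a
INNER JOIN LOCAL_SALES_TARGETS AS t
  ON a.region = t.region;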



 


 

Every time the federated query is run (i.e., the virtual table is accessed) in SAP Datasphere, it creates an entry in the Query History in Athena, where it can be tracked.



 

 

In summary, by federating data from external sources like Amazon Athena, it is possible to derive more valuable insights by combining it with data inside SAP, without the need to replicate data. This eliminates ETL and the additional storage costs of housing duplicated data, and lowers the total cost of doing analytics that combine SAP and non-SAP data.

Thanks for reading.

Update: You can now try this integration step by step for free using the Discovery Center mission at: https://discovery-center.cloud.sap/missiondetail/3401/3441

If you have any questions, please reach out to ci_sce@sap.com.