Federating Queries in HANA Cloud from Amazon Athena using Athena API Adapter to Derive Insights
This blog is Part-2 of a 2-Part series written to demonstrate the various ways data stored in AWS can be made available inside SAP Solutions for analysis and deriving meaningful insights.
The other three parts can be found :
In this blog, we will focus on federating queries across SAP HANA Cloud and Amazon Athena and see how how this architecture does not require data copy from SAP business systems thereby preserving the host system semantics at the same time enabling powerful analysis using data from the systems.
Amazon Athena is an interactive query service that helps analyze data in Amazon S3 using standard SQL. So, by integrating Amazon Athena with SAP HANA Cloud using the SAP HANA SDA feature and ODBC drivers from Amazon Athena, we can now federate queries from SAP HANA Cloud to Athena, combining data from SAP HANA Cloud with data in Amazon S3 and leveraging SAP HANA Cloud’s powerful analytical capabilities to derive useful insights .
Goal: External data referred in this blog is stored in a S3 bucket as CSV file(s). We will join the data from this S3 source with data from SAP HANA Cloud local tables to create a analytical SQL View.
Here are the overview of sections involved in this integration:
1. S3 : Prepare input data set
2. Athena : Create and configure Amazon Athena service
3. SAP HANA Cloud : Create the connection to Athena service
4. Prepare virtual tables for Athena data and local table for SAP HANA Cloud data and create join queries
1. S3 : Prepare input data set
1.2 Once the CSV file is uploaded, go to properties tab of the object and copy the S3 folder path
2. Athena : Create and Configure Amazon Athena service
2.1 Go to Services –> Athena and click on Get Started button.
2.2 Click Connect data source.
2.3 In the next screens, continue with the default selections. In the Glue Catalog screen, you can choose to let the Glue crawler automatically scan and create table definitions for S3 Data in Athena, or you can choose to define the tables manually.
2.4 Point the input data set to the S3 bucket folder . Choose input data format (csv) in Step 2.
2.5 In Step3, you can choose to bulk add the columns by defining the columns in comma separated <columnname> <datatype>, format
2.6 Now create and assign a new “Workgroup” to this database schema. (By default it is set to “primary”.) A workgroup in Athena is used to isolate query list and query history and groups queries for easy cost constraint enforcements.
2.7 Schema and Table definitions are reflected in Athena and a Query editor is made available, to query on the source data from S3 using SQL.
2.8 Now that Athena Tables are defined, let’s prepare for the SAP HANA Cloud integration .
2.8.1 Since the corporate user for our AWS account has multi-factor authentication enabled, let’s create a new IAM user and give it all permissions needed to access Athena and S3 . This is done by assigning the below mentioned policies to that IAM user.
2.8.2 Download AWS root CA as .pem file from here .
2.8.3 Download S3 bucket’s root CA (Baltimore CyberTrust Root) by bringing up Object URL of the S3 data store in browser and downloading the certificate.
3. SAC HANA Cloud : Create the connection to Athena service
3.1 Log In to the SAP HANA Cloud instance database explorer as DBADMIN and follow the steps mentioned in this official SAP Help document to add the two AWS certificates we downloaded above in 2.7.2. This helps AWS trust the integration source system that calls the Athena APIs.
3.2 . In the SAP HANA Cloud dsatabase explorer, Expand the catalog folder of the database and right click on remote sources to add a remote source.
3.3 Add the remote source by entering the required details:
For Region enter the region of Athena service. For workgroup enter the name of the Workgroup we created in step 2.5 above . For Access and Secret Key , enter the Access and Secret Key that we created for the IAM user in step 2.7.1 above.
Click Create to finish creating the connection.
4. Prepare Virtual tables for Athena data and Local table for Hana Data and Create Join Queries
4.1 Click on the newly created remote source, select the schema (this is the Athena database schema), choose TABLE as Type and then click search. It should bring the names of the tables from Athena and list it as shown. Select a table and click on Create Virtual Objects. The standard naming convention for virtual tables is to prefix it with a v_ in front of the table name. (eg: v_employee)
4.2 Create a local/SAP HANA Cloud table and insert data into it, so it can be joined with the virtual table in Athena.
4.3 Create a SQL view by joining the two tables on a key column.
Every time the federated query is run (i.e., the virtual table is accessed) in SAP HANA Cloud, it creates an entry in the query history (Click on “History” tab) in Athena and is trackable !
In summary, by federating data from external sources like Amazon Athena , it is possible to derive more valuable insights by combining it data inside SAP without the need to replicate data. This helps centralize your data, frees up workload on systems thereby allowing efficient analysis of data.
Thanks for reading. If you have any questions, please reach out to firstname.lastname@example.org.