Data Coherence Between SAP HANA and Microsoft Azure – Part 3: Data Federation Between Azure Storage and SAP HANA Cloud Through Azure HDInsight And SAP HANA Smart Data Integration Hive Adapter
Regardless of where your data is, there is a method to move your data into SAP HANA. Our Customer Innovation team has compiled this blog series to provide step-by-step instructions for moving your data.
This is the third part in our Data Coherence Series, you can find the others here:
Part 1 – Transfer data from Azure Blob to SAP HANA Cloud using SAP Data Intelligence
Part 2 – Transfer data from Azure Data Explorer to SAP HANA Cloud using SAP Data Intelligence
In this installment, we will show you how to create a data federation between Microsoft Azure Storage and SAP HANA Cloud using Azure HDInsight and SAP HANA Smart Data Integration(SDI) Hive Adapter. When researching solutions within SAP channels, we found that there was no standard process for addressing this, so we wrote this article in hopes that people in a similar situation will find it useful.
The Azure IoT data is a setting in Azure Data Lake Storage Gen2 and the external table for these IoT data is created in the Azure Interactive Query Cluster. SAP HANA Cloud accesses those IoT data through Hive Adapter in SAP Data Provisioning Agent (SAP DP Agent) without any data replication.
System Architecture Diagram:
Here are the main steps to set up this data flow:
- CreateAzure Data Storage Account and container:
This account should be set as StorageV2 and the hierarchical namespace should be enabled.
The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
2.Store Azure IoT data in Azure Data Lake Storage Gen2:
The IoT data should be organized into the hierarchy directories. It will help Hive to define the table with partitions and gain the better query performance.
- Set up Azure HDInsightCluster:
There are several types of HDInsight clusters that support Hive. Here you should select Interactive Query cluster(LLAP) since Interactive Query supports in-memory caching, which makes Apache Hive queries faster and much more interactive. For more detailed information please visit this site:
4.Create an external table for IoT data in Interactive Query cluster:
The table is partitioned by the following hierarchical directories in ADSL Gen2: year,month,day,and hour.
Location should be pointed to the container which stores IoT data and the Azure Blob File System (abfs)protocol is supported in Hive cluster.
This is a SQL example to create external table in Hive :
CREATE EXTERNAL TABLE COOLER( EnqueuedTimeUtc int, connectionDeviceId varchar(20), temperature double, humidity double, powerState int, coolerState int, doorOpenCount int, doorCloseCount int, doorOpenTime int) PARTITIONED BY (year int, month int, day int, hour int) STORED AS PARQUET LOCATION 'abfs://email@example.com/';
5.Register Hive adapter in DP Agent Configuration Tool:
Using the User DP Agent Configuration Tool to register Hive adapter to SAP HANA Cloud.
6.Create remote source in SAP HANA Cloud:
Using SAP HANA Database Explorer to create a remote source for Azure Interactive query cluster. Pick HiveAdapter in Adapter Name droplist.
7.Createa virtual object for the IoT data in Azure storage.
Create a virtual table v_cooler in schema DBADMIN from the hive table cooler that points to the IoT data in Azure storage.
8.Query virtual table:
This virtual table can be queried directly or it can be used in: view/calculation view, among other places.
This query against this virtual table will be pushed down to the Interactive Query cluster through the SAP DP Agent and it also brings the query result back to SAP HANA Cloud for further use.
By following these steps, you’ll be able to set up your own data federation between Microsoft Azure Storage and SAP HANA Cloud using Azure HDInsight and SAP SDI Hive Adapter.
In conclusion, there are two advantages to creating a data federation. The first is that it does not replicate data, freeing up resources and time and the second is that a data federation centralizes your data, which reduces the workload on your systems and allows for greater analysis.
Thanks for reading. If you have any questions, please reach out to firstname.lastname@example.org.
Interesting. Thanks for sharing.