
Hana SDI Adapter for Azure Data Lake

Design goal

When dealing with lots of data, there are different concepts. SAP favors multi-temperature storage inside the database, but the majority of projects use an object store – in the case of Microsoft Azure that would be the Azure Data Lake. It is cheap, can grow to virtually any size, is secure and can handle all types of data.

To make this data accessible via Hana as well, a new SDI adapter is needed. I wrote one that provides access to the data in Azure Data Lake and Azure Service Bus. With this adapter the data can be used by Hana on-premise, Hana Cloud, SAP Data Warehouse Cloud, S/4Hana or SAP BW.

Regarding the supported payload formats, Avro and Json are the popular ones in Azure. In the example below, the Data Lake contains millions of files, each representing one patient record in the FHIR standard. Whenever new files are uploaded to Azure, Azure Event Grid posts this information into a Service Bus queue which the adapter is listening on. But the adapter works with any Json or Avro format – no limitations.

My sample file in Azure looks like this (shortened):

{
  "resourceType": "Patient",
  "id": "example",
  "identifier": [
    { "use": "usual", "system": "urn:oid:1.2.36.146.595.217.0.1" }
  ],
  "active": true,
  ..
  ..

and all files should be queried via Hana like any other table.

 

Requirements

 

This creates a couple of requirements for this Data Lake adapter:

  • Parse Json and Avro files.
  • As these file formats allow for nested data, a mechanism to blend the nested Json/Avro data into the relational Hana data model must exist.
  • When files are changed or deleted, or new files arrive, they must be visible in Hana right away.
  • One Data Lake storage contains different kinds of files, each with its own structure, which must be treated independently.

The solution uses the concept of “views” in the Azure Data Lake adapter to define the transformation from the nested data into Hana tables. For the above example, the user creates a file named patient.view with the following content:

{
	"namefilter": "patient.*\\.json",
	"from": "$",
	"columns": [
		{
			"accessor": "$.id",
			"alias": "ID",
			"datatype": "VARCHAR(256)"
		},
		{
			"accessor": "$.identifier[0].system",
			"alias": "SYSTEM_IDENTIFIER",
			"datatype": "VARCHAR(40)"
		},
		{
			"accessor": "$.gender",
			"alias": "GENDER",
			"datatype": "VARCHAR(10)"
		},
		{
			"accessor": "$.birthDate",
			"alias": "BIRTHDATE",
			"datatype": "DATE",
			"format": "yyyy-M-d"
		}
	],
	"queuename": "filechanges"
}

Apart from the view projection definitions (from and columns), which carry the information about:

  • What nesting level is driving the row generation? from: $ means the root level, so one row per file.
  • Which field to output?
  • What should be its Hana column name?
  • What should be its Hana data type?
  • What is the format of the value in case a data type conversion is needed, e.g. from String to Date?

a name and directory filter can be specified as well. Because this format definition makes sense only for Json files containing patient data, and all these files follow the pattern patient*.json in the Data Lake storage, a namefilter is used. Hence this view considers only those files.
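
Applied to the sample patient file shown at the beginning, this view therefore produces exactly one row for that file. The ID and SYSTEM_IDENTIFIER values can be read directly from the shortened sample; GENDER and BIRTHDATE come from fields that were omitted above:

  ID                  example
  SYSTEM_IDENTIFIER   urn:oid:1.2.36.146.595.217.0.1
  GENDER              value of $.gender
  BIRTHDATE           value of $.birthDate, parsed with the yyyy-M-d format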

The last setting in the above example is the Azure Service Bus queue name where Azure puts all file change events. This is another feature of the Azure Data Lake: it can trigger events whenever a file is modified, and via Azure Event Grid these events can be routed into a Service Bus queue.
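
For illustration, the message the adapter picks up from that queue is an Azure Event Grid storage event. Abridged, and with a made-up file name and endpoint, such a BlobCreated event looks roughly like this:

{
  "subject": "/blobServices/default/containers/main/blobs/FHIR/patient0001.json",
  "eventType": "Microsoft.Storage.BlobCreated",
  "eventTime": "...",
  "data": {
    "api": "FlushWithClose",
    "contentType": "application/json",
    "url": "https://xxxx.dfs.core.windows.net/main/FHIR/patient0001.json"
  }
}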

Consuming Azure Data Lake data

This view is imported as a virtual table just like with any other SDI adapter, either via SQL or via the various UIs. As the definition file is called patient.view, its remote name is patient and the chosen Hana name is v_patient. In the above screenshot this table was selected and it returned the data contained in the files.
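
Via SQL, this import is a single statement. A minimal sketch, assuming the remote source is named DATALAKE_RTDI (its creation is shown further below) and that the adapter exposes the view at the root of its browse hierarchy:

CREATE VIRTUAL TABLE v_patient AT "DATALAKE_RTDI"."<NULL>"."<NULL>"."patient";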

 

Creating the Hana remote source object

The remote source definition asks for the URL of the Data Lake endpoint and optionally – if realtime subscriptions will be used as well – for the Service Bus connection string. Both values can simply be copied from Azure.

 

As a Storage Account in Azure can have multiple containers, the file system is a mandatory parameter and it can be augmented with additional sub directories. For example, the setting /main/FHIR configures this remote source to read data only from the file system called “main”, and within that only from the “FHIR” directory and its sub directories.

The security keys for the Storage Account and the Service Bus are placed into the secure credential storage.

The only additional parameter needed is an adapter-local directory where all view definition files are placed by the user.

 

Realtime Subscriptions

Using Hana SDI realtime subscriptions, changes in the Data Lake are captured and the target table in Hana gets updated with the new data immediately.

CREATE COLUMN TABLE t_patient LIKE v_patient;

CREATE REMOTE SUBSCRIPTION s_patient 
ON v_patient TARGET TABLE t_patient;

ALTER REMOTE SUBSCRIPTION s_patient QUEUE;

INSERT INTO t_patient SELECT * FROM v_patient; -- Initial load

ALTER REMOTE SUBSCRIPTION s_patient DISTRIBUTE;

SELECT * FROM t_patient; -- Returns the initially loaded plus all changed file contents
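
Whether the subscription is active and receiving changes can be verified via the standard Hana monitoring view for remote subscriptions; a small sketch:

SELECT subscription_name, state FROM m_remote_subscriptions WHERE subscription_name = 'S_PATIENT'; -- Shows the current state of the subscription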

The corresponding setup in Azure is very easy to achieve. All that needs to be done is to define a new Event in the File System and use a previously created Service Bus queue as the endpoint.

 

Summary

With that, not only can all Azure Data Lake data be read in a federated mode, but the data can also be cached in Hana tables for best query performance – and this data is always current, thanks to the realtime subscriptions. These changes are even transactionally consistent.

21 Comments

Mario De Felipe

      Thanks for sharing, I believe we will see an increasing interest in real-time integration, so the more we know about this, the better we'll get there.

Michael Kaufmann

      Hi Werner,

      great blog!

If one wants to establish this scenario with SAP HANA Cloud and Azure Data Lake, I guess the DP Agent is still needed? In this case it would not make much sense to go from cloud to on-premise to cloud again. Therefore: Is it possible to install the DP Agent on a VM in Azure?

      Regards,
      Michael

Werner Dähn (Blog Post Author)

      Michael Kaufmann Yes, of course. Any small Linux VM will do.

Would be nice if each adapter could be packaged as a docker image with dpagent included already, wouldn't it? I will make the suggestion to SAP.

Simon Vandelanotte

      Great find & great use-case!

       

      I don't find any mention of this adapter when looking into the PAM documentation or Installation guide for SDI. Is there any documentation on this adapter?

Werner Dähn (Blog Post Author)

Simon Vandelanotte If you send me an email, I will grant you access to the github repo.

Note: This is not an SAP product. I wrote the adapter for a customer and others can participate in it.

Harikishore Sreenivasalu

      Dear Werner

We have a use case for the Azure Data Lake as well, is it possible to have the AzureDataLake adapter in HANA SDI?

      regards

      Harikishore

Werner Dähn (Blog Post Author)

      Harikishore Sreenivasalu If you send me an email I will grant you access to the github repo.

Ugandhar Nandam

      Werner

      We have a use case as well for the Azure Data Lake, is it possible to have the AzureDataLake adapter in HANA SDI? My email id: ugandhar.nandam@gmail.com

ramesh k

Hi Werner,

We have a similar use case for the Azure Data Lake adapter in HANA SDI. Could you please provide the installation reference? My mail id: needara2021@gmail.com

MARK IVANOWSKI

      We need help with the code to have an adapter for AZURE Data Lake in SAP Commissions.

      my email marki@asardigital.com

MARK IVANOWSKI

      Dear Werner Dähn

      We need help with the code to have an adapter for AZURE Data Lake in SAP Commissions on HANA.

      my email marki@asardigital.com

       

Aristobulo Cruz

      Hi, please share the script for creating the remote source, acruzto2002@gmail.com

      Regards.

Werner Dähn (Blog Post Author)

      Aristobulo Cruz I will bring the documentation up to standard in the next weeks.

       

      CREATE REMOTE SOURCE "xxxxx" ADAPTER "AzureSQLDB"
      AT LOCATION AGENT "yyyyy" CONFIGURATION
      	'<?xml version="1.0" encoding="UTF-8"?>
      	<ConnectionProperties name="azuredb">
      		<PropertyEntry name="jdbcurl">jdbc:sqlserver://zzzzz.database.windows.net:1433;database=rtditest1;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;</PropertyEntry>
      		<PropertyEntry name="jdbcurlrt"></PropertyEntry>
      		<PropertyEntry name="pollsleep"></PropertyEntry>
      	</ConnectionProperties>'
      WITH CREDENTIAL TYPE 'PASSWORD' USING
        '<CredentialEntry name="credential">
      	<user>aaaaaa</user>
          <password>bbbbbb</password>
        </CredentialEntry>';

Aristobulo Cruz

Thanks Werner, do you make any other step before creating the adapter?

       

      CREATE REMOTE SOURCE "AzureDataLakeAdapter" ADAPTER "AzureSQLDB"
      AT LOCATION AGENT "DPA_HNPRD" CONFIGURATION
      '<?xml version="1.0" encoding="UTF-8"?>
      <ConnectionProperties name="azuredb">

       

      Error: (dberror) [474]: invalid adapter name: AzureSQLDB:

Werner Dähn (Blog Post Author)

      Yes, there are the usual SDI steps to create an agent, adapter and adapter location.

       

      see the first two commands here:

      https://blogs.sap.com/2015/08/20/hana-adapter-sdk-interaction-via-sql/

       

The adapter name of AzureSQLDB is fixed; you want to enable this exact adapter.

Aristobulo Cruz

Hi Werner, yes I can do it, but if the adapter name isn't registered in the agent, the adapter creation script doesn't work.

      CREATE ADAPTER "AzureSQLDB" PROPERTIES 'display_name=Azure Adapter;description=Azure Adapter' AT LOCATION DPSERVER;

       

      Could not execute 'CREATE ADAPTER "AzureSQLDB" PROPERTIES 'display_name=Azure Adapter;description=Azure Adapter' AT ...'
      Error: (dberror) [403]: internal error: Cannot get adapter capabilities: exception 151002: Adapter type AzureSQLDB not registered.

Werner Dähn (Blog Post Author)

      You have created an agent called DPAGENT that points to the dpagent installation where the adapter is deployed, yes?

       

      In that case, the SQL to execute would be

      CREATE ADAPTER "AzureSQLDB" AT LOCATION AGENT "DPAGENT";

      This contacts the DPAGENT and queries the metadata of the AzureSQLDB adapter, which was deployed there by you (via the agentconfig tool). I am a bit confused because in the create-remote-source you used as agent location "DPA_HNPRD".

Werner Dähn (Blog Post Author)

And I made a mistake as well. Are we talking about AzureDataLake or AzureSQLDB? The first is about files in the Azure Blob Storage, the second about a SQL Server database.

Above you wrote AzureSQLDB in one SQL statement, and that is what I saw first.

The AzureDataLakeAdapter has quite comprehensive documentation, I thought.

      e.g. https://github.com/rtdi/HanaSDIAzureDataLakeAdapter/blob/master/docs/Installation.md

      CREATE REMOTE SOURCE "DATALAKE_RTDI" ADAPTER "AzureDataLakeAdapter"
      AT LOCATION AGENT "desktop" CONFIGURATION
      	'<?xml version="1.0" encoding="UTF-8"?>
      	<ConnectionProperties name="azuredatalake">
      	  <PropertyEntry name="DataLakeEndpoint">https://xxxx.dfs.core.windows.net/</PropertyEntry>
      	  <PropertyEntry name="RootDir">/main</PropertyEntry>
      	  <PropertyEntry name="DefinitionDirectory">c:\temp\ADL\views</PropertyEntry>
      	  <PropertyEntry name="ServiceBusConnectionString">Endpoint=sb://yyyyy.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey</PropertyEntry>
      	</ConnectionProperties>'
      WITH CREDENTIAL TYPE 'PASSWORD' USING
        '<CredentialEntry name="StorageKey">
          <password>bbbb</password>
        </CredentialEntry>
        <CredentialEntry name="ServiceBusKey">
          <password>aaaa</password>
        </CredentialEntry>';

Aristobulo Cruz

Hi Werner, I have a data provisioning agent DPA_HNPRD, and I need the AzureDataLakeAdapter for a demo with a json file.

       

Please confirm the documentation, I get an error with this url:

       

      https://github.com/rtdi/HanaSDIAzureDataLakeAdapter/blob/master/docs/Installation.md

       

      so thanks.

       

Werner Dähn (Blog Post Author)

You understood that this adapter is something I wrote, that you must install it in SDI, and that I must grant you access to the repository on github so you can download it?

Gilles Van den Bleeken

      Hi Werner,

       

      Thank you for this nice blog. I also have a use case to connect to an Azure Datalake via SDI, could you please provide the installation reference?

       

      Kr,

      Gilles (gilles.vandenbleeken@ontexglobal.com)