Your SAP on Azure – Part 21 – Stream SAP data into Azure Event Hub using SAP Data Hub
Azure Event Hubs is a fully managed service that allows exchanging messages between systems in real time. It's capable of receiving and processing millions of events per second. SAP Data Hub doesn't directly support Azure Event Hubs, but the connection can be established using the Kafka protocol. In today's post, I would like to show you how to provision the Azure service and how to configure the connection. At the end, I will also include my ABAP system to stream SAP data.
CREATE EVENT HUB NAMESPACE
Azure Event Hubs and Apache Kafka are very similar to each other. Event Hubs exposes a Kafka-compatible endpoint, so existing Kafka clients can connect to it with configuration changes only and no code changes. The main difference is that Apache Kafka is software that you can run wherever you choose, while Azure Event Hubs is a cloud service that you don't have to manage.
Kafka resource types can be mapped to Event Hubs resources:
|Kafka resource|Event Hubs resource|
|Cluster|Namespace|
|Topic|Event Hub|
|Consumer Group|Consumer Group|
The top-level entity in Azure Event Hubs is the namespace. Each namespace contains one or more Event Hubs, and each Event Hub corresponds to a topic in the Kafka world.
The easiest way to deploy the Event Hubs namespace is to use the Azure Portal. To enable Kafka connections you need to choose at least the Standard pricing tier, because the Basic tier does not offer the Kafka endpoint.
It takes a couple of minutes to deploy the resource.
The first thing we should do is review the firewall rules associated with the namespace, so that the SAP Data Hub is allowed to connect. The communication can be limited to selected networks or even specific IP addresses. The firewall can also be disabled.
I allow only connections from the virtual network and the subnet where my SAP Data Hub is running.
Next, let’s create the Event Hub (a topic) that we will use for data streaming. An interesting option is to Capture all messages to cloud storage. I want to check this functionality and save all messages in the Data Lake.
If you'd like to capture messages as well, you need to assign permissions in the Data Lake to Event Hubs. It needs Execute rights on the Data Lake root directory and full permissions on the directory where the messages will be stored.
To access the Event Hub from the SAP Data Hub we’ll require the Access Policy:
That's all! The Azure configuration is done. Click on the created access policy to display the Connection String, which we will use to define the connection in the SAP Data Hub.
Connection String: Endpoint=sb://sapehdemo.servicebus.windows.net/;SharedAccessKeyName=FullAccess;SharedAccessKey=lyvsmdc2feOT0oX08blSFGnzoFmYm3m7ardYYF2Jfdo=;EntityPath=testtopic
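The connection string is a list of key=value pairs separated by semicolons, and the Kafka broker address used later is simply the endpoint host with port 9093. As a minimal sketch of how the pieces relate (the function names are my own, and the key in the example is a placeholder, not a real secret), it could be taken apart like this:

```python
from urllib.parse import urlparse


def parse_connection_string(conn_str: str) -> dict:
    """Split an Event Hubs connection string into its key/value parts."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue
        # Split on the first '=' only: the base64 access key may itself end with '='.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


def kafka_broker(conn_str: str, port: int = 9093) -> str:
    """Derive the Kafka broker address (host:port) from the Endpoint entry."""
    endpoint = parse_connection_string(conn_str)["Endpoint"]
    host = urlparse(endpoint).hostname
    return f"{host}:{port}"


if __name__ == "__main__":
    # Placeholder key, for illustration only.
    example = (
        "Endpoint=sb://sapehdemo.servicebus.windows.net/;"
        "SharedAccessKeyName=FullAccess;"
        "SharedAccessKey=abc123=;"
        "EntityPath=testtopic"
    )
    print(kafka_broker(example))
    print(parse_connection_string(example)["EntityPath"])
```

The EntityPath entry is the Event Hub name, which is the value to enter as the topic in the Kafka operators.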
DEFINE SAP DATA HUB CONNECTION
The Kafka broker address is the hostname of the connection endpoint followed by port 9093. The username is always "$ConnectionString" (without the quotes). As the password, use the entire connection string. In order to establish a secure connection, we'll also require the Event Hub certificate. It's not possible to download it from the Azure Portal, but we can retrieve it with the openssl command-line tool:
openssl s_client -showcerts -connect <EventHubs_Endpoint>:9093 </dev/null
Copy both certificates, save them as a single file and import into SAP Data Hub:
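If you prefer not to copy the certificates by hand, the PEM blocks can be pulled out of the openssl output with a few lines of Python. This is only a sketch under my own assumptions: the function names are hypothetical, and it expects the text printed by the openssl command above as its input.

```python
import re
from typing import List

# A PEM certificate is everything between the BEGIN and END marker lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_certificates(openssl_output: str) -> List[str]:
    """Return every PEM certificate block found in the openssl output, in order."""
    return PEM_BLOCK.findall(openssl_output)


def build_bundle(openssl_output: str) -> str:
    """Join all certificates into the content of a single CA file."""
    certs = extract_certificates(openssl_output)
    if not certs:
        return ""
    return "\n".join(certs) + "\n"


if __name__ == "__main__":
    import sys

    # Example: openssl s_client ... </dev/null | python extract_certs.py > eh.crt
    sys.stdout.write(build_bundle(sys.stdin.read()))
```

The resulting file is what gets imported into SAP Data Hub and referenced as the TLS CA file.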
Create a new KAFKA type connection using previously collected data:
Kafka broker (Event Hubs endpoint): sapehdemo.servicebus.windows.net:9093
Group ID: 0
Kafka SASL username: $ConnectionString
Kafka SASL password (connection string): Endpoint=sb://sapehdemo.servicebus.windows.net/;SharedAccessKeyName=FullAccess;SharedAccessKey=lyvsmdc2feOT0oX08blSFGnzoFmYm3m7ardYYF2Jfdo=;EntityPath=testtopic
Use TLS: true
TLS CA file: /vrep/ca/eh.crt
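For readers who know Kafka clients better than the Data Hub connection dialog, the same settings can be written as librdkafka-style properties, as used for example by the confluent-kafka Python client. That client is not part of this setup, so treat the following as a rough sketch: the helper name is hypothetical, the password is a placeholder, and the function only builds the dictionary without connecting anywhere.

```python
def event_hubs_kafka_config(namespace_host: str,
                            connection_string: str,
                            ca_file: str,
                            group_id: str = "0") -> dict:
    """Build Kafka client properties matching the Data Hub connection fields."""
    return {
        # Kafka broker: the Event Hubs endpoint host on port 9093
        "bootstrap.servers": f"{namespace_host}:9093",
        # "Use TLS: true" combined with SASL credentials means SASL over TLS
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string, not a variable to substitute
        "sasl.username": "$ConnectionString",
        # The password is the entire connection string
        "sasl.password": connection_string,
        # CA file holding the certificates retrieved with openssl
        "ssl.ca.location": ca_file,
        # Only relevant for consumers
        "group.id": group_id,
    }


if __name__ == "__main__":
    config = event_hubs_kafka_config(
        "sapehdemo.servicebus.windows.net",
        "Endpoint=sb://sapehdemo.servicebus.windows.net/;...",  # placeholder
        "/vrep/ca/eh.crt",
    )
    for key, value in config.items():
        print(f"{key} = {value}")
    # With confluent-kafka installed, this dictionary could be passed to
    # confluent_kafka.Consumer(config); that step is not executed here.
```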
SAP DATA HUB PIPELINE
The connection test for Kafka endpoints is not supported. We'll verify the connection after building the pipeline in the Data Hub Modeler. I'm using the Message Generator to create content that the Kafka Producer operator sends to the Event Hub. On the same graph, I also included a Kafka Consumer that reads the messages from the topic and passes them to the Terminal.
For both Kafka Producer and Consumer operators select the previously created connection and enter the topic (Event Hub) name:
Now let's start the pipeline and check if messages are flowing. Use the terminal to verify the results:
It’s all good, the connection is working. In the Event Hub monitoring in the Azure Portal we can see statistics:
Thanks to the Event Hubs Capture feature, the messages are also saved to the Data Lake store.
But there is one more thing I would like to present. Using the ABAP connector, the SAP Data Hub can stream SAP data directly to the Event Hub. Let's slightly modify the pipeline:
I pointed the ABAP operator to my test NetWeaver system with the EPM Demo data. Then I changed the batch size to 1 on the CSV producer. Each record read from the SAP system is now sent as a separate message to the Event Hub. We can see the data preview in the terminal:
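To illustrate what the batch size changes, here is a small sketch of the idea in plain Python. It is not how the Data Hub operator is implemented, and the function name and sample records are hypothetical. Records are grouped into CSV messages of batch_size rows each, so a batch size of 1 yields one message per record.

```python
import csv
import io
from typing import Iterable, Iterator, List, Sequence


def _encode(rows: List[Sequence[str]]) -> str:
    """Serialize a list of records as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def to_csv_messages(records: Iterable[Sequence[str]],
                    batch_size: int = 1) -> Iterator[str]:
    """Yield CSV messages containing at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Sequence[str]] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield _encode(batch)
            batch = []
    if batch:
        # Emit the remaining records as a final, smaller message.
        yield _encode(batch)


if __name__ == "__main__":
    # Hypothetical sample records, not actual EPM demo data.
    sample = [("P-1", "Notebook"), ("P-2", "Monitor"), ("P-3", "Keyboard")]
    for message in to_csv_messages(sample, batch_size=1):
        print(repr(message))
```

With a larger batch size the same three records would arrive in fewer, bigger messages, which reduces the message count but means a consumer has to split each message back into rows.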
Do you have a write-up comparing Azure Data Factory and SAP Data Hub for pushing SAP data into Azure Data Lake?
Any leads or best practices would be a great help. It may be batch data processing, not necessarily real-time data streaming.
I'm sorry but I don't have any comparison between SAP DataHub and Azure Data Factory.
A great series of posts, Bartosz! We look forward to continuing 😉