Your SAP on Azure – Part 14 – Export SAP data to Azure Data Lake using SAP Data Hub and visualize them with Microsoft PowerBI
A few weeks ago, I described the installation process of SAP Data Hub. Today I would like to present a sample use case – extract the SAP ERP data to Azure Data Lake Storage and use Microsoft PowerBI to visualize them. I don’t think more introduction is required – let’s jump straight to configuration!
PROVISION DATA LAKE STORAGE
Azure Data Lake is a highly scalable and cost-efficient storage solution for big data analytics. You can use it to capture data from various sources no matter how structured they are. The storage is part of the Azure Platform-as-a-Service offering, is highly available, and can store petabytes of data. Together with Microsoft PowerBI they create a great solution for data analytics and exploration.
Let’s start with creating a Service Principal, which is a special security identity that is used in service-to-service communication when the process is not executed by any person. SAP Data Hub will use those credentials to write files to Data Lake storage. I have prepared a PowerShell script that creates the Service Principal and lists all parameters that will be required during the connection configuration:
$app = New-AzureRmADApplication -DisplayName dataextractSP -IdentifierUris "http://dataextract" $servicePrincipal = New-AzureRmADServicePrincipal -ApplicationId $app.ApplicationId $BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($servicePrincipal.Secret) $password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR) [Runtime.InteropServices.Marshal]::ZeroFreeBSTR($BSTR) $tenant = Get-AzureRmTenant Write-Host '' Write-Host 'Below information will be required during SAP Data Hub configuration:' Write-Host 'Service Principal Tenant ID :' $tenant.Id Write-Host 'Service Principal Client ID :' $app.ApplicationId Write-Host 'Service Principal Key :' $password
The provisioning process for Azure Data Lake storage is amazingly simple. In Azure Portal find the Data Lake Storage and choose the desired name and resource group. That’s all!
Within a few minutes, the resource is provisioned, and we can proceed with ACL configuration. Azure Data Lake uses POSIX access control model. Each object in the storage has three permissions: Read, Write and Execute. SAP Data Hub will use the previously created Service Principal to write data to the storage, so we need to grant correct permissions.
Open the Data Lake blade and go to Data Explorer. Click on Access and select Read/Write/Execute permissions. Please enlarge below picture for reference:
The Azure configuration is completed. You can log in to your SAP Data Hub tenant.
CREATE CONNECTIONS IN SAP DATA HUB
Connection configuration is a straightforward process. To start, open the Connection Management tile.
I have prepared two connections: one for my SAP NetWeaver system and one for Azure Data Lake.
The account name in Data Lake connection is the Azure Data Lake name that you chose earlier.
After saving the entries, perform the connection test.
BUILD THE PIPELINE
Go back to the main screen of SAP Data Hub and open the Modeler. Click on the + button to create a new graph. Our pipeline won’t be complex – we just need three elements:
- SAP ABAP ODP Object – to connect to SAP and read data
- Flowagent File Producer – to write file to Azure Data Lake
- Graph Terminator – to finish the task
Each component requires configuration. In the SAP ABAP ODP Object select the connection that should be used and choose from which object you’d like to get the data. I’m using the EPM sample data model and I chose the SEPM_IPRODUCT object that will give me detailed information about items in the Product Catalog.
In the Flowagent settings select the ADL as the storage type and choose the connection to Azure Data Lake. You can also decide about the file format and the location where it should be saved.
You can now save and execute the graph. After a few seconds, the status changed to Running. SAP Data Hub has created a new pod in Kubernetes cluster that processes the task.
You can see the pods status in Kubernetes (you don’t really have to, I’m just posting it in case you are curious 😊):
kubectl get pods -n datahub
It took less than 5 minutes to execute the graph. You can verify the existence of the file using for example Storage Explorer:
ANALYZE DATA IN MICROSOFT POWERBI
Let’s be honest – reading the data directly from CSV file is not the best idea. You can use it if the file is small and you know what you’re looking for. But in the case of large datasets using some sort of analytics software is a much better idea. I use a free edition of Microsoft PowerBI which is an easy way to create a visual representation of data. Today I will show you just a small piece of PowerBI capabilities by creating a sample chart with the number of products in product categories. When the software is installed, click on the Get Data button in the menu and select Azure Data Lake:
On the next window enter the Data Lake URL:
You’ll be asked to Sign-In (use your personal credentials, not the Service Principal):
The file created by SAP Data Hub is automatically detected, so we just have to confirm it by clicking the Load button:
Now just a few clicks to format the data (yes, PowerBI does it automatically) and we can see the file content:
In the report builder choose the Chart Type and that’s pretty much all we have to do. The entire process of getting the data and creating visualization takes just a few minutes – I think it’s even easier than Excel (but the chart looks much better)
You can save the report and use it again after the data refresh! I’m going to write a separate post about PowerBI capabilities, so if you are interested don’t forget to follow me on SAP Community and LinkedIn to get updates about new posts!