Technical Articles
Your SAP on Azure – Part 14 – Export SAP data to Azure Data Lake using SAP Data Hub and visualize them with Microsoft PowerBI
A few weeks ago, I described the installation process of SAP Data Hub. Today I would like to present a sample use case – extracting SAP ERP data to Azure Data Lake Storage and using Microsoft PowerBI to visualize it. I don’t think a longer introduction is required – let’s jump straight to the configuration!
PROVISION DATA LAKE STORAGE
Azure Data Lake is a highly scalable and cost-efficient storage solution for big data analytics. You can use it to capture data from various sources, no matter how structured it is. The storage is part of the Azure Platform-as-a-Service offering, is highly available, and can store petabytes of data. Together with Microsoft PowerBI, it makes a great solution for data analytics and exploration.
Let’s start by creating a Service Principal, which is a special security identity used in service-to-service communication when the process is not executed by any person. SAP Data Hub will use these credentials to write files to the Data Lake storage. I have prepared a PowerShell script that creates the Service Principal and lists all the parameters that will be required during the connection configuration:
# Create the Azure AD application and the Service Principal that SAP Data Hub will use
$app = New-AzureRmADApplication -DisplayName dataextractSP -IdentifierUris "http://dataextract"
$servicePrincipal = New-AzureRmADServicePrincipal -ApplicationId $app.ApplicationId
# Decode the automatically generated secret so it can be displayed
$BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($servicePrincipal.Secret)
$password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR)
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($BSTR)
# Display the values needed for the SAP Data Hub connection configuration
$tenant = Get-AzureRmTenant
Write-Host ''
Write-Host 'Below information will be required during SAP Data Hub configuration:'
Write-Host 'Service Principal Tenant ID :' $tenant.Id
Write-Host 'Service Principal Client ID :' $app.ApplicationId
Write-Host 'Service Principal Key :' $password
The provisioning process for Azure Data Lake storage is amazingly simple. In the Azure Portal, find Data Lake Storage and choose the desired name and resource group. That’s all!
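If you prefer scripting, the same can be done from PowerShell. Here is a minimal sketch – the resource group name, account name, and location are example values I made up, so adjust them to your environment:

New-AzureRmDataLakeStoreAccount -ResourceGroupName "dataextract-rg" -Name "dataextractlake" -Location "West Europe"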
Within a few minutes, the resource is provisioned, and we can proceed with the ACL configuration. Azure Data Lake uses the POSIX access control model. Each object in the storage has three permissions: Read, Write, and Execute. SAP Data Hub will use the previously created Service Principal to write data to the storage, so we need to grant it the correct permissions.
Open the Data Lake blade and go to Data Explorer. Click on Access and select the Read/Write/Execute permissions. Please enlarge the picture below for reference:
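If you’d rather script the permissions than click through the portal, a sketch like the one below should do it. It reuses the $servicePrincipal object from the earlier script and the example account name from above; on an existing folder structure you would also have to apply the entries to the child items:

# Grant the Service Principal Read/Write/Execute on the root folder
Set-AzureRmDataLakeStoreItemAclEntry -Account "dataextractlake" -Path "/" -AceType User -Id $servicePrincipal.Id -Permissions All
# Add the same entry as a default ACL so that newly created files inherit it
Set-AzureRmDataLakeStoreItemAclEntry -Account "dataextractlake" -Path "/" -AceType User -Id $servicePrincipal.Id -Permissions All -Default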
The Azure configuration is complete. You can now log in to your SAP Data Hub tenant.
CREATE CONNECTIONS IN SAP DATA HUB
Connection configuration is a straightforward process. To start, open the Connection Management tile.
I have prepared two connections: one for my SAP NetWeaver system and one for Azure Data Lake.
The account name in the Data Lake connection is the Azure Data Lake name that you chose earlier.
After saving the entries, perform the connection test.
BUILD THE PIPELINE
Go back to the main screen of SAP Data Hub and open the Modeler. Click on the + button to create a new graph. Our pipeline won’t be complex – we just need three elements:
- SAP ABAP ODP Object – to connect to SAP and read data
- Flowagent File Producer – to write file to Azure Data Lake
- Graph Terminator – to finish the task
Each component requires configuration. In the SAP ABAP ODP Object, select the connection that should be used and choose the object from which you’d like to get the data. I’m using the EPM sample data model, and I chose the SEPM_IPRODUCT object, which gives me detailed information about the items in the Product Catalog.
In the Flowagent settings, select ADL as the storage type and choose the connection to Azure Data Lake. You can also decide on the file format and the location where the file should be saved.
You can now save and execute the graph. After a few seconds, the status changes to Running. SAP Data Hub has created a new pod in the Kubernetes cluster that processes the task.
You can see the pod status in Kubernetes (you don’t really have to – I’m just posting it in case you are curious 🙂):
kubectl get pods -n datahub
It took less than 5 minutes to execute the graph. You can verify the existence of the file using, for example, Azure Storage Explorer:
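If you don’t have Storage Explorer at hand, you can also list the target folder from PowerShell – again just a sketch, assuming the example account name used above and that the file was written to the root folder:

# List the files in the Data Lake root folder
Get-AzureRmDataLakeStoreChildItem -Account "dataextractlake" -Path "/"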
ANALYZE DATA IN MICROSOFT POWERBI
Let’s be honest – reading the data directly from a CSV file is not the best idea. It works if the file is small and you know what you’re looking for, but for large datasets, using some sort of analytics software is a much better choice. I use the free edition of Microsoft PowerBI, which is an easy way to create a visual representation of data. Today I will show you just a small piece of PowerBI’s capabilities by creating a sample chart with the number of products in each product category. When the software is installed, click on the Get Data button in the menu and select Azure Data Lake:
In the next window, enter the Data Lake URL:
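The URL points to your Data Lake Storage account and typically follows the pattern below (the account name is the example one used earlier – append a folder path if you only want to expose part of the storage):

https://dataextractlake.azuredatalakestore.net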
You’ll be asked to Sign-In (use your personal credentials, not the Service Principal):
The file created by SAP Data Hub is automatically detected, so we just have to confirm it by clicking the Load button:
Now just a few clicks to format the data (yes, PowerBI does it automatically) and we can see the file content:
In the report builder, choose the chart type and that’s pretty much all we have to do. The entire process of getting the data and creating the visualization takes just a few minutes – I think it’s even easier than in Excel (and the chart looks much better).
You can save the report and use it again after the data refresh! I’m going to write a separate post about PowerBI capabilities, so if you are interested don’t forget to follow me on SAP Community and LinkedIn to get updates about new posts!
Hi Bartosz Jarkowski ,
It’s good to see an actual step-by-step guide – I had little clue how to do this. When I run the script, both my local PC CLI and Cloud Shell show the error below. Any tips on what I need to look at?
PS C:\Users\i032156> $app = New-AzureRmADApplication -DisplayName dataextractSP -IdentifierUris "http://dataextract"
New-AzureRmADApplication : No account found in the context. Please login using Connect-AzureRmAccount.
At line:1 char:8
+ $app = New-AzureRmADApplication -DisplayName dataextractSP -Identifie ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [New-AzureRmADApplication], ArgumentException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.ActiveDirectory.NewAzureADApplicationCommand
Hello Chan Jin Park ,
the error you described occurs because you are not logged in to your Azure account from PowerShell.
Please start with logging in:
Connect-AzureRmAccount
Hi,
we have configured a connection to ADL (Gen 1) as described in the blog. The connection status check returns OK, and all the files are visible in the connection explorer in Data Hub.
However, creating a graph using the Flowagent File Producer results in a Docker error every time:
Error starting graph with handle [02982c14a7094e2cbdc0e7231e324a11]: error building docker image. Docker daemon error: The command '/bin/sh -c apt-get update && apt-get install -yq --no-install-recommends dos2unix=7.3.4-3 libaio1=0.3.110-3 libnuma1=2.0.11-2.1 locales=2.24-11+deb9u3 libfilter-perl=1.57-1 libkeyutils1=1.5.9-9 libssl1.0.2 unixodbc-dev=2.3.4-1 unixodbc=2.3.4-1 odbcinst1debian2=2.3.4-1 && apt-get clean && rm -rf /var/lib/apt/lists/* && echo en_US.UTF-8 UTF-8 >> /etc/locale.gen && locale-gen && mkdir -p $LINK_DIR/bin && chmod +x $FLOW_AGENT_DIR/*.sh && sync && $FLOW_AGENT_DIR/configure.sh && ln -s $FLOW_AGENT_DIR/sapjvm $LINK_DIR/ext/jre && mkdir $LINK_DIR/lock && ln -s $FLOW_AGENT_DIR/hdbclient $HANA_CLIENT_DIR' returned a non-zero code: 100
Using the standard file write also does not help, since we get a "bad request" for all operations.
Any help is appreciated!
I know this won't be the complete answer, but there is an SAP Note describing a similar issue:
2688925 - Error starting graph with handle: error building docker image - SAP Data Hub
It is suggested to check the vflow logs to investigate further - could you have a look at them and maybe attach them here?
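To get to the vflow logs on the Kubernetes cluster, something along these lines should work (assuming the datahub namespace used in the blog and that the pipeline engine pods have vflow in their name):

kubectl get pods -n datahub
kubectl logs <vflow-pod-name> -n datahub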
Also I think it would be beneficial to post this question in the Q&A area as well - more people will see it 🙂
Hi Marcus,
I know you have probably solved your problem by now, but I recently ran into a similar issue and the solution for me was in this note:
2757817 - SAP Data Hub 2: Error building docker image. Version '2.24-11+deb9u3' for 'locales' was not found
Hello Bartosz,
May I ask you a question?
Why not use the HANA platform and SAC to solve this? Is there anything special about Azure?
Thank you!
Hello Max,
there is usually more than a single solution to a problem. You are suggesting SAP Analytics Cloud; someone else would suggest QlikView.
I think it's important to understand what the customer really needs and find a solution that suits them. I wouldn't like to start a discussion about which product is better – I don't think it makes sense 🙂
Hello Bartosz Jarkowski
Great blog series, thanks a lot. Can I ask if you have any recommendation for creating a data pipeline to Azure SQL Data Warehouse? Can we use the Azure SQL connection type, or shall we go with a more generic connection type?
Kind Regards
Hi Serdar,
I'm sorry, I haven't tested the connection between SAP Data Hub and Azure SQL Data Warehouse, so I can't share my experience. But I'd give it a try with the Azure SQL connector.
Please share your testing outcomes – it would be beneficial for everyone!
Thanks Bartosz for sharing.
As you know, to copy data from SAP ECC to Azure Storage, we can also use Data Factory. Could you explain more about the pros and cons of using ADF vs SAP Data Hub?
For which typical use cases is SAP Data Hub more suitable?
Thank you so much.
Trung.
It's quite important to understand your use cases before choosing the right tool for the job. Then you should check which tool performs better, keeping in mind what it offers and how much it costs. Unfortunately, there is no silver bullet that will answer all questions 🙂
Hello Trung Phan,
There is also the option to load data from SAP ECC into SAP IQ. See the blog – SAP (Sybase) IQ – the hidden treasure …
Furthermore - SAP Data Intelligence Hub – connecting the Dots …
Best Regards Roland
Can we use Data Hub to export data in tables from S/4HANA Cloud to Azure Data Lake?
Nice article. Can the data transfer from SAP Data Hub to ADL be configured as continuous replication, or is it a one-time export/transfer? We are working on real-time data transfer from SAP to Azure Data Lake.
Hi, the solution described is just for a one-time extraction. For continuous replication, you should look at the integration with SAP SLT – there are a couple of blogs about it:
https://blogs.sap.com/2019/10/29/abap-integration-replicating-tables-into-sap-data-hub-via-sap-lt-replication-server/