
Your SAP on Azure – Part 14 – Export SAP data to Azure Data Lake using SAP Data Hub and visualize them with Microsoft PowerBI

A few weeks ago, I described the installation process of SAP Data Hub. Today I would like to present a sample use case: extract SAP ERP data to Azure Data Lake Storage and use Microsoft PowerBI to visualize it. No lengthy introduction is required – let's jump straight to the configuration!

PROVISION DATA LAKE STORAGE

Azure Data Lake is a highly scalable and cost-efficient storage solution for big data analytics. You can use it to capture data from various sources, regardless of how structured the data is. The storage is part of the Azure Platform-as-a-Service offering, is highly available, and can store petabytes of data. Together with Microsoft PowerBI, it makes a great solution for data analytics and exploration.

Let's start by creating a Service Principal – a special security identity used for service-to-service communication when the process is not executed by a person. SAP Data Hub will use these credentials to write files to the Data Lake storage. I have prepared a PowerShell script that creates the Service Principal and lists all the parameters that will be required during connection configuration:

# Sign in first with Connect-AzureRmAccount – otherwise the cmdlets below
# fail with "No account found in the context"
$app = New-AzureRmADApplication -DisplayName dataextractSP -IdentifierUris "http://dataextract"
$servicePrincipal = New-AzureRmADServicePrincipal -ApplicationId $app.ApplicationId
# Decode the auto-generated SecureString password so it can be displayed
$BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($servicePrincipal.Secret)
$password = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR)
[Runtime.InteropServices.Marshal]::ZeroFreeBSTR($BSTR)
$tenant = Get-AzureRmTenant

Write-Host ''
Write-Host 'Below information will be required during SAP Data Hub configuration:'
Write-Host 'Service Principal Tenant ID :' $tenant.Id
Write-Host 'Service Principal Client ID :' $app.ApplicationId
Write-Host 'Service Principal Key :' $password

The provisioning process for Azure Data Lake storage is amazingly simple. In the Azure Portal, find Data Lake Storage and choose the desired name and resource group. That's all!

Within a few minutes the resource is provisioned, and we can proceed with the ACL configuration. Azure Data Lake uses the POSIX access control model: each object in the storage has three permissions – Read, Write and Execute. SAP Data Hub will use the previously created Service Principal to write data to the storage, so we need to grant it the correct permissions.
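The POSIX permission model above can be illustrated with a few lines of plain Python. This is just a mental model of the rwx bits, not an Azure API call:

```python
# Minimal sketch of the POSIX rwx model used by Azure Data Lake ACLs.
# Each ACL entry grants up to three bits: Read (4), Write (2), Execute (1).
PERMS = {"r": 4, "w": 2, "x": 1}

def to_octal(perms: str) -> int:
    """Translate a string like 'rwx' or 'r-x' into its octal value."""
    return sum(PERMS[p] for p in perms if p != "-")

# SAP Data Hub's Service Principal needs full access (rwx = 7) on the
# target folder to create and list files there.
print(to_octal("rwx"))  # 7
print(to_octal("r-x"))  # 5
```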

Open the Data Lake blade and go to Data Explorer. Click on Access and select the Read/Write/Execute permissions. Please enlarge the picture below for reference:

The Azure configuration is completed. You can log in to your SAP Data Hub tenant.

CREATE CONNECTIONS IN SAP DATA HUB

Connection configuration is a straightforward process. To start, open the Connection Management tile.

I have prepared two connections: one for my SAP NetWeaver system and one for Azure Data Lake.

The account name in Data Lake connection is the Azure Data Lake name that you chose earlier.
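For reference, the pieces collected so far fit together roughly like this. The field labels are descriptive only, not an official SAP Data Hub schema, and the account name is hypothetical:

```python
# Illustrative only: the values the ADL connection asks for, assembled from
# the PowerShell output earlier. Not an actual SAP Data Hub API structure.
adl_connection = {
    "account_name": "mydatalake",  # Data Lake name chosen at provisioning (hypothetical)
    "tenant_id": "<Service Principal Tenant ID>",
    "client_id": "<Service Principal Client ID>",
    "client_key": "<Service Principal Key>",
}

# ADLS Gen1 endpoints follow the pattern <account>.azuredatalakestore.net
url = f"adl://{adl_connection['account_name']}.azuredatalakestore.net"
print(url)  # adl://mydatalake.azuredatalakestore.net
```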

After saving the entries, perform the connection test.

BUILD THE PIPELINE

Go back to the main screen of SAP Data Hub and open the Modeler. Click on the + button to create a new graph. Our pipeline won’t be complex – we just need three elements:

  • SAP ABAP ODP Object – to connect to SAP and read data
  • Flowagent File Producer – to write file to Azure Data Lake
  • Graph Terminator – to finish the task
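As a rough sketch, the three elements above form a simple linear graph. This is only an illustration of the pipeline shape, not the actual SAP Data Hub graph JSON format:

```python
# Illustrative sketch of the three-operator pipeline as a simple structure.
# Operator names mirror the list above.
graph = {
    "operators": [
        "SAP ABAP ODP Object",      # reads data from SAP
        "Flowagent File Producer",  # writes the file to Azure Data Lake
        "Graph Terminator",         # ends the run once the file is written
    ],
    # data flows left to right: ODP -> File Producer -> Terminator
    "connections": [(0, 1), (1, 2)],
}
print(len(graph["operators"]))  # 3
```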

Each component requires configuration. In the SAP ABAP ODP Object, select the connection that should be used and choose the object from which you'd like to get the data. I'm using the EPM sample data model, and I chose the SEPM_IPRODUCT object, which gives me detailed information about the items in the Product Catalog.

In the Flowagent settings, select ADL as the storage type and choose the connection to Azure Data Lake. You can also choose the file format and the location where the file should be saved.

You can now save and execute the graph. After a few seconds, the status changes to Running. SAP Data Hub has created a new pod in the Kubernetes cluster that processes the task.

You can check the pod status in Kubernetes (you don't really have to, I'm just posting it in case you are curious 😊):

kubectl get pods -n datahub

It took less than 5 minutes to execute the graph. You can verify that the file exists using, for example, Azure Storage Explorer:
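If you prefer a scripted sanity check, a downloaded copy of the exported file can be previewed with a few lines of Python. The column names below are illustrative, not the actual SEPM_IPRODUCT schema:

```python
import csv
import io

# Preview the first rows of an exported CSV. Here a small in-memory sample
# stands in for the downloaded file; with a real file, open() it instead.
sample = io.StringIO(
    "PRODUCT_ID,CATEGORY,PRICE\n"
    "HT-1000,Notebooks,956.00\n"
    "HT-1001,Notebooks,1249.00\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
print(len(rows), rows[0]["CATEGORY"])  # 2 Notebooks
```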

ANALYZE DATA IN MICROSOFT POWERBI

Let's be honest – reading the data directly from a CSV file is not the best idea. It works if the file is small and you know what you're looking for, but for large datasets some sort of analytics software is a much better choice. I use the free edition of Microsoft PowerBI, which is an easy way to create a visual representation of data. Today I will show you just a small piece of PowerBI's capabilities by creating a sample chart with the number of products per product category. Once the software is installed, click on the Get Data button in the menu and select Azure Data Lake:

On the next window enter the Data Lake URL:

You'll be asked to sign in (use your personal credentials, not the Service Principal):

The file created by SAP Data Hub is automatically detected, so we just have to confirm it by clicking the Load button:

Now just a few clicks to format the data (yes, PowerBI does it automatically) and we can see the file content:

In the report builder, choose the chart type – and that's pretty much all we have to do. The entire process of getting the data and creating the visualization takes just a few minutes. I think it's even easier than in Excel (but the chart looks much better).
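Under the hood, the chart is nothing more than a count of products per category. A minimal sketch of the same aggregation, using made-up EPM-style rows:

```python
from collections import Counter

# Count products per category – the same aggregation the chart visualizes.
# The product list is illustrative, not real SEPM_IPRODUCT content.
products = [
    {"id": "HT-1000", "category": "Notebooks"},
    {"id": "HT-1001", "category": "Notebooks"},
    {"id": "HT-1060", "category": "Mice"},
]
counts = Counter(p["category"] for p in products)
print(counts["Notebooks"], counts["Mice"])  # 2 1
```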

You can save the report and use it again after the data refresh! I’m going to write a separate post about PowerBI capabilities, so if you are interested don’t forget to follow me on SAP Community and LinkedIn to get updates about new posts!

5 Comments
  • Hi Bartosz Jarkowski,

    Good to see an actual step-by-step guide – I didn't have much of a clue how to do this. When I run the script, both my local PC CLI and Cloud Shell show the error below. Any tips on what I need to look at?

     

    PS C:\Users\i032156> $app = New-AzureRmADApplication -DisplayName dataextractSP -IdentifierUris "http://dataextract"
    New-AzureRmADApplication : No account found in the context. Please login using Connect-AzureRmAccount.
    At line:1 char:8
    + $app = New-AzureRmADApplication -DisplayName dataextractSP -Identifie …
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : CloseError: (:) [New-AzureRmADApplication], ArgumentException
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.ActiveDirectory.NewAzureADApplicationCommand

  • Hi,

     

    We have configured a connection to ADL (Gen 1) as described in the blog. The connection status check returns OK, and all the files are visible in the connection explorer in Data Hub.

    However, creating a graph that uses the Flowagent File Producer results in a docker error every time:

    Error starting graph with handle [02982c14a7094e2cbdc0e7231e324a11]: error building docker image. Docker daemon error: The command ‘/bin/sh -c apt-get update && apt-get install -yq –no-install-recommends dos2unix=7.3.4-3 libaio1=0.3.110-3 libnuma1=2.0.11-2.1 locales=2.24-11+deb9u3 libfilter-perl=1.57-1 libkeyutils1=1.5.9-9 libssl1.0.2 unixodbc-dev=2.3.4-1 unixodbc=2.3.4-1 odbcinst1debian2=2.3.4-1 && apt-get clean && rm -rf /var/lib/apt/lists/* && echo en_US.UTF-8 UTF-8 >> /etc/locale.gen && locale-gen && mkdir -p $LINK_DIR/bin && chmod +x $FLOW_AGENT_DIR/*.sh && sync && $FLOW_AGENT_DIR/configure.sh && ln -s $FLOW_AGENT_DIR/sapjvm $LINK_DIR/ext/jre && mkdir $LINK_DIR/lock && ln -s $FLOW_AGENT_DIR/hdbclient $HANA_CLIENT_DIR’ returned a non-zero code: 100

     

    Using the standard file write also does not help, since we get a "bad request" for all operations.

    Any help is appreciated !

    • I know this won’t be the correct answer, but there is a SAP Note describing a similar issue:

      2688925 – Error starting graph with handle: error building docker image – SAP Data Hub

      It is suggested to check the vflow logs to investigate further – could you have a look at them and maybe attach them here?

      Also I think it would be beneficial to post this question in the Q&A area as well – more people will see it 🙂

    • Hi Marcus,

      I know you have probably solved your problem by now, but I recently ran into a similar issue and the solution for me was in this note:

      2757817 – SAP Data Hub 2: Error building docker image. Version ‘2.24-11+deb9u3’ for ‘locales’ was not found