
Your SAP on Azure – Part 13 – Install SAP Data Hub on Azure Kubernetes Service

SAP Data Hub is one of the newest SAP products that integrates data from multiple sources. It's fully containerized, which means that each application component runs in a separate container. The architecture is completely different from SAP NetWeaver, so there is a lot of learning ahead of us. If you are new to Docker and Kubernetes, I highly recommend first reading the excellent introduction to SAP Data Hub containers written by Thorsten Schneider!

To run SAP Data Hub you need a Kubernetes service to manage the deployed containers. You can use an on-premises solution like SUSE CaaS Platform, but I recommend having a look at Azure Kubernetes Service and deploying SAP Data Hub in the cloud!

In the following blog, I will show how to prepare all required Azure resources and install SAP Data Hub using Azure Kubernetes Service.

The guide is based on SAP Data Hub 2.4. SAP is working intensively on making things easier, so in the future some steps may be different or not required at all. Always check the current requirements in the installation guide!

PREPARE AZURE KUBERNETES SERVICE

I start with provisioning the Kubernetes service, as it usually takes a while to be ready. Open the Azure portal and create a new Kubernetes cluster. Following the installation guide, I select a VM size with 4 CPUs and 32 GB of memory. If you select a smaller node, you will most likely encounter issues during container deployment.

Be careful when choosing the Kubernetes version, as only some releases are supported by SAP Data Hub. Always check the documentation. The currently supported versions include 1.11.5, which I chose on the screen below.

On the Authentication tab I enabled Role-Based Access Control, which is a more secure way to run the Kubernetes cluster, as it gives you fine-grained control over access to resources.

My VNet is already configured and I created a separate subnet for the cluster. Containers should be placed in a separate IP range, so I decided to use 172.16.0.0, which is not used anywhere else in my virtual network.
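
If you prefer scripting over the portal, the cluster can also be provisioned with the Azure CLI. This is only a minimal sketch – the resource group, cluster name and subnet ID are placeholders, and Standard_E4s_v3 is just one example of a 4 CPU / 32 GB node size:

# Sketch: create an AKS cluster matching the portal choices above (all names are placeholders)
# Depending on your CLI version, RBAC is enabled by default or via an extra flag
az aks create \
    --resource-group <resource group> \
    --name <cluster name> \
    --node-count 3 \
    --node-vm-size Standard_E4s_v3 \
    --kubernetes-version 1.11.5 \
    --network-plugin azure \
    --vnet-subnet-id <subnet resource ID> \
    --generate-ssh-keys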

Before clicking the Create button, review all settings on the last tab.

You don't have to wait until the service is provisioned; you can continue with the next steps. When the cluster is available, you will see three VMs running:
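
You can also check the provisioning state from the command line; a small sketch with placeholder names:

# Returns 'Succeeded' once the cluster is ready
az aks show --resource-group <resource group> --name <cluster name> \
    --query provisioningState --output tsv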

CREATE CONTAINER REGISTRY

During the installation, all container images will be downloaded from the SAP repository to your private container registry. This way you have a single, secure place to manage them, and it also ensures there is no added latency during deployment. For test or demo landscapes the Basic SKU is just fine, but if you require higher bandwidth or geo-replication features, consider the Standard or Premium SKUs.
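
Creating the registry takes a single CLI command if you don't want to click through the portal; the names below are placeholders:

# Sketch: create a container registry with the Basic SKU
az acr create --resource-group <resource group> --name <acrName> --sku Basic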

PREPARE INSTALLATION HOST

The SAP Data Hub installation cannot be executed directly from the cluster; instead, it needs an installation host. You can use any already deployed server, but my recommendation is to provision a small VM and use it to manage the cluster. I selected a 2 vCPU server with Ubuntu, which doesn't cost much and gives me flexibility.
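
As a sketch, such a host can be provisioned with the Azure CLI; the names are placeholders and Standard_B2s is just one example of a cheap 2 vCPU size:

# Sketch: provision a small Ubuntu VM to act as the installation host
az vm create \
    --resource-group <resource group> \
    --name <vm name> \
    --image UbuntuLTS \
    --size Standard_B2s \
    --admin-username <admin user> \
    --generate-ssh-keys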

Once the VM is accessible, you can connect to it and start the initial configuration and software download.

Install Azure CLI

The easiest way to install the Azure CLI on Ubuntu is to follow the official Microsoft documentation and execute these bash commands:

sudo apt-get install apt-transport-https lsb-release software-properties-common dirmngr -y
AZ_REPO=$(lsb_release -cs)
echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | \
    sudo tee /etc/apt/sources.list.d/azure-cli.list
sudo apt-key --keyring /etc/apt/trusted.gpg.d/Microsoft.gpg adv \
     --keyserver packages.microsoft.com \
     --recv-keys BC528686B50D79E339D3721CEB3E94ADBE1229CF
sudo apt-get update
sudo apt-get install azure-cli

Install kubectl

As before, I follow the official documentation to install the kubectl tool; however, I slightly modified the script to install a specific version (the same as the Kubernetes server):

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl=1.11.5-00

Install Docker

Docker is available as a standard Ubuntu package, so it's enough to run the apt-get command; no extra repositories are needed.

sudo apt-get install -y docker.io

Just to check that everything went well, I verify the installation using the docker login command:

docker login

Install PyYAML

Again, no extra repositories are needed; apt-get is enough.

sudo apt-get install -y python-yaml

Install helm

Helm doesn't have a package in the Ubuntu repositories, but you can simply download the desired version from GitHub and unpack it on the server. Pay attention to the release you download, as only some versions are supported.

wget https://storage.googleapis.com/kubernetes-helm/helm-v2.11.0-linux-amd64.tar.gz --content-disposition
tar -xf helm-v2.11.0-linux-amd64.tar.gz
# the archive unpacks into a linux-amd64 directory
sudo mv linux-amd64/helm linux-amd64/tiller /usr/local/bin

Download SAP Data Hub software

Download the required software from SAP to the installation host. In this tutorial I won't use the Maintenance Planner, so you don't have to install the SAP Host Agent separately. The SL Plugin component that performs the installation is bundled with the Data Hub.

The software may not fit on the OS disk, so I added a 128 GB data disk in Azure and mounted it as /datahub.
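
For reference, attaching and mounting the data disk looks roughly like this; the device name /dev/sdc is an assumption and may differ on your VM, so verify it with lsblk first:

# Sketch: attach a new 128 GB data disk (run from a machine with the Azure CLI)
az vm disk attach --resource-group <resource group> --vm-name <vm name> \
    --name datahub-disk --new --size-gb 128

# On the VM: format and mount the disk (the device name is an assumption - check lsblk)
sudo mkfs.ext4 /dev/sdc
sudo mkdir /datahub
sudo mount /dev/sdc /datahub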

I use wget to download the software from the SAP site and then unpack the archive.

PREPARE KUBERNETES CLUSTER FOR INSTALLATION

When the installation host is ready, we can perform the activities required before the actual SAP Data Hub installation. First, we need to ensure the Kubernetes cluster is available and that we can communicate with it. We log in to Azure and create a credentials file for the kubectl tool:

az login
az aks get-credentials --resource-group <resource group> --name <cluster name>

You can verify the connection by displaying active services:

kubectl get services

Helm and Tiller are two components that manage the installation of Kubernetes applications in the cluster and are required by SAP Data Hub. Helm is the client located on the installation host, and Tiller is the server part deployed to the cluster. My cluster is RBAC-enabled, so I start by creating a service account and binding it to Tiller. Create a helm-rbac.yaml file with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

(source: Microsoft)

Then apply the settings using:

kubectl apply -f <filename>

Now initialize Helm using the created service account:

helm init --service-account tiller

To verify that everything went well, execute the helm ls command and check that it doesn't return an error message. You can also display the deployed pods in the kube-system namespace:

helm ls
kubectl get pods -n kube-system

The last thing we need to do before starting the installation is to configure access to the container registry. During the deployment of the cluster, a service principal is created automatically, and we need to assign it permissions to pull images from the registry. This can be done through the Azure portal, but Microsoft published a script that can do it for us.

ACR_NAME – the Azure Container Registry name defined during the deployment

SERVICE_PRINCIPAL_ID – the service principal that runs the cluster. The ID can be retrieved from the Azure portal under Azure Active Directory -> App Registrations. Use the Application ID value.
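
Alternatively, the ID can be read directly with the Azure CLI; a small sketch with placeholder names:

# Sketch: read the appId of the service principal that runs the cluster
az aks show --resource-group <resource group> --name <cluster name> \
    --query servicePrincipalProfile.clientId --output tsv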

Create and execute the following script filled with your data.

#!/bin/bash
# Modify for your environment. The ACR_NAME is the name of your Azure Container
# Registry, and the SERVICE_PRINCIPAL_ID is the service principal's 'appId' or
# one of its 'servicePrincipalNames' values.

ACR_NAME=mycontainerregistry
SERVICE_PRINCIPAL_ID=<service-principal-ID>

# Populate value required for subsequent command args

ACR_REGISTRY_ID=$(az acr show --name $ACR_NAME --query id --output tsv)

# Assign the desired role to the service principal. Modify the '--role' argument
# value as desired:
# reader:      pull only
# contributor: push and pull
# owner:       push, pull, and assign roles

az role assignment create --assignee $SERVICE_PRINCIPAL_ID --scope $ACR_REGISTRY_ID --role contributor

(source: Microsoft)

Finally, let's log in to the ACR using our local account (if you don't perform this step, the installer won't be able to push images to the registry).

az acr login --name <acrName>

INSTALLATION PROCESS

There are three different ways to perform the installation. The basic option is to use the install script delivered together with the SAP Data Hub software, but the recommended approach uses the Software Lifecycle Plugin, which performs the installation of the containerized application. The SL Plugin can be used with or without the Maintenance Planner – I found the integration with the Maintenance Planner a bit of an overkill, so in this guide I follow the standalone route using the command line.

(source: SAP Data Hub Installation Guide)

If you struggle to decide which approach to take, you can also refer to the pros and cons table inside the SAP Data Hub Installation Guide. In my opinion, using the SL Plugin without Maintenance Planner is currently the best choice, as you don’t have to prepare the SAP Host Agent and the connectivity to the Maintenance Planner.

(source: SAP Data Hub Installation Guide)

We can distinguish several phases of the installation, and I will try to describe each of them briefly. At the end of this chapter I also include the full parameter summary for your reference.

Go to the directory where the SAP Data Hub archive was extracted and execute the slplugin binary located under slplugin/bin:

./slplugin execute -p <data hub directory>

The first part of the process verifies that all components have the required versions. If your Kubernetes cluster release is not supported, the check will fail and you won't be able to continue with the installation. Always check the currently supported releases in the SAP Data Hub documentation. The path to the kubectl config is determined automatically, so we just have to confirm the value.

In the next phase we need to choose a namespace for the SAP Data Hub product in the Kubernetes cluster and decide between the Basic and Advanced installation modes. A Kubernetes namespace allows you to logically separate pods in the cluster – we will use this name again later, so take note of what you chose.
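
If you want to see which namespaces already exist in the cluster before deciding, you can list them:

kubectl get namespaces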

Then we need to provide our S-User and password so that the installer can download the required Docker images. They are located in an SAP repository, which is accessed using a Technical User created during this process.

If Docker is not installed on the installation host, you will get an error at this stage.

Next, provide the address of the Azure Container Registry. I did not create a pull secret, so I select the respective setting.

We are asked to provide the username and password to access SAP Data Hub (don't forget what you entered here, as we will use it to log in to the system). I kept the tenant name at its default value.

There are a few more questions to answer. Each of them is documented, so I won't repeat the information here. The full installation summary can be found below. In most cases I followed the default settings.

*
* Parameter Summary
*
Choose 'Next' to start with the values shown. Otherwise choose 'Back' to revise the parameters.

n/a
Path to the KUBECONFIG file: /root/.kube/config

License Agreement
   I authorize: Y

Kubernetes Namespace
Kubernetes Namespace: datahub

Installation Type
     1. Basic Installation
   > 2. Advanced Installation

Container Repository Username
Username: 0001138455-bjarkowski

Container Registry
Container Registry: sapdatahub.azurecr.io

Image Pull Secret
   > 1. Do not use an image pull secret
     2. Use an image pull secret

Certificate Domain
Certificate Domain: datahub

SAP Data Hub System Tenant Administrator Password

SAP Data Hub Initial Tenant Name
Tenant Name: default

SAP Data Hub Initial Tenant Administrator Username
Username: bjarkowski

SAP Data Hub Initial Tenant Administrator Password Configuration
   > 1. Use the same password
     2. Do not use the same password

Cluster Proxy Settings
     1. Configure
   > 2. Do not configure

Checkpoint Store Configuration
   > 1. Do not enable checkpoint store
     2. Enable checkpoint store

RBAC Settings
   > 1. It is enabled
     2. It is not enabled

SAP Data Hub Diagnostics Persistency Configuration
   > 1. Do not enable persistency
     2. Enable persistency

Storage Class Configuration
   > 1. Do not configure storage classes
     2. Configure storage classes

Docker Container Log Path Configuration
   > 1. Do not configure container log path
     2. Configure container log path

Container Registry Settings for Pipeline Modeler
     1. Use a different one
   > 2. Use the same

Loading NFS Modules
   > 1. Enable loading NFS modules
     2. Disable loading NFS modules

Helm Timeout
Timeout in seconds: 1200

Pod Wait Timeout
Timeout in seconds: 300

Additional Installation Parameters
Additional Installation Parameters:
Choose action Back/Next [b/N/F1]: N

Now the installation starts. During the first part, all Docker images are pulled from the SAP repositories and pushed to the Azure Container Registry. This can take some time (around 2 hours, depending on the internet bandwidth), so it's a good moment to make a coffee in my favourite SAP Community cup!

As the process goes on, you can see more and more repositories in the Azure Container Registry:
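
The same list is available from the command line; the registry name is a placeholder:

az acr repository list --name <acrName> --output table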

When all images are pushed to the container registry, the installer starts to deploy them to the Kubernetes cluster. This is again quite a long process, so it's a chance for a second coffee 🙂

You should not encounter any issues, but it's worth remembering a few commands that can help investigate problems:

Display all pods deployed to the namespace in the cluster:

kubectl get pods -n <namespace>  

Display detailed information about the deployment status (worth checking when the pod is in status Error or CrashLoopBackOff):

kubectl describe pod <pod_name> -n <namespace>

Display the logs of a pod (worth checking if the pod is in status Error or CrashLoopBackOff):

kubectl logs <pod_name> -n <namespace>

If your node size is too small, you may encounter issues with the deployment. For example, the HANA image requires at least 20 GB of available memory, so if your node has only 16 GB, the deployment will fail with the status CrashLoopBackOff.

Some of the containers depend on each other, and the Kubernetes cluster will repeat the process of initializing them. CrashLoopBackOff doesn't always mean that something is wrong; sometimes the pod just needs additional time to be provisioned (because a dependent pod is still in status Init).
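
To follow the progress of this phase, you can watch the pods continuously and, if you suspect a sizing problem, check how much memory a node can actually offer to pods; the grep pattern below is just a convenience:

# Watch the pods until they reach status Running
kubectl get pods -n <namespace> --watch

# Check the allocatable resources of a node
kubectl describe node <node_name> | grep -A 5 "Allocatable"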

The last step of the installation is the automatic validation. If everything went well, you should not see any problems and SAP Data Hub is almost ready to use.

EXPOSE SAP DATA HUB TO THE INTERNET

SAP Data Hub is installed, but currently we can't communicate with it, as the local PC is not part of the VNet. Therefore, in the last step I will expose the application to the internet to access it from anywhere. We will use an ingress controller, which is basically a reverse proxy for Kubernetes. Install it using the following command:

helm install stable/nginx-ingress --namespace kube-system

The above command also creates an external load balancer in Azure, configures the backend pool and routing rules, and assigns a public IP address.

You can display the IP address either in the portal or using the following command:

public_ip=$( kubectl -n kube-system get service -l app=nginx-ingress -l component=controller -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ip}");  echo ${public_ip}

There are two ways to assign a DNS name to the IP address. You can either use the script from the installation guide or go to the Azure portal and change the setting there.
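
The script variant boils down to locating the public IP resource and setting a DNS label on it; this sketch follows the commands Microsoft documents for AKS ingress setups, with the IP and label as placeholders:

# The public IP displayed in the previous step
IP=<public IP>

# Find the Azure resource ID of that public IP address
PUBLICIPID=$(az network public-ip list --query "[?ipAddress!=null]|[?contains(ipAddress, '$IP')].[id]" --output tsv)

# Assign the DNS label, e.g. 'datahub' -> datahub.<region>.cloudapp.azure.com
az network public-ip update --ids $PUBLICIPID --dns-name <dns label>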

Let's create a self-signed SSL certificate and deploy it to the cluster (change <FQDN> to the full domain name defined above, for example datahub.westeurope.cloudapp.azure.com, and <namespace> to the namespace chosen during the SAP Data Hub installation):

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=<FQDN>"
kubectl -n <namespace> create secret tls vsystem-tls-certs --key /tmp/tls.key --cert /tmp/tls.crt

(source: SAP Data Hub Installation Guide)

Finally, create a configuration file for the ingress and save it on the installation host. I had to slightly modify the script delivered in the Installation Guide by adding an extra annotation – without it I couldn't access the website. Do not replace the <DNS_REPLACE> 🙂

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
 name: vsystem
 annotations:
   kubernetes.io/ingress.class: nginx
   nginx.ingress.kubernetes.io/secure-backends: "true"
   nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
   nginx.ingress.kubernetes.io/proxy-body-size: "500m"
   nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
   nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
   nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
   nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
   nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
 tls:
 - hosts:
   - <DNS_REPLACE>
   secretName: vsystem-tls-certs
 rules:
 - host: <DNS_REPLACE>
   http:
     paths:
     - path: /
       backend:
         serviceName: vsystem
         servicePort: 8797

Execute the following commands to activate the ingress. Replace <dns_domain> and <NAMESPACE> according to your previous choices:

export dns_domain=<dns_domain>
cat vsystem-ingress.yaml | sed "s/<DNS_REPLACE>/${dns_domain}/g" | kubectl -n <NAMESPACE> apply -f -

You can now access SAP Data Hub through the browser:

Log in using the credentials defined during the installation.

That's it! The installation is completed. I found the process a bit tricky, so I hope such an end-to-end guide will be useful.

      20 Comments

      Florian Riebandt

      Thank you for this great post!

      Everything worked fine for me, except for the access to the Data Hub Launchpad. With the specified annotations, the ingress could not establish a TLS connection to the vsystem. The log of the vsystem pod shows something like:

      http: TLS handshake error from … : tls: first record does not look like a TLS handshake.

      But I was able to solve the problem by replacing

      nginx.ingress.kubernetes.io/secure-backends: "true"

      with

      nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"

      Best regards, Florian

      Bartosz Jarkowski (Blog Post Author)

      Hi Florian,

      I had some troubles with the ingress service, so I slightly modified the script delivered by SAP. It looks like it still causes some troubles. Thanks for your comment, it will definitely help others!

      Manish Shah

      Hi Bartosz!

      Thanks for the great blog, as always. All worked for me, except that I faced the same issue as Florian Riebandt, who thankfully described the trick.

      Also, since I didn't want to expose SAP Data Hub to the internet, I used an internal load balancer instead of an external one.

      So instead of creating the nginx-ingress controller using the default settings like this:

      helm install stable/nginx-ingress --namespace kube-system

      I used a parameter file internal-ingress.yaml to specify the internal load balancer, as below:

      controller:
        service:
          loadBalancerIP: <Your chosen internal IP>
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"

      and then created the ingress controller using the file:

      helm install stable/nginx-ingress --namespace kube-system -f internal-ingress.yaml

      Regards,

      Manish

      Bartosz Jarkowski (Blog Post Author)

      Thanks Manish. Soon I will also write a blog post about exposing the Data Hub to an internal network, but I'm glad you found (and shared!) the solution 🙂

      Andreas Reiser

      That all worked very well, thank you very much!

      I only want to use the system on demand. What would be the best way to safely shut down/start up the services and Kubernetes?

      Bartosz Jarkowski (Blog Post Author)

      Hello Andreas Reiser!

      I'm glad you found my blog useful.

      I've never seen any recommendation about shutting down SAP Data Hub. In my test landscape I just shut down the underlying VMs and that works fine. I've never had an issue with it.

      Andreas Reiser

      Yes, I have done this a few times and it seems to work fine.

      Thanks!

      Roland Kramer

      Hello @Bartosz Jarkowski

      Thanks for your blog on how to set up the Azure Kubernetes Service (AKS).

      Is it fine with you if I link this into my blog - https://blogs.sap.com/2019/07/17/datahub-implementation-with-the-slc-bridge/

      I have spent more time on the installation with the SL Container Bridge and "less" on the Azure configuration, as this was already described ...

      Best regards, Roland

      Enrique Quevedo

      Hello, thanks for this interesting post.

      Does anybody know how to export data from Azure to SAP BI HANA?

      Thanks

      Rahul Pant

      Just wanted to check if there are any blogs on SAP Data Hub 2.6 installation on Azure using SAP CAL. We are trying to install one from SAP CAL but are running into authorization issues, despite the Azure user being the administrator himself. Thank you.

      Bartosz Jarkowski (Blog Post Author)

      Hi,

      no, I have not written any blog about provisioning Data Hub using SAP CAL.

      Could you share what issue you encounter? Please paste the error log.

      Best regards

      Bartosz

      Rahul Pant

      Hello,

      When we try to use Authorization type = 'Standard Authorization', it shows the below error. The Azure subscription is Pay-As-You-Go and the user account is that of the subscription owner, so I am not sure about the attached error.

      Bartosz Jarkowski (Blog Post Author)

      Thanks!

      To be honest, I suggest raising an incident with SAP; they should be able to help you. The only other thing I noticed is this entry in the FAQ:

      What are the roles required by the Microsoft Azure user which will grant permissions to SAP Cloud Appliance Library with the Extended Authorization for Kubernetes Cluster authorization type?

      The roles required are the following:

      • Global Administrator for the Azure Active Directory.
      • Owner or User Access Administrator for the scope /subscriptions/<your_subscription_ID>.

      Rahul Pant

      Thanks! After some research, I learned that we need to select Authorization type = "Extended Authorization for Kubernetes Cluster" for Data Hub 2.6.

      When I select that and try to authorize, I get the attached error. Have a look in case you have seen this before.

      Appreciate your inputs.

      Sarikar Dharmanna

      Hello, it was a great blog on the Data Hub installation. I have completed all the steps and installed successfully, but my Data Hub site is not loading in the browser. Your help is appreciated. The browser shows:
      No internet

      There is something wrong with the proxy server, or the address is incorrect.

      Try:

      ERR_PROXY_CONNECTION_FAILED

      Thanks

      Sarikar Dharmanna

      One more error:

      503 Service Temporarily Unavailable


      openresty/1.15.8.2 

      Sarikar Dharmanna

      Hello,

      Any help is appreciated. If I use the IP address in the browser, the default backend returns 404 errors.

      Bartosz Jarkowski (Blog Post Author)

      Hi,

      you need to provide many more details. Have you created the ingress controller using the above script? Did you get any errors during deployment? What is the status of the pods in the cluster? How do you connect to the Data Hub (over the internet or directly through the VNet)?

      Please ensure that the question contains as much information as possible.

      Also, if you need an urgent answer, please raise a ticket with SAP. While I try to answer questions in a timely manner, I can't guarantee it 🙂

      Sarikar Dharmanna

      Hi Bartosz Jarkowski, sorry for the delay in replying. The issue was actually resolved with some modifications to the ingress controller, referring to help.sap.com. The blog is really appreciated 🙂 thank you

      Skugan Venkatesan

      Hi Sarikar,

      Can you let me know what modifications you made to the ingress controller? I'm facing the same issue while installing DI 3.0.

      Thanks,

      Skugan