Your SAP on Azure – Part 13 – Install SAP Data Hub on Azure Kubernetes Service

SAP Data Hub is one of the newest SAP products and it integrates data from multiple sources. It is fully containerized, which means that each application component runs in a separate environment. The architecture is completely different from SAP NetWeaver, so there is a lot of learning ahead of us. If you are new to Docker and Kubernetes, I highly recommend first reading the excellent introduction to SAP Data Hub containers written by Thorsten Schneider!

To run SAP Data Hub you need a Kubernetes service to manage the deployed containers. You can use an on-premise solution like SUSE CaaS, but I recommend having a look at the Azure Kubernetes Service and deploying SAP Data Hub in the cloud!

In this blog, I will show how to prepare all required Azure resources and install SAP Data Hub on the Azure Kubernetes Service.

The guide is based on SAP Data Hub 2.4. SAP is working intensively on making things easier, so in the future some steps may be different or not required at all. Always check the current requirements in the installation guide!

PREPARE AZURE KUBERNETES SERVICE

I start with provisioning the Kubernetes service, as it usually takes a while to be ready. Open the Azure portal and create a new Kubernetes cluster. Following the installation guide, I select a VM size with 4 CPUs and 32 GB of memory. If you select a smaller node size, you will most likely encounter issues during container deployment.

Be careful when choosing the Kubernetes version, as only some of them are supported by SAP Data Hub. Always check the documentation. The currently supported versions include 1.11.5, which is the one I chose.

On the Authentication tab I enabled Role-Based Access Control (RBAC), which is a more secure way to run the Kubernetes cluster, as you have full control over access to resources.

My VNet is already configured and I created a separate subnet for the cluster. Containers should be placed in a separate IP range, so I decided to use 172.16.0.0, which is not used in my virtual network.

Before clicking the Create button, review all settings on the last tab.
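If you prefer the command line over the portal, a cluster with similar settings can probably be created with the Azure CLI. The function below is only a sketch: the function name, the resource group and cluster names, and the VM size Standard_E4s_v3 (4 vCPUs, 32 GB) are my assumptions, and the networking options are left out because they depend on your VNet layout. The Kubernetes version must be one that both AKS and SAP Data Hub support at the time you deploy.

```shell
# Sketch: create the AKS cluster from the CLI instead of the portal.
# create_aks_cluster is a hypothetical helper; names passed in are placeholders.
create_aks_cluster() (
    resource_group=$1   # existing resource group (assumption)
    cluster_name=$2     # name for the new cluster (assumption)
    az aks create \
        --resource-group "$resource_group" \
        --name "$cluster_name" \
        --node-count 3 \
        --node-vm-size Standard_E4s_v3 \
        --kubernetes-version 1.11.5 \
        --generate-ssh-keys
)
```

Once the Azure CLI is installed and you are logged in, it could be called as create_aks_cluster my-resource-group my-datahub-cluster.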

You don’t have to wait until the service is provisioned and you can continue with the next steps. When the cluster is available you will see three VMs running.

CREATE CONTAINER REGISTRY

During the installation, all container images will be downloaded from the SAP repository to your private container registry. This way you have a single and secure place to manage them, and it also ensures there is no added latency during deployment. For test or demo landscapes the Basic SKU is just fine, but if you require higher bandwidth or geo-replication features, consider using the Standard or Premium SKUs.
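The registry can also be created from the Azure CLI rather than the portal. This is a minimal sketch, assuming an existing resource group; the function name and the example names are hypothetical, and the SKU defaults to Basic as discussed above.

```shell
# Sketch: create an Azure Container Registry.
# $1 = resource group, $2 = registry name (alphanumeric), $3 = optional SKU.
create_registry() (
    az acr create \
        --resource-group "$1" \
        --name "$2" \
        --sku "${3:-Basic}"
)
```

For example, create_registry my-resource-group mycontainerregistry would create a Basic registry, and passing Premium as a third argument would select the SKU with geo-replication.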

PREPARE INSTALLATION HOST

The SAP Data Hub installation cannot be executed directly from the cluster, but instead, it needs an installation host. You can use any already deployed server, but my recommendation is to provision a small VM and use it to manage the cluster. I selected a 2 vCPU server with Ubuntu, which doesn’t cost too much and gives me flexibility.

Once the VM is accessible, you can connect to it and start the initial configuration and software download.

Install Azure CLI

The easiest way to install Azure CLI on Ubuntu is to follow official Microsoft documentation and execute bash commands:

sudo apt-get install apt-transport-https lsb-release software-properties-common dirmngr -y
AZ_REPO=$(lsb_release -cs)
echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | \
    sudo tee /etc/apt/sources.list.d/azure-cli.list
sudo apt-key --keyring /etc/apt/trusted.gpg.d/Microsoft.gpg adv \
     --keyserver packages.microsoft.com \
     --recv-keys BC528686B50D79E339D3721CEB3E94ADBE1229CF
sudo apt-get update
sudo apt-get install azure-cli

Install kubectl

As before, I follow the official documentation to install the kubectl tool, but I slightly modified the script to install a specific version (the same as the Kubernetes server):

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl=1.11.5-00

Install Docker

Docker is available as a standard Ubuntu package, so it’s enough to run the apt-get command; there is no need to add repositories.

apt-get install docker.io

To make sure everything went well, I verify the installation using the docker login command:

docker login

Install PythonYAML

Again, no need to rebuild packages, apt-get is enough.

apt-get install python-yaml

Install helm

Helm doesn’t have a package in the Ubuntu repositories, but you can just download the desired version from GitHub and unpack it on the server. Pay attention to the release you want to download, as only some of them are supported.

wget https://storage.googleapis.com/kubernetes-helm/helm-v2.11.0-linux-amd64.tar.gz --content-disposition
tar -xf helm-v2.11.0-linux-amd64.tar.gz
# the archive unpacks into a linux-amd64 directory
mv linux-amd64/helm linux-amd64/tiller /usr/local/bin
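Before moving on, it may be worth confirming that all the tools installed so far are actually on the PATH. The helper below is a small sketch of my own (check_tools is not part of any of the tools); it prints one line per tool and returns a non-zero status if anything is missing.

```shell
# Sketch: report which of the given command-line tools are available.
check_tools() (
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # The status of the last command becomes the function's return status.
    [ "$missing" -eq 0 ]
)
```

On the installation host you would call it as check_tools az kubectl docker helm tiller.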

Download SAP Data Hub software

Download the required software from SAP to the installation host. In this tutorial, I won’t use the Maintenance Planner, so you don’t have to install the SAP Host Agent separately. The SL Plugin component that is used to perform the installation is bundled with the Data Hub.

The software may not fit on the OS partition, so I added a 128 GB disk in Azure and mounted it as /datahub.
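A quick way to check whether a mount point has enough room before downloading is to look at the df output. The function below is a sketch; has_free_gb is a hypothetical helper, and the threshold you pass is up to you, since the required size depends on the Data Hub release.

```shell
# Sketch: succeed if <path> has at least <min_gb> gigabytes available.
has_free_gb() (
    path=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 {print $4}')
    # The body runs in a subshell, so exit only ends this function.
    [ -n "$avail_kb" ] || exit 1
    avail_gb=$((avail_kb / 1024 / 1024))
    echo "$path: ${avail_gb} GB available"
    [ "$avail_gb" -ge "$min_gb" ]
)
```

For example, has_free_gb /datahub 20 returns success only if at least 20 GB are free on the new disk (20 is an arbitrary value for illustration).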

I use wget to get the software from the SAP site and then unpack the archive.

PREPARE KUBERNETES CLUSTER FOR INSTALLATION

When the installation host is ready, we can attempt to execute activities required before the actual SAP Data Hub installation. Firstly, we need to ensure the Kubernetes cluster is available and we can communicate with it. We need to log in to Azure and create a credential file for the kubectl tool.

az login
az aks get-credentials --resource-group <resource group> --name <cluster name>

You can verify the connection by displaying active services:

kubectl get services
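It can also help to confirm that all worker nodes have joined the cluster and are ready. The sketch below counts the nodes whose STATUS column reads exactly Ready; count_ready_nodes is a name I made up for illustration.

```shell
# Sketch: print the number of nodes reporting STATUS "Ready".
count_ready_nodes() (
    kubectl get nodes --no-headers |
        awk '$2 == "Ready" {n++} END {print n + 0}'
)
```

With the three-node cluster created earlier, the function should print 3 once provisioning has finished.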

Helm and Tiller are two applications that manage the installation of Kubernetes applications in the cluster and are required by SAP Data Hub. Helm is a client located on the installation host, and Tiller will be deployed to the cluster. My cluster is RBAC-enabled, so I start with creating a service account and binding it to Tiller. Create a helm-rbac.yaml file with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

(source: Microsoft)

Then apply the settings using:

kubectl apply -f helm-rbac.yaml

Now initialize Helm using the created service account:

helm init --service-account tiller

To verify that everything went well you can execute the helm ls command and check it doesn’t return any error message. You can also display the deployed pods in the kube-system namespace:

helm ls
kubectl get pods -n kube-system

The last thing we need to perform before starting the installation is to configure access to the container registry. During the deployment of the cluster the service principal is created automatically, and we need to assign permissions to pull images from the registry. It can be done through the Azure Portal, but Microsoft published a script that can do it for us.

ACR_NAME – Azure Container Registry name defined during the deployment

SERVICE_PRINCIPAL_ID – the ID of the service principal that the cluster runs as. It can be retrieved from the Azure portal under Azure Active Directory -> App Registrations. Use the Application ID value.

Create and execute the following script filled with your data.

#!/bin/bash
# Modify for your environment. The ACR_NAME is the name of your Azure Container
# Registry, and the SERVICE_PRINCIPAL_ID is the service principal's 'appId' or
# one of its 'servicePrincipalNames' values.

ACR_NAME=mycontainerregistry
SERVICE_PRINCIPAL_ID=<service-principal-ID>

# Populate value required for subsequent command args

ACR_REGISTRY_ID=$(az acr show --name $ACR_NAME --query id --output tsv)

# Assign the desired role to the service principal. Modify the '--role' argument
# value as desired:
# reader:      pull only
# contributor: push and pull
# owner:       push, pull, and assign roles

az role assignment create --assignee $SERVICE_PRINCIPAL_ID --scope $ACR_REGISTRY_ID --role contributor

(source: Microsoft)
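As an alternative to looking up the service principal in the portal, the ID can be read from the cluster definition with the Azure CLI. This is a sketch; the function name is hypothetical, and the resource group and cluster name are the ones you used when creating the cluster.

```shell
# Sketch: print the application (client) ID of the service principal
# used by the AKS cluster. $1 = resource group, $2 = cluster name.
aks_service_principal_id() (
    az aks show \
        --resource-group "$1" \
        --name "$2" \
        --query servicePrincipalProfile.clientId \
        --output tsv
)
```

The result could then be used to fill the variable in the script above, for example SERVICE_PRINCIPAL_ID=$(aks_service_principal_id my-resource-group my-datahub-cluster).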

Finally, let’s log in to the ACR using our local account (if you skip this step, the installer won’t be able to push images to the registry).

az acr login --name <acrName>

INSTALLATION PROCESS

There are three different ways to perform the installation. The basic option is to use the install script delivered together with the SAP Data Hub software, but the recommended approach uses the Software Lifecycle Plugin (SL Plugin), which performs the installation of containerized applications. The SL Plugin can be used with or without Maintenance Planner. I found the integration with Maintenance Planner a bit of an overkill, so in this guide I follow the standalone route using the command line.

(source: SAP Data Hub Installation Guide)

If you struggle to decide which approach to take, you can also refer to the pros and cons table inside the SAP Data Hub Installation Guide. In my opinion, using the SL Plugin without Maintenance Planner is currently the best choice, as you don’t have to prepare the SAP Host Agent and the connectivity to the Maintenance Planner.

(source: SAP Data Hub Installation Guide)

We can distinguish several phases of the installation and I will briefly describe each of them. At the end of this chapter I also included the full parameter summary for your reference.

Go to the directory where the SAP Data Hub archive was extracted and execute the slplugin file located under slplugin/bin.

./slplugin execute -p <data hub directory>

The first part of the process verifies that all components have the required version. If your Kubernetes Cluster release is not supported, then the check will fail and you won’t be able to continue with the installation. Always check the currently supported releases in the SAP Data Hub documentation. The path to the kubectl config is determined automatically so we just have to confirm the value.

In the next phase we need to choose a namespace for SAP Data Hub in the Kubernetes cluster and decide between the Basic and Advanced installation modes. A Kubernetes namespace logically separates pods in the cluster. We will use this name again later, so take note of what you chose.

Then we need to provide our S-User and password so that the installer can download the required Docker images. They are located in the SAP repository, which can be accessed using a Technical User created during this process.

If Docker is not installed on the installation host, you will get an error at this stage.

Next, provide the address of the Azure Container Registry. I did not create an image pull secret, so I selected the corresponding option.

We are asked to provide the username and password to access SAP Data Hub (don’t forget what you entered here, as we will use it to log in to the system). I left the tenant name at its default value.

There are a few more questions to answer. Each of them is documented, so I won’t repeat the information here. The full installation summary can be found below. In most cases I followed the default settings.

*
* Parameter Summary
*
Choose 'Next' to start with the values shown. Otherwise choose 'Back' to revise the parameters.

n/a
Path to the KUBECONFIG file: /root/.kube/config

License Agreement
   I authorize: Y

Kubernetes Namespace
Kubernetes Namespace: datahub

Installation Type
     1. Basic Installation
   > 2. Advanced Installation

Container Repository Username
Username: 0001138455-bjarkowski

Container Registry
Container Registry: sapdatahub.azurecr.io

Image Pull Secret
   > 1. Do not use an image pull secret
     2. Use an image pull secret

Certificate Domain
Certificate Domain: datahub

SAP Data Hub System Tenant Administrator Password

SAP Data Hub Initial Tenant Name
Tenant Name: default

SAP Data Hub Initial Tenant Administrator Username
Username: bjarkowski

SAP Data Hub Initial Tenant Administrator Password Configuration
   > 1. Use the same password
     2. Do not use the same password

Cluster Proxy Settings
     1. Configure
   > 2. Do not configure

Checkpoint Store Configuration
   > 1. Do not enable checkpoint store
     2. Enable checkpoint store

RBAC Settings
   > 1. It is enabled
     2. It is not enabled

SAP Data Hub Diagnostics Persistency Configuration
   > 1. Do not enable persistency
     2. Enable persistency

Storage Class Configuration
   > 1. Do not configure storage classes
     2. Configure storage classes

Docker Container Log Path Configuration
   > 1. Do not configure container log path
     2. Configure container log path

Container Registry Settings for Pipeline Modeler
     1. Use a different one
   > 2. Use the same

Loading NFS Modules
   > 1. Enable loading NFS modules
     2. Disable loading NFS modules

Helm Timeout
Timeout in seconds: 1200

Pod Wait Timeout
Timeout in seconds: 300

Additional Installation Parameters
Additional Installation Parameters:
Choose action Back/Next [b/N/F1]: N

Now the installation starts. During the first part all docker images are pulled from the SAP repositories and pushed to Azure Container Registry. This can take some time (around 2 hours, depending on the internet bandwidth), so it’s a good moment to make a coffee in my favourite SAP Community cup!

As the process goes you can see more and more repositories in the Azure Container Registry:

When all images are pushed to the container registry, the installer starts to deploy them to the Kubernetes cluster. This is again quite a long process, so it’s a chance for a second coffee 🙂

You should not encounter any issues, but it’s worth remembering a few commands that can help investigate a problem:

Display all pods deployed to the namespace in the cluster:

kubectl get pods -n <namespace>  

Display detailed information about the deployment status (worth checking when the pod is in status Error or CrashLoopBackOff):

kubectl describe pod <pod_name> -n <namespace>

Display logs for a pod (worth checking if the pod is in status Error or CrashLoopBackOff):

kubectl logs <pod_name> -n <namespace>

If your node size is too small, you may encounter issues with the deployment. For example, the HANA image requires at least 20 GB of available memory, so if your node has only 16 GB the deployment will fail with the status CrashLoopBackOff.

Some of the containers depend on each other and the Kubernetes cluster will repeat the process of initializing them. CrashLoopBackOff doesn’t always mean that something is wrong; sometimes the pod just needs additional time to be provisioned (because the pod it depends on is still in status Init).
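To keep an eye only on the pods that still need attention, the output of kubectl get pods can be filtered. The function below is a sketch with a name of my own choosing; it prints the name and status of every pod in the namespace that is neither Running nor Completed.

```shell
# Sketch: list pods that are not yet Running or Completed.
# $1 = namespace chosen during the installation.
list_unready_pods() (
    kubectl get pods -n "$1" --no-headers |
        awk '$3 != "Running" && $3 != "Completed" {print $1, $3}'
)
```

Calling list_unready_pods datahub every few minutes shows whether the list is shrinking. A pod that stays in CrashLoopBackOff for a long time is a candidate for the describe and logs commands above.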

The last step of the installation is the automatic validation. If everything went well, you should not see any problems and SAP Data Hub is almost ready to use.

EXPOSE SAP DATA HUB TO THE INTERNET

The SAP Data Hub is installed, but currently we can’t communicate with it as the local PC is not part of the VNet. Therefore in the last step I will expose the application to the internet to access it from anywhere. We will use an ingress controller which is basically a reverse proxy for Kubernetes. Install it using the following command:

helm install stable/nginx-ingress --namespace kube-system

The above command also created an External Load Balancer in Azure, configured the backend pool and routing rules, and assigned a public IP address.

You can display the IP address either in the portal or using the following command:

public_ip=$(kubectl -n kube-system get service -l app=nginx-ingress,component=controller -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ip}"); echo "${public_ip}"

There are two ways to assign a DNS name to the IP address. You can either use the script from the installation guide or go to the Azure Portal and change the setting there.
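For reference, the scripted route could look roughly like the sketch below. It is not the script from the installation guide but my own approximation using the Azure CLI: it looks up the public IP resource by its address and sets a DNS label on it. The function name is hypothetical.

```shell
# Sketch: assign a DNS label to the public IP created for the ingress.
# $1 = public IP address, $2 = DNS label (the first part of the FQDN).
set_public_ip_dns_label() (
    ip=$1
    label=$2
    # Find the resource ID of the public IP that has this address.
    ip_id=$(az network public-ip list \
        --query "[?ipAddress=='$ip'].id" --output tsv)
    if [ -z "$ip_id" ]; then
        echo "no public IP resource found for $ip" >&2
        exit 1   # the body runs in a subshell, so this only ends the function
    fi
    az network public-ip update --ids "$ip_id" --dns-name "$label"
)
```

Calling set_public_ip_dns_label "$public_ip" datahub in the West Europe region should result in a name like datahub.westeurope.cloudapp.azure.com.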

Let’s create a self-signed SSL certificate and deploy it to the cluster (change the <FQDN> to the full domain name defined above, for example datahub.westeurope.cloudapp.azure.com and the <namespace> to the namespace chosen during the SAP Data Hub installation):

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=<FQDN>"
kubectl -n <namespace> create secret tls vsystem-tls-certs --key /tmp/tls.key --cert /tmp/tls.crt

(source: SAP Data Hub Installation Guide)

Finally, create a configuration file for the ingress and save it on the installation host as vsystem-ingress.yaml. I had to slightly modify the script delivered in the Installation Guide by adding an extra annotation; without it I couldn’t access the website. Do not replace the <DNS_REPLACE> placeholder by hand, as the command further down fills it in 🙂

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
 name: vsystem
 annotations:
   kubernetes.io/ingress.class: nginx
   nginx.ingress.kubernetes.io/secure-backends: "true"
   nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
   nginx.ingress.kubernetes.io/proxy-body-size: "500m"
   nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
   nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
   nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
   nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
   nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
 tls:
 - hosts:
   - <DNS_REPLACE>
   secretName: vsystem-tls-certs
 rules:
 - host: <DNS_REPLACE>
   http:
     paths:
     - path: /
       backend:
         serviceName: vsystem
         servicePort: 8797

Execute the following commands to activate the ingress. Replace <dns_domain> and <namespace> according to your previous choices:

export dns_domain=<dns_domain>
sed "s/<DNS_REPLACE>/${dns_domain}/g" vsystem-ingress.yaml | kubectl -n <namespace> apply -f -
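One thing that can go wrong here is an empty dns_domain variable, which would silently produce an ingress without a host name. The substitution step can be wrapped in a small function that refuses to run in that case. This is a sketch; render_ingress is a hypothetical helper and only prints the resulting YAML.

```shell
# Sketch: substitute <DNS_REPLACE> in an ingress template and print the result.
# $1 = template file, $2 = fully qualified domain name.
render_ingress() (
    template=$1
    dns_domain=$2
    if [ -z "$dns_domain" ]; then
        echo "dns_domain is empty, refusing to render" >&2
        exit 1   # the body runs in a subshell, so this only ends the function
    fi
    sed "s/<DNS_REPLACE>/${dns_domain}/g" "$template"
)
```

The output can be reviewed first and then piped to kubectl, for example render_ingress vsystem-ingress.yaml "$dns_domain" | kubectl -n <namespace> apply -f -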

You can now access SAP Data Hub through the browser:

Log in using the credentials defined during the installation.

That’s it! The installation is complete. I found the process a bit tricky, so I hope this end-to-end guide will be useful.