Abdelmajid BEKHTI

SAP Data Hub deployment on Microsoft Azure

INTRODUCTION

 

SAP Data Hub provides an integration layer for data-driven processes across enterprise services, to process and orchestrate data in the overall landscape. It offers an open, big-data-centric architecture with open-source integration, cloud deployments, and third-party interfaces. It also leverages massively distributed processing and serverless computing capabilities.

In this blog, I will describe how to install SAP Data Hub on Microsoft Azure, using the following versions:

Azure Kubernetes Services (AKS): 1.10.9
SAP Data Hub: 2.3.174

 

CREATE THE KUBERNETES CLUSTER

 

1) Create a new resource AKS.

 

2) Fill in all the required information, such as the cluster name and the Kubernetes version.

Pick four Standard D8s v3 nodes with 8 vCPUs and 32 GB of memory each. This sizing is the minimum requirement for the Data Hub installation (please refer to the Data Hub installation guide).
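For reference, a hypothetical Azure CLI equivalent of this portal step might look as follows. All resource names are placeholder assumptions, and the command is only printed for review; drop the leading echo to actually run it.

```shell
# Assumed placeholder names; adjust to your landscape.
RESOURCE_GROUP=sdh-rg
CLUSTER_NAME=sdh-aks
NODE_COUNT=4                   # minimum node count for Data Hub
NODE_SIZE=Standard_D8s_v3      # 8 vCPUs / 32 GB per node

# Printed for review only; remove the leading 'echo' to execute.
echo az aks create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$CLUSTER_NAME" \
  --kubernetes-version 1.10.9 \
  --node-count "$NODE_COUNT" \
  --node-vm-size "$NODE_SIZE"
```

The portal wizard described below covers the same parameters plus RBAC, networking, and monitoring.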

 

3) Authentication

Enable RBAC and provide an existing Service Principal Name (SPN).

 

4) Networking

Be sure that HTTP application routing is disabled, and select the Virtual Network (VNET) and the subnet (assuming that the VNET and the subnet were created beforehand).

 

5) Monitoring

Leave monitoring set to ON.

 

6) Validation

Once everything is validated, click on Download a template for automation.

 

Then Deploy.

 

 

After re-entering the necessary inputs, such as the resource name, you will need your SPN client ID and client secret.

Disable the HTTP application routing, change the network plugin to kubenet, and increase the maximum number of pods per node to at least 50.

Finally, the last step is to click Purchase.

 

During the AKS deployment, Azure creates a new separate resource group (under the name MC_<name of the initial resource group>_<name of the AKS cluster>_<location>) with all the resources needed for the Kubernetes cluster.

In this step, we need to associate the subnet with the route table.

 

Select the subnet that you’ve entered during the AKS deployment.
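If you prefer the CLI over the portal, a sketch of this association might look like the following. The resource-group, VNET, and route-table names are assumptions, and the commands are printed for review only; drop the leading echo to actually run them.

```shell
# Placeholder names; the route table created by AKS lives in the MC_* resource group.
MC_GROUP=MC_sdh-rg_sdh-aks_westeurope
VNET_GROUP=sdh-rg                       # group that holds the pre-created VNET
VNET_NAME=sdh-vnet
SUBNET_NAME=sdh-subnet
ROUTE_TABLE=aks-agentpool-routetable    # assumed route-table name

# Look up the route table ID, then attach the table to the subnet.
# Printed for review only; remove the leading 'echo' to execute.
echo az network route-table show --resource-group "$MC_GROUP" --name "$ROUTE_TABLE" --query id
echo az network vnet subnet update --resource-group "$VNET_GROUP" \
  --vnet-name "$VNET_NAME" --name "$SUBNET_NAME" --route-table "YourRouteTableId"
```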

 

 

CREATE THE AZURE CONTAINER REGISTRY

 

The SAP Data Hub installation requires a Docker registry. Azure provides a service for this named Azure Container Registry (ACR).

 

Be sure that the admin user is disabled.
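A hedged CLI sketch for creating such a registry with the admin user disabled (the registry name is a placeholder and must be globally unique; the command is printed for review only):

```shell
ACR_NAME=sdhregistry        # placeholder; must be globally unique, lowercase
RESOURCE_GROUP=sdh-rg       # placeholder resource group

# Printed for review only; remove the leading 'echo' to execute.
echo az acr create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$ACR_NAME" \
  --sku Standard \
  --admin-enabled false
```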

 

STORAGE FOR THE VORA CHECKPOINT STORE

 

In order to enable SAP Vora database streaming tables, the checkpoint store needs to be enabled. The store is an object store; you can choose either ADLS or WASB storage.
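As an illustration, a WASB-style checkpoint store could be provisioned roughly like this. Account and container names are assumptions, and the commands are printed for review only:

```shell
STORAGE_ACCOUNT=sdhcheckpoint   # placeholder; globally unique, lowercase
RESOURCE_GROUP=sdh-rg           # placeholder resource group

# Printed for review only; remove the leading 'echo' to execute.
echo az storage account create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$STORAGE_ACCOUNT" \
  --sku Standard_LRS \
  --kind StorageV2
echo az storage container create --account-name "$STORAGE_ACCOUNT" --name sdh
```

The account name, key, and container are the values the Data Hub installer later asks for in the checkpoint store prompts.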

 

JUMP HOST SETUP FOR SAP DATA HUB DEPLOYMENT AND INSTALLATION

 

It is recommended to install SAP Data Hub from an external jump host; from there, we will run the SAP Data Hub installation. The hardware requirements for the jump host can be:

  • OS: Red Hat Enterprise Linux 7.5,
  • CPU: 2 cores
  • Memory: 8GB
  • Diskspace: 100GB (HDD)

You will need to provide an SSH key during the deployment in order to be able to log in using an SSH client tool such as PuTTY.

Be sure to deploy the jump host in the same subnet as the AKS cluster, so that the AKS nodes are directly reachable.
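A sketch of a matching jump host deployment via the CLI (the image URN, resource names, key path, and the exact SSH-key flag name are assumptions and may differ across CLI versions; the command is printed for review only):

```shell
RESOURCE_GROUP=sdh-rg
JUMP_NAME=sdh-jump
JUMP_SIZE=Standard_D2s_v3      # 2 vCPUs / 8 GB, matching the requirements above

# Printed for review only; remove the leading 'echo' to execute.
echo az vm create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$JUMP_NAME" \
  --image RedHat:RHEL:7.5:latest \
  --size "$JUMP_SIZE" \
  --vnet-name sdh-vnet \
  --subnet sdh-subnet \
  --ssh-key-value ~/.ssh/id_rsa.pub
```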

Once the jump host is set up, follow the instructions below.

 

1) Install the Azure command-line interface (Azure CLI).

rpm --import https://packages.microsoft.com/keys/microsoft.asc
sh -c 'echo -e "[azure-cli]\nname=Azure CLI\nbaseurl=https://packages.microsoft.com/yumrepos/azure-cli\nenabled=1\ngpgcheck=1\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/azure-cli.repo'
yum install azure-cli
az login

 

2) Install Docker

yum install docker
cd /usr/libexec/docker/
cp docker-runc-current /usr/bin/docker-runc
systemctl enable docker.service
systemctl start docker

 

3) Install the appropriate version of kubectl

az aks install-cli --client-version 1.10.9

Get the AKS credentials

az aks get-credentials --resource-group YourResourceGroup --name AKSName

Check the nodes

kubectl get nodes -o wide

 

4) Enable the Kubernetes dashboard usage

Create a YAML file rbac-dashboard.yaml with the following content:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system

And install it with the following command

kubectl create -f rbac-dashboard.yaml

 

5) Helm and Tiller

Create the namespace (sdh), where the SAP Data Hub will be installed

kubectl create namespace sdh

Create the YAML file helm-sdh.yaml for the service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: sdh

And install it with the following command

kubectl create -f helm-sdh.yaml

Create the cluster role bindings for the service accounts tiller and default in the namespace sdh

kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=sdh:tiller
kubectl create clusterrolebinding vora-cluster-rule --clusterrole=cluster-admin --serviceaccount=sdh:default

Download and unpack Helm version 2.9.1:

wget https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar -xvf helm-v2.9.1-linux-amd64.tar.gz
cp linux-amd64/helm /usr/bin/

Set the environment variables

export TILLER_NAMESPACE=sdh
export NAMESPACE=sdh

Helm initialization

helm init --service-account=tiller

Check if the tiller pod in the sdh namespace is running

kubectl get pods --namespace sdh | grep tiller

Check the helm readiness

helm ls

 

6) Login to Azure Container Registry (ACR)

az acr login -n YourACR

Set the environment variable

export DOCKER_REGISTRY=YourACR.azurecr.io

 

SAP DATA HUB INSTALLATION

 

The SAP Data Hub version used in this article is 2.3.174. Once you have downloaded it from the SAP Marketplace, upload it to the jump host previously created and set up, then unzip the file (it must be SAPDataHub-2.3.174-Foundation.zip).

Finally, run the installer:

./install.sh

The following inputs were provided during the installation (the answers are shown after each prompt):

Please enter the SAN (Subject Alternative Name) for the certificate, which must match the fully qualified domain name (FQDN) of the Kubernetes node to be accessed externally: yourFQDNForSDHAccess
Please enter a username: YourUser
Do you want to use same system user password for YourUser user? (yes/no) yes
Do you want to configure security contexts for Hadoop/Kerberized Hadoop? (yes/no) no
Enable Vora checkpoint store? (yes/no) yes
Please provide the following parameters for Vora's checkpoint store
Please enter type of shared storage (s3/adl/wasb/gcs/webhdfs): wasb
Please enter WASB account name: ****************
Please enter WASB account key: ****************
Please enter WASB endpoint suffix (empty for default 'blob.core.windows.net'):
Please enter WASB endpoints protocol (empty for default 'https'):
Please enter connection timeout in seconds (empty for default 180):
Please enter WASB container and directory (in the form my-container/directory): sdh/
Do you want to validate the checkpoint store? (yes/no) yes

After a successful installation, you should get the following:

2018-11-09T18:37:21+0000 [INFO] Validating...
2018-11-09T18:37:21+0000 [INFO] Running validation for vora-cluster...OK!
2018-11-09T18:37:53+0000 [INFO] Running validation for vora-sparkonk8s...OK!
2018-11-09T18:38:51+0000 [INFO] Running validation for vora-vsystem...OK!
2018-11-09T18:38:57+0000 [INFO] Running validation for datahub-app-base-db...OK!
############ Ports for external connectivity ############
# vora-tx-coordinator-ext/tc port:                  30852
# vora-tx-coordinator-ext/hana-wire port:           32564
# vora-textanalysis/textanalysis port:              31994
# vsystem/vsystem port:                             32299
#########################################################
# You can find the generated X.509 keys/certificates under /mnt/resource/SAPDataHub-2.3.174-Foundation/logs/20181109_183430 for later use!
#########################################################
# Tenant created: "default"
# User: "YourUser"
# User for tx-coordinator: "default\YourUser"
#########################################################

Please note that the ports above are specific to my installation; SAP Data Hub assigns random ports.
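To look up the NodePorts that your own installation received, a query along these lines should work (namespace sdh as chosen earlier; the command is printed for review only and assumes kubectl is configured on the jump host):

```shell
NS=sdh   # namespace chosen during the installation

# Printed for review only; remove the leading 'echo' to execute.
echo kubectl -n "$NS" get services \
  -o custom-columns=NAME:.metadata.name,NODEPORTS:.spec.ports[*].nodePort
```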

 

ACCESSING THE SAP DATA HUB APPLICATION

 

The easiest way to access SAP Data Hub is to assign a public IP from the Azure Portal to one of the AKS nodes. The node to which you assign the IP must be the same node specified during the installation.

Once this is done, you should be able to connect to your instance at yourFQDNForSDHAccess:Port. In my case, as seen above, the port is 32299.

 

ENABLE THE SAP HANA WIRE FOR SAP HANA SMART DATA ACCESS (SDA)

 

1) SAP Data Hub setup

There is a hana-wire functionality on the SAP Data Hub side; it allows you to expose Vora tables via an SDA connection from a HANA database.

To expose the service in the network where the Kubernetes cluster runs, create a Kubernetes service of type LoadBalancer.

From the jump host, create a service of type LoadBalancer with the name vora-tx-coordinator-ext

kubectl -n $NAMESPACE expose service vora-tx-coordinator-ext --type LoadBalancer --name=vora-tx-coordinator-ext

Then, patch the internal load balancer annotation onto the service:

kubectl -n $NAMESPACE patch service vora-tx-coordinator-ext -p '{"metadata":{"annotations": {"service.beta.kubernetes.io/azure-load-balancer-internal":"true"}}}'

Run the following command to check the service

kubectl -n $NAMESPACE get service vora-tx-coordinator-ext -w

The hana-wire port is usually 3<instance number>15, so for the default SDH installation it’s 30115.
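That port pattern can be written out explicitly (the default instance number 01 is an assumption):

```shell
# hana-wire NodePort pattern: 3<instance number>15
INSTANCE=01                       # default instance number (assumption)
HANA_WIRE_PORT="3${INSTANCE}15"
echo "$HANA_WIRE_PORT"            # 30115 for the default installation
```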

 

2) HANA SDA setup

In HANA Studio, under Provisioning → Remote Sources, create a new remote source with the VORA (ODBC) adapter. Fill in the usual SDA information.

 

The following Extra Adapter Properties need to be added:

IGNORETOPOLOGY=0;encrypt=true;sslValidateCertificate=false;sslCryptoProvider=commoncrypto;sslKeyStore=WhereYourPSEFileIsStored;sslTrustStore=norelevant;

 

3) Virtual table creation

The virtual tables can be created via the SQL command line, or as follows: go to Provisioning → Remote Sources → <source> → <user>, right-click the table, and choose Add as Virtual Table.

 

You can now access the SAP Data Hub Vora tables from your HANA studio.

 

Thanks for reading, hope it was useful.

COMMENTS

      Vinay Bhatt:

      great blog!

      Abdelmajid BEKHTI (Blog Post Author):

      Thanks!

      Jaspreet Singh:

      Abdelmajid, great documentation. It helped me a lot with the Azure DH setup without having to dig deep into the install guide. It was almost error free, except for a few hiccups captured below.

      ADLS is not a storage option any more for the Vora checkpoint store; only WASB is available. However, on the Azure Dashboard there is nothing called "WASB", so I ended up just creating a storage account with a container under the Blob service, assuming that is "WASB" based on the Azure WASB documentation. I could find all the relevant details such as the storage account name, endpoint, key, etc., and I was able to access the container from my jump box via the Azure CLI. However, the DH installation fails when validating the Vora checkpoint store. For now, I planned to skip validation just to see what happens next and open a BCP message.

      Another issue I faced is that during the pull/push process, which takes some time, the installation failed with a login error more than once. Every time the install aborted with a login error, I ran the container registry login command (az acr login -n <reg name>) and restarted install.sh; the installation then moved on to the next set of push/pull files. I looked for some kind of timeout setting on Azure but could not find it.

       

      Abdelmajid BEKHTI (Blog Post Author):

      Hi Jaspreet,

      Thanks for your nice comment.

      WASB is the technical name of the Storage Account (Blob storage). You should not have issues with the storage folder as it is mentioned in the blog post.

      I also faced a timeout during the Docker image fetch; I haven't investigated further. If I find something relevant, I will share it with you. Something that you can try is to enable the admin user in the ACR: you will be provided a password, which you will be asked for when you connect to the ACR using the az acr login command.

      Hope this helps.

       

       

      Ann Lily:

      great blog!

      Sid Krishna:

      Great post!

      If we install SAP Data Hub from SAP CAL, it would automatically deploy all dependent components and related images. Is that approach recommended instead of the above write-up?

      Also, the installation can be done using the Software Lifecycle Plugin as well. (link)

      I want to understand the difference; if you can shed some light, it would help.

       

      Thanks

      Sid