Technical Articles
SAP Data Hub deployment on Microsoft Azure
INTRODUCTION
SAP Data Hub provides an integration layer for data-driven processes across enterprise services to process and orchestrate data in the overall landscape. It offers an open, big-data centric architecture with OpenSource integration, cloud deployments, and third-party interfaces. It also leverages massive distributed processing and serverless computing capabilities.
In this blog, I will describe how to install SAP Data Hub on Microsoft Azure. The following versions were used:
Azure Kubernetes Services (AKS): 1.10.9
SAP Data Hub: 2.3.174
CREATE THE KUBERNETES CLUSTER
1) Create a new resource AKS.
2) Fill in all the required information, such as the cluster name and the Kubernetes version.
Pick 4 Standard D8s v3 nodes (8 vCPUs and 32 GB memory each). This sizing is the minimum requirement for the Data Hub installation (please refer to the Data Hub installation guide).
3) Authentication
Enable RBAC and provide an existing Service Principal Name (SPN).
4) Networking
Make sure that HTTP application routing is disabled, and select the Virtual Network (VNET) and the subnet (assuming that the VNET and the subnet have been created beforehand).
5) Monitoring
Leave monitoring set to ON.
6) Validation
Once everything is validated, click on Download a template for automation.
Then Deploy.
After re-entering the necessary inputs such as the resource name, you will need your SPN Client ID and Client Secret.
Disable HTTP application routing, change the network plugin to kubenet, and increase the maximum number of pods per node to at least 50.
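In the downloaded ARM template, the relevant properties look roughly like the following fragment. This is a sketch, not a complete template: the exact structure depends on the template version, and the pool name and count are placeholders.

```json
{
  "properties": {
    "networkProfile": { "networkPlugin": "kubenet" },
    "agentPoolProfiles": [
      { "name": "nodepool1", "count": 4, "vmSize": "Standard_D8s_v3", "maxPods": 50 }
    ],
    "addonProfiles": { "httpApplicationRouting": { "enabled": false } }
  }
}
```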
Finally, the last step is to Purchase.
During the AKS deployment, Azure creates a new separate resource group (under the name MC_<name of the initial resource group>_<name of the AKS cluster>_<location>) with all the resources needed for the Kubernetes cluster.
In this step, we need to associate the subnet with the routing table that AKS created.
Select the subnet that you entered during the AKS deployment.
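The same association can be scripted with the Azure CLI. In the sketch below, the commands are echoed instead of executed so you can review them first; all resource names are placeholders for your own.

```shell
# Placeholders -- substitute your own resource names.
MC_RG="MC_MyResourceGroup_MyAKSCluster_westeurope"
VNET_RG="MyResourceGroup"
VNET="MyVNet"
SUBNET="aks-subnet"
ROUTE_TABLE="aks-agentpool-routetable"

# List the route table AKS created in the MC_* resource group:
echo "az network route-table list -g $MC_RG -o table"

# Associate the AKS subnet with that route table:
CMD="az network vnet subnet update -g $VNET_RG --vnet-name $VNET -n $SUBNET --route-table $ROUTE_TABLE"
echo "$CMD"
```

Remove the `echo` wrappers once the names match your environment.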
CREATE THE AZURE CONTAINER REGISTRY
The SAP Data Hub installation requires a Docker registry. Azure provides a service for this, named Azure Container Registry (ACR).
Be sure that the admin user is disabled.
STORAGE FOR THE VORA CHECKPOINT STORE
In order to enable SAP Vora database streaming tables, a checkpoint store needs to be enabled. The store is an object store; you can choose either ADLS or WASB storage.
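Creating a WASB store can be sketched with the Azure CLI as below. The account and container names are placeholders, and the `az` commands are echoed rather than executed so you can review them; the last line shows the shape of the wasbs:// URL under which such a store is addressed.

```shell
# Placeholders -- substitute your own names.
RG="MyResourceGroup"
ACCOUNT="sdhcheckpoint"
CONTAINER="sdh"

# Create a storage account and a blob container for the checkpoint store:
echo "az storage account create -g $RG -n $ACCOUNT --sku Standard_LRS --kind StorageV2"
echo "az storage container create --account-name $ACCOUNT -n $CONTAINER"

# The store is then addressed through a URL of this shape:
WASB_URL="wasbs://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/"
echo "$WASB_URL"
```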
JUMP HOST SETUP FOR SAP DATA HUB DEPLOYMENT AND INSTALLATION
It is recommended to install SAP Data Hub from an external jump host, from which we will run the installation. The hardware requirements for the jump host can be:
- OS: Red Hat Enterprise Linux 7.5
- CPU: 2 cores
- Memory: 8GB
- Diskspace: 100GB (HDD)
You will need to provide an SSH key during the deployment in order to be able to log in using an SSH client tool like PuTTY.
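Generating such a key pair can be done with ssh-keygen; the file name and comment below are illustrative.

```shell
# Generate an RSA key pair for the jump host (file name is illustrative).
ssh-keygen -t rsa -b 4096 -f ./sdh_jump_key -N '' -C 'sdh-jump-host' -q
# The .pub file is pasted into the Azure deployment form; the private key is
# used by your SSH client (PuTTY users convert it to .ppk with PuTTYgen).
ls sdh_jump_key sdh_jump_key.pub
```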
Be sure to deploy the jump host in the same subnet as the AKS cluster, so the AKS nodes are directly reachable.
Once the jump host is set up, follow the instructions below.
1) Install the AZ command line interface (AZ CLI).
rpm --import https://packages.microsoft.com/keys/microsoft.asc
sh -c 'echo -e "[azure-cli]\nname=Azure CLI\nbaseurl=https://packages.microsoft.com/yumrepos/azure-cli\nenabled=1\ngpgcheck=1\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/azure-cli.repo'
yum install azure-cli
az login
2) Install docker
yum install docker
cd /usr/libexec/docker/
cp docker-runc-current /usr/bin/docker-runc
systemctl enable docker.service
systemctl start docker
3) Install the appropriate version of the kubectl
az aks install-cli --client-version 1.10.9
Get the AKS credentials
az aks get-credentials --resource-group YourResourceGroup --name AKSName
Check the nodes
kubectl get nodes -o wide
4) Enable the Kubernetes dashboard usage
Create a yaml file rbac-dashboard.yaml with the following
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system
And install it with the following command
kubectl create -f rbac-dashboard.yaml
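With the binding in place, the dashboard can be opened through an authenticated tunnel. The sketch below echoes the command rather than executing it, and the resource names are placeholders.

```shell
# Placeholders -- substitute your own resource group and cluster name.
RG="MyResourceGroup"
AKS="MyAKSCluster"

# Opens a local proxy to the Kubernetes dashboard in your browser:
BROWSE_CMD="az aks browse --resource-group $RG --name $AKS"
echo "$BROWSE_CMD"
```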
5) Helm and Tiller
Create the namespace (sdh) where SAP Data Hub will be installed
kubectl create namespace sdh
Create the yaml file helm-sdh.yaml for the service account:
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: sdh
And install it with the following command
kubectl create -f helm-sdh.yaml
Create the cluster role bindings for the service accounts tiller and default in the namespace sdh
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=sdh:tiller
kubectl create clusterrolebinding vora-cluster-rule --clusterrole=cluster-admin --serviceaccount=sdh:default
Download and unpack helm version 2.9.1
wget https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar -xvf helm-v2.9.1-linux-amd64.tar.gz
cp linux-amd64/helm /usr/bin/
Set the environment variables
export TILLER_NAMESPACE=sdh
export NAMESPACE=sdh
Helm initialization
helm init --service-account=tiller
Check if the tiller pod in the sdh namespace is running
kubectl get pods --namespace sdh | grep tiller
Check the helm readiness
helm ls
6) Login to Azure Container Registry (ACR)
az acr login -n YourACR
Set the environment variable
export DOCKER_REGISTRY=YourACR.azurecr.io
SAP DATA HUB INSTALLATION
The SAP Data Hub version used in this article is 2.3.174. Once you have downloaded it from the SAP Marketplace, upload it to the jump host previously created and set up, then unzip the file (it must be SAPDataHub-2.3.174-Foundation.zip).
Finally run the installer
./install.sh
The inputs shown in bold were provided during the installation:
Please enter the SAN (Subject Alternative Name) for the certificate, which must match the fully qualified domain name (FQDN) of the Kubernetes node to be accessed externally: yourFQDNForSDHAccess
Please enter a username: YourUser
Do you want to use same system user password for YourUser user? (yes/no) yes
Do you want to configure security contexts for Hadoop/Kerberized Hadoop? (yes/no) no
Enable Vora checkpoint store? (yes/no) yes
Please provide the following parameters for Vora's checkpoint store
Please enter type of shared storage (s3/adl/wasb/gcs/webhdfs): wasb
Please enter WASB account name: ****************
Please enter WASB account key: ****************
Please enter WASB endpoint suffix (empty for default 'blob.core.windows.net'):
Please enter WASB endpoints protocol (empty for default 'https'):
Please enter connection timeout in seconds (empty for default 180):
Please enter WASB container and directory (in the form my-container/directory): sdh/
Do you want to validate the checkpoint store? (yes/no) yes
After a successful installation, you should get the following:
2018-11-09T18:37:21+0000 [INFO] Validating...
2018-11-09T18:37:21+0000 [INFO] Running validation for vora-cluster...OK!
2018-11-09T18:37:53+0000 [INFO] Running validation for vora-sparkonk8s...OK!
2018-11-09T18:38:51+0000 [INFO] Running validation for vora-vsystem...OK!
2018-11-09T18:38:57+0000 [INFO] Running validation for datahub-app-base-db...OK!
############ Ports for external connectivity ############
# vora-tx-coordinator-ext/tc port: 30852
# vora-tx-coordinator-ext/hana-wire port: 32564
# vora-textanalysis/textanalysis port: 31994
# vsystem/vsystem port: 32299
#########################################################
# You can find the generated X.509 keys/certificates under /mnt/resource/SAPDataHub-2.3.174-Foundation/logs/20181109_183430 for later use!
#########################################################
# Tenant created: "default"
# User: "YourUser"
# User for tx-coordinator: "default\YourUser"
#########################################################
Please note that the ports above are from my installation; SAP Data Hub assigns random ports, so yours will differ.
ACCESSING THE SAP DATA HUB APPLICATION
The easiest way to access SAP Data Hub is to assign a public IP from the Azure Portal to one of the AKS nodes. The node to which you assign the IP must be the same one whose FQDN you specified during the installation.
Once this is done, you should be able to connect to your instance at yourFQDNForSDHAccess:port. In my case, as seen above, the port is 32299.
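The public IP assignment can also be scripted. In the sketch below, the commands are echoed instead of executed; the MC_* resource group, NIC, and IP names are placeholders you should look up in your own environment (the node NICs live in the MC_* resource group).

```shell
# Placeholders -- look up the real names in the MC_* resource group.
MC_RG="MC_MyResourceGroup_MyAKSCluster_westeurope"
NODE_NIC="aks-nodepool1-12345678-nic-0"
IP_NAME="sdh-access-ip"

# Create a static public IP and attach it to the node's NIC:
echo "az network public-ip create -g $MC_RG -n $IP_NAME --allocation-method Static"
ATTACH_CMD="az network nic ip-config update -g $MC_RG --nic-name $NODE_NIC --name ipconfig1 --public-ip-address $IP_NAME"
echo "$ATTACH_CMD"
```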
ENABLE THE SAP HANA WIRE FOR SAP HANA SMART DATA ACCESS (SDA)
1) SAP Data Hub setup
The hana-wire functionality on the SAP Data Hub side allows you to expose Vora tables via an SDA connection from a HANA database.
To expose the service in the network where the Kubernetes cluster runs, create a Kubernetes service of type LoadBalancer.
From the jump host, create a service of type LoadBalancer with the name vora-tx-coordinator-ext
kubectl -n $NAMESPACE expose service vora-tx-coordinator-ext --type LoadBalancer --name=vora-tx-coordinator-ext
Then, patch the internal annotation onto the load balancer
kubectl -n $NAMESPACE patch service vora-tx-coordinator-ext -p '{"metadata":{"annotations": {"service.beta.kubernetes.io/azure-load-balancer-internal":"true"}}}'
Run the following command to check the service
kubectl -n $NAMESPACE get service vora-tx-coordinator-ext -w
The hana-wire port is usually 3<instance number>15, so for the default SDH installation it’s 30115.
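As a quick sanity check, the pattern 3&lt;instance number&gt;15 with the default instance number 01 yields 30115:

```shell
# Default SAP Data Hub instance number, per the pattern 3<instance>15.
INSTANCE="01"
HANA_WIRE_PORT="3${INSTANCE}15"
echo "$HANA_WIRE_PORT"
```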
2) HANA SDA setup
In HANA Studio, under Provisioning → Remote Sources, create a new remote source with the VORA (ODBC) adapter and fill in the usual SDA information.
The following Extra Adapter Properties need to be added:
IGNORETOPOLOGY=0;encrypt=true;sslValidateCertificate=false;sslCryptoProvider=commoncrypto;sslKeyStore=WhereYourPSEFileIsStored;sslTrustStore=norelevant;
3) Virtual table creation
The virtual tables can be created via the SQL command line or via the UI: go to Provisioning → Remote Sources → &lt;source&gt; → &lt;user&gt;, right-click the table, and choose Add as Virtual Table.
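Via the SQL command line, the statement looks roughly like the following. This is a sketch only: the schema, source, and table names are placeholders, and the exact four-part remote path should be checked against what your remote source exposes.

```sql
-- Illustrative only: replace schema, source, and table names with your own.
CREATE VIRTUAL TABLE "MYSCHEMA"."VT_MYVORATABLE"
  AT "MyVoraSource"."<NULL>"."default"."myvoratable";
```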
You can now access the SAP Data Hub Vora tables from your HANA studio.
Thanks for reading, hope it was useful.
great blog!
Thanks!
Abdelmajid, great documentation. It helped me a lot with the Azure DH setup without having to dig deep into the install guide. It was almost error free, except for a few hiccups captured below.
ADLS is not a storage option any more for the Vora checkpoint store; only WASB is available. However, on the Azure dashboard there is nothing called "WASB", so I ended up just creating a storage account with a container under the Blob service, assuming that is "WASB" based on the Azure WASB documentation. I could find all relevant details such as the storage account name, endpoint, key, etc., and I was also able to access the container from my jump box via the Azure CLI. However, the DH installation fails when validating the Vora checkpoint store. For now, I plan to skip the validation just to see what happens next, and to open a BCP message.
Another issue I faced is that during the pull/push process, which takes some time, the installation failed with a login error more than once. Every time the install aborted with a login error, I ran the container registry login command (az acr login -n &lt;reg name&gt;) and restarted install.sh, and the installation moved on to the next set of push/pull files. I looked for some kind of timeout setting on Azure but could not find one.
Hi Jaspreet,
Thanks for your nice comment.
WASB is the technical name of the storage account (Blob storage). You should not have issues with the storage folder if it is set up as mentioned in the blog post.
I also faced a timeout during the Docker image fetch; I haven't investigated further. If I find something relevant, I will share it with you. Something you can try is to enable the admin user in the ACR: you will be provided a password, and you will be asked for it when you connect to the ACR using the az acr login command.
Hope this helps
Great post!
If we install SAP Data Hub from SAP CAL, it automatically deploys all dependent components and related images. Is that approach recommended instead of the above write-up?
Also, the installation can be done using the Software Lifecycle Plugin as well. ( link )
I want to understand the difference; if you can shed some light, it would help.
Thanks
Sid