Experiences from installing SAP Data Hub 2.3 (BETA) with KOPS on AWS
Introduction
Having worked with SAP Data Hub in an implementation project for the past 6 months, I have played a lot with the product and installed several instances of Data Hub 1.3 / 1.4 (the current GA release). Through this, I got the chance to participate in the Beta program for version 2.3, the upcoming release. To get started with the product, you’ll need access to a cluster, and where is the fun at work if you don’t get your hands dirty and set things up yourself?
From past experience, I relied on AWS again. The target landscape is based in Frankfurt, where unfortunately AWS EKS is not available yet. To get a K8S cluster, you can use kops for deployment. This is fine for testing and development purposes, but not an approved setup for productive use. Luckily, I am a developer!
Compared to the Data Hub 1.3 installation, I feel the installation got much simpler with the Maintenance Planner and SL Plugin options. Also, an automated validation at the end of the installation checks whether your cluster is fine. If it is not, the logs are collected in one place to ease the support process for you. Cool thing!
But what do you need to get started? And which things should you watch out for…? See below, and feel free to comment and ask.
How to get started
Installation Host
- Get yourself a Linux box. This can be a VM, your machine, or a host that you SSH into. I did not have good experiences running the installation from a Docker container or from the Windows Subsystem for Linux, so use a separate Linux box.
- Install kubectl and kops (v1.9.2).
- Generate an SSH key pair via ssh-keygen.
- Install docker.
- Install Python 2.7.
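Before going further, it can help to confirm that the tools from the list above are actually on your PATH. The following is only a minimal sketch under assumptions: `missing_tools` is a helper name made up for illustration, and `python2.7` is assumed to be the name of the Python 2.7 executable on your distribution.

```shell
#!/usr/bin/env bash
# Print the name of every given command that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of kops or kubectl.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tools from the list above; python2.7 is an assumed executable name.
missing="$(missing_tools kubectl kops docker ssh-keygen python2.7)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names end up on one line.
  echo "Not found on PATH:" $missing
else
  echo "All required tools found."
fi
```

The check only tells you whether a command exists, not whether it has the right version, so you may still want to run `kops version` and `kubectl version --client` by hand.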
AWS-specifics
- Have access to the AWS console
- Install the AWS CLI on your installation host and run “aws configure”
- Set up the IAM configuration for kops:
aws iam create-group,… attach policies,…
- Create an S3 bucket for Kubernetes State Storage
- Create a new VPC
- Create a Docker Registry and log in to it
aws ecr get-login --no-include-email --region eu-central-1
# returns a docker login command. Execute this to authenticate your local docker installation for access to this registry.
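For the S3 bucket that holds the Kubernetes state, the step could look roughly like the sketch below. The bucket name is hypothetical, the `run` wrapper is a made-up helper that only prints the commands unless you set `DRY_RUN=0`, and enabling versioning on the bucket is an optional extra that is commonly recommended for kops state stores rather than something this post requires.

```shell
#!/usr/bin/env bash
# Hypothetical names; replace them with your own.
REGION="eu-central-1"                 # Frankfurt, as used in this post
STATE_BUCKET="my-kops-state-store"    # assumed bucket name

# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Outside us-east-1, create-bucket needs an explicit location constraint.
run aws s3api create-bucket --bucket "$STATE_BUCKET" --region "$REGION" \
  --create-bucket-configuration "LocationConstraint=$REGION"

# Optional: keep old versions of the cluster state.
run aws s3api put-bucket-versioning --bucket "$STATE_BUCKET" \
  --versioning-configuration Status=Enabled

# Value for the --state option of the kops commands.
KOPS_STATE_STORE_BUCKET="s3://$STATE_BUCKET"
echo "KOPS_STATE_STORE_BUCKET=$KOPS_STATE_STORE_BUCKET"
```

Running it as is only prints the two `aws` commands, which lets you review them before executing the script again with `DRY_RUN=0`.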
Install the K8S Cluster with KOPS
- Create a configuration: I used 1 master and 3 worker nodes, and set up a bastion “jump host”
kops create cluster ${FQDN_OF_YOUR_CLUSTER} --zones <AWS Zone> --authorization=rbac --node-count ${NUMBER_OF_NODE} --node-size ${NODE_SIZE} --kubernetes-version ${K8S_VER} --topology private --networking calico --vpc=${VPC_ID} --bastion --state ${KOPS_STATE_STORE_BUCKET}
kops create secret --name ${FQDN_OF_YOUR_CLUSTER} sshpublickey admin -i ~/.ssh/id_rsa.pub
- Run the installation
kops update cluster --name ${FQDN_OF_YOUR_CLUSTER} --state ${KOPS_STATE_STORE_BUCKET} --yes
kops validate cluster --name ${FQDN_OF_YOUR_CLUSTER}
- Install the K8S dashboard
kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.8.3.yaml
=== Create a file "dashboard.yaml" ===
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system
kubectl create -f dashboard.yaml
# get the credentials for your dashboard. User is "admin", the output is your password
kops get secrets kube --type secret -oplaintext
# define an ingress
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/ingress-nginx/v1.6.0.yaml
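The kops commands in this section rely on several shell variables that have to be set beforehand. Here is a small sketch of how they could be defined and checked. Every value is an example I made up (domain, instance type, Kubernetes version, VPC ID, bucket), and `unset_vars` is a hypothetical helper, so adjust all of it to your own landscape.

```shell
#!/usr/bin/env bash
# Example values only; none of them are taken from the original setup.
FQDN_OF_YOUR_CLUSTER="datahub.example.com"
NUMBER_OF_NODE=3                          # the post uses 3 worker nodes
NODE_SIZE="m4.xlarge"                     # assumed instance type
K8S_VER="1.9.6"                           # assumed Kubernetes version
VPC_ID="vpc-0123456789abcdef0"            # placeholder VPC ID
KOPS_STATE_STORE_BUCKET="s3://my-kops-state-store"

# Print the names of the given variables that are unset or empty.
unset_vars() {
  local name value
  for name in "$@"; do
    # eval reads the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

missing="$(unset_vars FQDN_OF_YOUR_CLUSTER NUMBER_OF_NODE NODE_SIZE \
  K8S_VER VPC_ID KOPS_STATE_STORE_BUCKET)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names end up on one line.
  echo "Set these variables before running kops:" $missing
else
  echo "All kops variables are set."
fi
```

Checking up front avoids a half-expanded `kops create cluster` call where an empty variable silently removes an argument.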
Install Helm / Tiller
# Create Service Account for tiller
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
# Install helm
curl -LO https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar -zxvf helm-v2.9.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
helm init --service-account tiller --upgrade
helm init && helm list
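Tiller can take a little while to come up after `helm init`, so the first `helm list` may fail. One way to handle that is a small retry loop, sketched below. `retry` is a hypothetical helper, not a Helm or kubectl feature, and the attempt count and delay in the usage line are arbitrary; I am also assuming that `helm version` exits with an error while Tiller is not reachable.

```shell
#!/usr/bin/env bash
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (commented out because it needs a running cluster):
# retry 12 10 helm version && helm list
```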
Install SAP Data Hub
There are 3 main ways to install SAP Data Hub:
(1) Installation with SAP Maintenance Planner
(2) Installation without SAP Maintenance Planner, but with Software Life-cycle Plugin (see SAP Note 2589449).
(3) Manual Installation (Shell script)
As the Maintenance Planner is not available for the Beta phase, I chose option (2) above. This guides you along the installation process in a wizard on the command line.
To start the installation, you need to download these packages from SAP Software Center and move them to your installation host:
- SAPCAR
- SL Plugin
- Data Hub 2.3 Foundation
Once you have unpacked the SL Plugin and Data Hub Foundation with SAPCAR, the installation is kicked off as easily as this:
~/slplugin/bin/slplugin execute -p ~/SAPDataHub-Foundation/
In the wizard, you have the option of Basic and Advanced mode. For a plain development setup, Basic should be sufficient.
One thing to note: when you define a checkpoint store, make sure that the URL follows the pattern https://s3-<region>.amazonaws.com/<bucket-name>, that the bucket exists, and that the Access Key/Secret you enter has access to this bucket.
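To avoid typos in the checkpoint store URL, you could build and check it in the shell before entering it in the wizard. This is just a sketch: both function names and the bucket name are made up, and the pattern check is deliberately loose.

```shell
#!/usr/bin/env bash
# Build the checkpoint store URL from a region ($1) and a bucket name ($2).
checkpoint_store_url() {
  printf 'https://s3-%s.amazonaws.com/%s\n' "$1" "$2"
}

# Succeed if $1 looks like https://s3-<region>.amazonaws.com/<bucket-name>.
is_checkpoint_store_url() {
  printf '%s\n' "$1" |
    grep -Eq '^https://s3-[a-z0-9-]+\.amazonaws\.com/[a-z0-9][a-z0-9.-]+$'
}

# Assumed region and bucket name.
url="$(checkpoint_store_url eu-central-1 my-datahub-checkpoints)"
if is_checkpoint_store_url "$url"; then
  echo "$url"
fi

# To confirm the bucket exists and your credentials can reach it
# (needs AWS access, therefore commented out):
# aws s3api head-bucket --bucket my-datahub-checkpoints
```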
Once the installation has completed, you will see a message like the one below. The nice thing: a validation runs after the installation and checks your cluster setup. If something is wrong, you will see it in the output, and the respective logs are collected in one place, so you don’t have to search here and there to report an incident.
Installation finished successfully
Ok ?
Prepare analytics data in '/home/<user>/work'
Feedback file was written. Please consider sending statistics data back to SAP by opening file '/home/<user>/work/EvalForm.html'
Enabling (public) access to your cluster
To reach your cluster, you have to define an ingress, similar to the ingress for the K8S dashboard. A few easy steps get that set up.
# Create a certificate
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=vsystem.ingress.<your-domain>"
kubectl -n $NAMESPACE create secret tls vsystem-tls-certs --key /tmp/tls.key --cert /tmp/tls.crt
### Create a YAML file: ingress.yaml ###
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vsystem
  annotations:
    kubernetes.io/ingress.class: "nginx"
    kubernetes.io/tls-acme: "true"
    ingress.kubernetes.io/force-ssl-redirect: "true"
    ingress.kubernetes.io/secure-backends: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: 500m
    ingress.kubernetes.io/proxy-body-size: 500m
spec:
  rules:
  -
    host: vsystem.ingress.<your-domain>
    http:
      paths:
      -
        path: /
        backend:
          serviceName: vsystem
          servicePort: 8797
  tls:
  - hosts:
    - vsystem.ingress.<your-domain>
    secretName: vsystem-tls-certs
######
kubectl -n $NAMESPACE create -f ingress.yaml
kubectl -n $NAMESPACE describe ingress vsystem
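The `<your-domain>` placeholder appears in the certificate subject and twice in ingress.yaml, so it is easy to miss one. If you keep the placeholder in the file, a tiny filter can fill it in consistently. This is a sketch only: `fill_domain`, the domain example.com, and the output file name are all assumptions.

```shell
#!/usr/bin/env bash
# Replace every <your-domain> placeholder on stdin with the domain in $1.
# fill_domain is a hypothetical helper name.
fill_domain() {
  sed "s/<your-domain>/$1/g"
}

# Example with an assumed domain:
echo "host: vsystem.ingress.<your-domain>" | fill_domain example.com
# prints: host: vsystem.ingress.example.com

# Typical use, assuming ingress.yaml still contains the placeholder
# (commented out because it needs the file and a cluster):
# fill_domain example.com < ingress.yaml > ingress.rendered.yaml
# kubectl -n "$NAMESPACE" create -f ingress.rendered.yaml
```

The helper assumes the domain contains no `/` or `&` characters, which would be interpreted by sed.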
First access to your new Data Hub instance
- Open your browser
- Go to https://vsystem.ingress.<your-domain>…
- Log on with tenant “default”, the user defined in the installation wizard, and the matching password.
Conclusion
Done! Getting your installation host set up properly is actually most of the work. The product installation is straightforward. I am looking forward to seeing the solution go into General Availability and to getting your feedback from installing your own development cluster.
Great information! Any comments on HANA DB usage? Is this still needed, or
does the new release run completely without HANA?
CU,
Michael
With 2.3, an internal HANA DB is managed and run as part of the Data Hub Distributed Runtime.
This means that it is not required to install and operate an additional SAP HANA DB to use SAP Data Hub features such as the Metadata Catalog or Data Profiling.
However, the internal SAP HANA DB is not exposed externally and cannot be used to store your own data.
OK, good to know! 🙂 Is there any technical documentation you can also share?
Please wait for General Availability. As for all SAP Products, documentation is available at SAP Help: https://help.sap.com/viewer/p/SAP_DATA_HUB
OK, will do so. Last question.... Any idea about the date for GA?
Hi Michael
It has been GA since yesterday.
Release note:
https://launchpad.support.sap.com/#/notes/2621247
SAP Help:
https://help.sap.com/viewer/p/SAP_DATA_HUB
PAM:
https://apps.support.sap.com/sap/support/pam?hash=s%3DData%2520Hub%26o%3Dmost_viewed%257Cdesc%26st%3Dl%26rpp%3D20%26page%3D1%26pvnr%3D73554900100900002861%26pt%3Dg%257Cd
Best regards
Florian