Florian Weigold

Experiences from installing SAP Data Hub 2.3 (BETA) with KOPS on AWS

Introduction

Having worked with SAP Data Hub in an implementation project for the past 6 months, I have played a lot with the product and installed several instances of Data Hub 1.3 / 1.4 (the current GA release). Through this, I got the chance to participate in the Beta program for version 2.3, the upcoming release. To get started with the product, you’ll need access to a cluster, and where is the fun in work if you don’t get your hands dirty and set things up yourself?

Based on past experience, I relied on AWS again. The target landscape is based in Frankfurt, where unfortunately AWS EKS is not available yet. To get a K8S cluster there, you can use kops for deployment. This is fine for testing and development purposes, but it is not an approved setup for productive use. Luckily, I am a developer!

Compared to the Data Hub 1.3 installation, I feel the installation has become much simpler with the Maintenance Planner and SL Plugin options. Also, an automated validation at the end of the installation checks whether your cluster is in good shape. If it is not, the logs are collected in one place to ease the support process for you. Cool thing!

But what do you need to get started? And which things should you watch out for…? See below, and feel free to comment and ask.

How to get started

Installation Host

  • Get yourself a Linux box. This can be a VM, your machine, or a host that you SSH into. I did not have good experiences running the installation from a Docker container or from the Windows Subsystem for Linux, so use a separate Linux box.
  • Install kubectl and kops (v1.9.2).
  • Generate an SSH key pair via ssh-keygen.
  • Install docker.
  • Install Python 2.7.
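
Before going further, it can help to confirm that the tools from the list above are actually on your PATH. The following is a minimal sketch, not part of the official installation procedure. The helper name missing_tools is made up for this example, and the binary name python2.7 is an assumption (some distributions install the interpreter as python or python2).

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one missing command per line and returns 1 if any are missing.
# (missing_tools is a hypothetical helper name used for this sketch.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools kubectl kops ssh-keygen docker python2.7` prints nothing and returns 0 when everything is installed.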

AWS-specifics

  • Have access to the AWS console
  • Install the AWS CLI on your installation host and run “aws configure”
  • Set up the IAM configuration for kops:
    aws iam create-group,… attach policies,…
  • Create an S3 bucket for Kubernetes State Storage
  • Create a new VPC
  • Create a Docker registry (AWS ECR) and log in to it
aws ecr get-login --no-include-email --region eu-central-1
# returns a docker login command. Execute this to authenticate your local docker installation for access to this registry.
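
The post does not show how the S3 bucket for the Kubernetes state was created. One possible way, sketched here under assumptions, uses the AWS CLI. The function name state_store_commands and the bucket name in the usage line are hypothetical, and enabling versioning follows the general kops recommendation rather than anything stated in this post. The function only prints the commands so that you can review them first.

```shell
#!/bin/sh
# Print the AWS CLI commands that create a versioned S3 bucket for the
# kops state store. Nothing is executed here; review the output first.
# (state_store_commands is a hypothetical helper name for this sketch.)
state_store_commands() {
    bucket=$1
    region=$2
    # Outside us-east-1, create-bucket needs an explicit LocationConstraint.
    echo "aws s3api create-bucket --bucket $bucket --region $region --create-bucket-configuration LocationConstraint=$region"
    # Versioning makes it possible to recover an earlier cluster state.
    echo "aws s3api put-bucket-versioning --bucket $bucket --versioning-configuration Status=Enabled"
}
```

Running `state_store_commands my-kops-state eu-central-1` shows the two commands; piping the output to `sh` executes them.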

Install the K8S Cluster with KOPS

  • Create a configuration: I used 1 master and 3 worker nodes, and set up a bastion “jump host”
kops create cluster ${FQDN_OF_YOUR_CLUSTER} --zones <AWS Zone> --authorization=rbac --node-count ${NUMBER_OF_NODE} --node-size ${NODE_SIZE} --kubernetes-version ${K8S_VER} --topology private --networking calico --vpc=${VPC_ID} --bastion --state ${KOPS_STATE_STORE_BUCKET}
kops create secret --name ${FQDN_OF_YOUR_CLUSTER} --state ${KOPS_STATE_STORE_BUCKET} sshpublickey admin -i ~/.ssh/id_rsa.pub
  • Run the installation
kops update cluster --name ${FQDN_OF_YOUR_CLUSTER} --state ${KOPS_STATE_STORE_BUCKET} --yes
kops validate cluster --name ${FQDN_OF_YOUR_CLUSTER} --state ${KOPS_STATE_STORE_BUCKET}
  • Install the K8S dashboard
kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.8.3.yaml

=== Create a file "dashboard.yaml" ===
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system

kubectl create -f dashboard.yaml

# get the credentials for your dashboard. User is "admin", the output is your password
kops get secrets kube --type secret -oplaintext --state ${KOPS_STATE_STORE_BUCKET}

# install the nginx ingress controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/ingress-nginx/v1.6.0.yaml
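
The kops commands in this section rely on several shell variables that the post never defines. The sketch below shows one way to set them and to check that none is empty before calling kops. All values are hypothetical placeholders (cluster name, bucket, VPC ID, instance type, Kubernetes version) and are not recommendations. The helper name require_vars is made up for this example.

```shell
#!/bin/sh
# Hypothetical example values for the variables used by the kops commands.
# Every value below is a placeholder; replace it with your own.
FQDN_OF_YOUR_CLUSTER="datahub.example.com"      # placeholder cluster name
KOPS_STATE_STORE_BUCKET="s3://my-kops-state"    # kops expects an s3:// URL
VPC_ID="vpc-0123456789abcdef0"                  # placeholder VPC ID
NUMBER_OF_NODE=3                                # 3 worker nodes, as in the post
NODE_SIZE="m4.2xlarge"                          # placeholder instance type
K8S_VER="1.9.6"                                 # placeholder Kubernetes version

# Return 0 only if every named variable is set and non-empty;
# report each variable that is not on stderr.
require_vars() {
    rc=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

require_vars FQDN_OF_YOUR_CLUSTER KOPS_STATE_STORE_BUCKET VPC_ID \
    NUMBER_OF_NODE NODE_SIZE K8S_VER
```

A related tip: running kops update cluster without --yes only prints a preview of the changes, which is a cheap way to review the plan before anything is created.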

Install Helm / Tiller

# Create Service Account for tiller
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

# Install helm
curl -LO https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar -zxvf helm-v2.9.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
helm init --service-account tiller --upgrade
# Verify that the helm client can reach Tiller (it may take a few seconds to start)
helm list
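
Tiller can take a moment to start after helm init, so a helm command issued right away may fail. A small sketch of waiting for it, assuming the default deployment name tiller-deploy that Helm v2 creates in kube-system (the function name wait_for_tiller is made up for this example):

```shell
#!/bin/sh
# Block until the Tiller deployment created by "helm init" has rolled out,
# then list releases to confirm that the helm client can reach it.
# Assumes the Helm v2 default deployment name "tiller-deploy".
wait_for_tiller() {
    kubectl -n kube-system rollout status deployment/tiller-deploy &&
        helm list
}
```

Call `wait_for_tiller` once after the helm init step; it returns non-zero if the rollout does not succeed.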

 

Install SAP Data Hub

There are 3 main ways to install SAP Data Hub:

(1) Installation with SAP Maintenance Planner

(2) Installation without SAP Maintenance Planner, but with the Software Lifecycle Plugin (see SAP Note 2589449).

(3) Manual Installation (Shell script)

As the Maintenance Planner is not available for the Beta phase, I chose option (2) above. This guides you along the installation process in a wizard on the command line.

To start the installation, you need to download these packages from SAP Software Center and move them to your installation host:

  • SAPCAR
  • SL Plugin
  • Data Hub 2.3 Foundation

Having unpacked the SL Plugin and Data Hub Foundation with SAPCAR, you can kick off the installation as easily as this:

~/slplugin/bin/slplugin execute -p ~/SAPDataHub-Foundation/

In the wizard, you can choose between Basic and Advanced mode. For a plain development setup, Basic should be sufficient.

One thing to note: when you define a checkpoint store, ensure that your URL follows the pattern https://s3-<region>.amazonaws.com/<bucket-name>, that the bucket exists, and that the Access Key/Secret you enter has access to this bucket.
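
The checkpoint store settings can be checked before you start the wizard. The sketch below builds the URL in the pattern quoted above and tests bucket access with the AWS CLI. It assumes that the CLI is configured with the same Access Key/Secret that you enter in the wizard. Both function names are made up for this example.

```shell
#!/bin/sh
# Build the checkpoint store URL in the pattern given in the post.
checkpoint_store_url() {
    region=$1
    bucket=$2
    echo "https://s3-${region}.amazonaws.com/${bucket}"
}

# Check that the bucket exists and that the credentials currently configured
# for the AWS CLI may access it; head-bucket exits non-zero otherwise.
# Assumption: these are the same credentials you enter in the wizard.
check_bucket_access() {
    aws s3api head-bucket --bucket "$1"
}
```

For example, `checkpoint_store_url eu-central-1 my-checkpoints` prints the URL to enter, and `check_bucket_access my-checkpoints` fails early if the bucket is missing or not accessible (the bucket name here is a placeholder).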

Once the installation has completed, you will see a message like the one below. The nice thing: a validation runs after the installation and checks your cluster setup. If something is wrong, you will see that in the output, and the respective logs are collected in one place, so you don’t have to search here and there to report an incident.

Installation finished successfully

Ok ?
Prepare analytics data in '/home/<user>/work'
Feedback file was written. Please consider sending statistics data back to SAP by opening file '/home/<user>/work/EvalForm.html'

Enabling (public) access to your cluster

To reach your cluster, you have to define an ingress, similar to the ingress for the K8S dashboard. A few easy steps get that set up.

# Create a certificate ($NAMESPACE is the Kubernetes namespace that Data Hub was installed into)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=vsystem.ingress.<your-domain>"
kubectl -n $NAMESPACE create secret tls vsystem-tls-certs --key /tmp/tls.key --cert /tmp/tls.crt

### Create a YAML file: ingress.yaml ###
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
        name: vsystem
        annotations:
            kubernetes.io/ingress.class: "nginx"
            kubernetes.io/tls-acme: "true"
            ingress.kubernetes.io/force-ssl-redirect: "true"
            ingress.kubernetes.io/secure-backends: "true"
            nginx.ingress.kubernetes.io/proxy-body-size: 500m
            ingress.kubernetes.io/proxy-body-size: 500m
spec:
        rules:
        -
            host: vsystem.ingress.<your-domain>
            http:
                paths:
                -
                    path: /
                    backend:
                        serviceName: vsystem
                        servicePort: 8797
        tls:
            - hosts:
                - vsystem.ingress.<your-domain>
              secretName: vsystem-tls-certs
######
  
kubectl -n $NAMESPACE create -f ingress.yaml
kubectl -n $NAMESPACE describe ingress vsystem
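
Once the ingress exists, two quick checks can confirm that it is reachable. This is a sketch under assumptions. Whether the ingress status carries a load balancer hostname depends on how the ingress controller is configured, so the first function may print nothing. Both function names are hypothetical.

```shell
#!/bin/sh
# Print the hostname of the load balancer reported for the vsystem ingress.
# Your DNS entry for vsystem.ingress.<your-domain> has to resolve to it.
# The field may be empty if the controller does not publish an address.
ingress_hostname() {
    namespace=$1
    kubectl -n "$namespace" get ingress vsystem \
        -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Request the given URL and print only the HTTP status code.
# -k skips certificate verification because the certificate is self-signed.
probe_https() {
    curl -k -s -o /dev/null -w '%{http_code}' "$1"
}
```

Typical usage is `ingress_hostname "$NAMESPACE"` followed by `probe_https` with the https URL of your vsystem host.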

First access to your new Data Hub instance

  • Open your browser
  • Go to https://vsystem.ingress.<your-domain>…
  • Log on with tenant “default”, the user defined in the installation wizard, and the matching password.

Conclusion

Done! Getting your installation host set up properly is actually most of the work. The product installation is straightforward. I look forward to seeing the solution go into General Availability and to getting your feedback from installing your own development cluster.

      6 Comments
      Michael Warwas

      Great information! Any comments on HANA DB usage? Is this still needed, or does the new release run completely without HANA?

      CU,

      Michael

       

      Florian Weigold
      Blog Post Author

      With 2.3, an internal HANA DB is managed and run as part of the Data Hub Distributed Runtime.

      This means that it is not required to install and operate an additional SAP HANA DB to use SAP Data Hub features such as the Metadata catalog or Data Profiling.

      However, the internal SAP HANA DB is not exposed externally and cannot be used to store your own data.

      Michael Warwas

      OK, good to know! 🙂 Is there any technical documentation you can also share?

      Florian Weigold
      Blog Post Author

      Please wait for General Availability. As for all SAP Products, documentation is available at SAP Help: https://help.sap.com/viewer/p/SAP_DATA_HUB

      Michael Warwas

      OK, will do so. Last question.... Any idea about the date for GA?

      Florian Weigold
      Blog Post Author

      Hi Michael

      It has been GA since yesterday.

      Release note:
      https://launchpad.support.sap.com/#/notes/2621247

      SAP Help:
      https://help.sap.com/viewer/p/SAP_DATA_HUB

      PAM:

      https://apps.support.sap.com/sap/support/pam?hash=s%3DData%2520Hub%26o%3Dmost_viewed%257Cdesc%26st%3Dl%26rpp%3D20%26page%3D1%26pvnr%3D73554900100900002861%26pt%3Dg%257Cd

       

      Best regards

      Florian