SAP Data Hub is a unique offering - data orchestration redefined with a flow-based design paradigm (data pipelines), containerized software, and automated, fast deployment, scaling, and management on Kubernetes clusters. Features such as the Metadata Explorer, data profiling, catalog search, and data modeling (pipelines) let customers take control of data sets on varied data sources such as HDFS, Oracle, DB2, MySQL, MSSQL, SAP ABAP, SAP Data Services, Azure Data Lake (ADL), GCP BigQuery, S3, etc.
There are currently close to 290 predefined operators to incorporate in a data pipeline. They include, but are not restricted to: data modeling, data workflow, Kafka, TensorFlow, R, Python, spark-submit, HANA client, Write File (HDFS/S3), etc., plus support to build custom operators (e.g. using your own Python code), save them as a Docker image, reuse them in several pipelines, and scale them massively on Kubernetes clusters.
In this blog I'm going to discuss the underlying technology and implementation aspects of SAP Data Hub on AWS EKS (managed Kubernetes in the cloud), using some recent experiences from a proof of concept. The installation steps are covered in detail in the SAP guides; this blog is more of a real-life account of running SAP Data Hub on AWS, including the steps to expose the Data Hub UI on a public AWS Route 53 registered domain.
Containers: What is a container? Containers are a way of running multiple isolated applications on the same hardware (just like VMs), but they do so by sharing the host's underlying OS, which provides basic services to all containers (VMs, on the other hand, each have their own OS) - so it's basically OS-level abstraction. As there is no dedicated OS per container, think of it as a way of running thousands of services with very short startup times and the ability to scale massively. Another important point: containers mark a paradigm shift in how applications are packaged and delivered - development and operations together. While developing our applications we first have to be fundamentally aware of the OS environment, libraries, and kernel - i.e. the *dependencies* (not leaving them for operations to figure out during deployment) - and then develop the application code on top to create an image which, when deployed, can by itself deploy its dependencies and create a scalable deployment scenario. An example of one such application (among many others in the modern world, like Google Docs) is SAP Data Hub.
Container runtime: To run containers on an OS, we need a container runtime, which enables containers (e.g. providing a network for containers to communicate) and provides features such as image management (i.e. managing the software images that execute in a container). A good example of a container runtime is Docker. When Docker is installed on an OS, you will see a network interface, docker0, which enables container networking, and cgroups are used to restrict CPU resources per container. Let's understand this in SAP Data Hub's context: each component of SAP Data Hub - the internal HANA DB, engines like DQP, flowgraphs (pipelines), text, graph, Consul, the distributed log, etc. - runs as a container (deployed from a Docker image), and one or more such containers running together constitutes a service or piece of functionality. So we also need an environment on top to orchestrate, manage, scale, and expose the endpoints of the various services in containers running across several nodes - that platform is Kubernetes (container orchestration).
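As a quick illustration of the runtime concepts above (a sketch, not part of the SAP Data Hub install; the container name and image are arbitrary examples), Docker can start an isolated container with a CPU quota enforced via cgroups:

```shell
# Run an nginx container in the background, capped at half a CPU core
# (--cpus is translated by Docker into cgroup CPU quota settings)
docker run -d --name cgroup-demo --cpus="0.5" nginx

# The container shares the host kernel but gets its own network namespace,
# attached to the docker0 bridge by default
docker inspect --format '{{.HostConfig.NanoCpus}}' cgroup-demo   # 500000000 = 0.5 CPU

# Clean up the demo container
docker rm -f cgroup-demo
```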
Container registry: This is the place where Docker (or the container runtime) stores all images. SAP stores the relevant Docker images in the SAP public repository; when we install Data Hub on-premise or in the cloud (AWS/Azure/GCP), these images are pulled from the SAP public repositories into the local repositories that the customer sets up. AWS provides Amazon Elastic Container Registry (ECR) for this purpose, and it has to be set up separately before the SAP Data Hub install is started. For on-premise customers a local Docker registry can be configured (it can be internet-facing, or the images can be downloaded separately and pushed to the local registry).
Python and base YAML packages: Some base packages for Python (Python 2.7) and PyYAML are required as prerequisites (check the install guide and the install steps below).
Kubernetes (K8S): K8S is a container orchestration environment. Think of it as an environment used for managing containers across various nodes. K8S introduces the concept of "pods" - a set of containers working together, having their own network and storage. K8S creates an overlay network (over the docker0 network we discussed before) to allow pods to communicate with each other, and Kube-DNS helps resolve the names of the various services exposed by pods. Google, AWS, and Azure all now offer managed K8S in the cloud.
HELM / TILLER: Helm is the K8S package manager, used to deploy packages in K8S. Tiller is the server-side component and runs as a pod; helm is the client-side component (the helm command on Linux). We will cover some steps to configure it later.
The example below is just food for thought, and maybe not the most rigorous way of framing the problem, but I wanted to highlight the benefits. Imagine multiple users in a large enterprise working on the plethora of data the enterprise generates, each needing their own flavor of data from this ever-changing data siloed across several data sources. A data pipeline can work well to integrate such data and provide real-time insights as the data changes. Now imagine thousands of users in the enterprise running pipelines (inadvertently spinning up their own pipelines - containers), and we have thousands of instances and various containers in action.
So, let's run SAP Data Hub on Amazon Web Services EKS. I'm going to focus mostly on the SAP Data Hub installation part on AWS EKS; for more details on how to provision an EKS instance please refer to the AWS guide: https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html
Prerequisites: Get K8S (AWS EKS) up and running (we are going to use 3 EC2 nodes)
Create an IAM role for the EKS Cluster as below :
Create Cluster --> Pre-requisite a VPC should exist:
Add worker nodes once the status of the cluster is active; to add additional EKS nodes see: https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html
Configure Security groups
Allow inbound communication between the EKS nodes on their internal IPs for the K8S network to work.
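For reference, the node-to-node rule can also be added with the AWS CLI (a sketch; the security group ID is a placeholder you must replace with your worker-node security group):

```shell
# Allow all traffic (-1 = all protocols) between worker nodes that share
# the same security group; sg-0123456789abcdef0 is a placeholder
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol -1 \
    --source-group sg-0123456789abcdef0
```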
Checking the EKS K8S and configuring the setup for Datahub deployment:
Log in to the master node and use "aws configure" to connect as the EKS user that was created earlier as the cluster admin:
First login to IAM and generate access key for the EKS admin user:
[ec2-user@ip-~]$ aws configure
AWS Access Key ID [****************3NMQ]:
AWS Secret Access Key [****************SbIi]:
Default region name [ap-northeast-2]:
Default output format [json]:
Download and configure Kubectl : https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html
Check the Kubernetes environment:
[ec2-user@ ~]$ kubectl get nodes
NAME                                    STATUS    ROLES     AGE       VERSION
ip-<>.ap-northeast-2.compute.internal   Ready     <none>    23d       v1.10.11
ip-<>.ap-northeast-2.compute.internal   Ready     <none>    24d       v1.10.11
ip-<>.ap-northeast-2.compute.internal   Ready     <none>    24d       v1.10.11
Check the existing K8S PODS :
[ec2-user@<> ~]$ kubectl -n kube-system get pods
NAME                                    READY     STATUS    RESTARTS   AGE
aws-node-292gw                          1/1       Running   2          24d
aws-node-5d8wt                          1/1       Running   2          24d
aws-node-whz2n                          1/1       Running   1          23d
heapster-7ff8d6bf9f-lsk5x               1/1       Running   0          23d
kube-dns-5579bfd44-6qpxd                3/3       Running   0          23d
kube-proxy-4g6qf                        1/1       Running   0          23d
kube-proxy-chmgh                        1/1       Running   1          24d
kube-proxy-sdq7k                        1/1       Running   2          24d
kubernetes-dashboard-669f9bbd46-ccp8m   1/1       Running   0          23d
monitoring-influxdb-cc95575b9-24qxg     1/1       Running   0          23d
Check that Kube-DNS is configured correctly. A typical error, shown below, means Kube-DNS is not able to resolve logical names in the K8S network. This can be a problem with the way the K8S cluster was provisioned or with the VPC security groups, and it can cause problems later when we deploy Data Hub: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
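The busybox test pod used below can be created as in the Kubernetes DNS-debugging guide (a sketch; busybox:1.28 is the image that guide suggests):

```shell
# Start a minimal busybox pod to use as a DNS test client
kubectl run busybox --image=busybox:1.28 --restart=Never -- sleep 3600
```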
root@wg-node1:~# kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.100.0.10
Address 1: 10.100.0.10
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.100.0.10
Address 1: 10.100.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.100.0.1 kubernetes.default.svc.cluster.local
Redeploy the cluster if needed, or check that the connection between the nodes works fine.
Download HELM and deploy TILLER:
helm should be available (just type helm on the OS); if not, please download the suitable version for your distribution from: https://github.com/helm/helm
[ec2-user@<> linux-amd64]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-12-06T02:30:38Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.11-eks", GitCommit:"6bf27214b7e3e1e47dce27dcbd73ee1b27adadd0", GitTreeState:"clean", BuildDate:"2018-12-04T13:33:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Download the helm archive on the OS and extract it:

[ec2-user@<> ~]$ ls -lrt|grep helm
-rw-rw-r-- 1 ec2-user ec2-user 30054400 Jan 29 06:50 helm-v2.9.1-linux-amd64.tar
[ec2-user@<> ~]$ cd linux-amd64
[ec2-user@<> linux-amd64]$ ls -lrt
total 29348
-rwxr-xr-x 1 ec2-user ec2-user 30033696 May 14  2018 helm
-rw-r--r-- 1 ec2-user ec2-user     3310 May 14  2018 README.md
-rw-r--r-- 1 ec2-user ec2-user    11373 May 14  2018 LICENSE

Copy it to a bin location under your $PATH:

[ec2-user@<> linux-amd64]$ which helm
/usr/local/bin/helm
Create the cluster role binding and start Tiller (it runs as a pod; we should be able to see it running later):
$ kubectl create serviceaccount --namespace kube-system tiller
$ kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:default
$ helm init --service-account tiller

## Additional step: I had to create this additional cluster role binding as the Data Hub install later gave an error
root@ip-10-0-1-82:~# kubectl create clusterrolebinding tiller-clusterrule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
clusterrolebinding.rbac.authorization.k8s.io "tiller-clusterrule" created
[ec2-user@<> linux-amd64]$ kubectl -n kube-system get pods
NAME                                         READY     STATUS    RESTARTS   AGE
aws-node-292gw                               1/1       Running   2          24d
aws-node-5d8wt                               1/1       Running   2          24d
aws-node-whz2n                               1/1       Running   1          23d
heapster-7ff8d6bf9f-lsk5x                    1/1       Running   0          23d
kube-dns-5579bfd44-6qpxd                     3/3       Running   0          23d
kube-proxy-4g6qf                             1/1       Running   0          23d
kube-proxy-chmgh                             1/1       Running   1          24d
kube-proxy-sdq7k                             1/1       Running   2          24d
kubernetes-dashboard-669f9bbd46-ccp8m        1/1       Running   0          23d
monitoring-influxdb-cc95575b9-24qxg          1/1       Running   0          23d
prodding-walrus-kube-lego-85467b8dd4-vqxm7   1/1       Running   0          12d
tiller-deploy-f9b8476d-wnmgj                 1/1       Running   0          23d
Configure AWS ECR (Elastic Container Registry): Remember, we discussed in the description of Data Hub above that we need a container registry to store the various software images; during the SAP Data Hub install we will be pulling software images from the SAP public repository into the ECR registry.
[ec2-user@ip-<> linux-amd64]$ aws ecr get-login --no-include-email
docker login -u AWS -p eyJwYXlsb2FkIjoiMlg4eXFmT050N3JlN29ReUovenNWWWxqU0d3ZkFEeElkYXRlZFBMMlJKWk9RNjE3WTRiMmRJOUVYMVRuV0tRUk9CcjNMN3B0S2tiU0syQi9iMGF2Fqa200emNiNWVNMDFWaEdpNVdoTGNncG9BbkdCd2FZRjVEdFFPM2M1Ry9qeEJVTzJ3ZE5xV1ZaQ24vRGlOSyttMVlaV04xaUJoODZncmFPUGlCcmU1SkRlMUc4T25MNUJFQ0txZ ...

## Run the complete output of the above command:
[ec2-user@<> linux-amd64]$ docker login -u AWS -p eyJwYXlsb2FkIjoiMlg4eXFmT050N3JlN29ReUovenNWWWxqU0d3ZkFEeElkYXRlZFBMMlJKWk9RNjE3WTRiMmRJOUVYMVRuV0tRUk9CcjNMN................ https://<>.dkr.ecr.ap-northeast-2.amazonaws.com
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded

Create the repositories below:

aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vora-dqp
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vora-dqp-textanalysis
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/spark-datasourcedist
aws ecr create-repository --repository-name=com.sap.hana.container/base-opensuse42.3-amd64
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vora-deployment-operator
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/security-operator
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/init-security
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/uaa
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/opensuse-leap
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-vrep
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-auth
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-teardown
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-module-loader
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/app-base
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/flowagent
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vora-license-manager
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-shared-ui
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-ui
aws ecr create-repository --repository-name=com.sap.datahub.linuxx86_64/vsystem-voraadapter
aws ecr create-repository --repository-name=elasticsearch/elasticsearch-oss
aws ecr create-repository --repository-name=fabric8/fluentd-kubernetes
aws ecr create-repository --repository-name=grafana/grafana
aws ecr create-repository --repository-name=kibana/kibana-oss
aws ecr create-repository --repository-name=google_containers/kube-state-metrics
aws ecr create-repository --repository-name=nginx
aws ecr create-repository --repository-name=prom/alertmanager
aws ecr create-repository --repository-name=prom/node-exporter
aws ecr create-repository --repository-name=prom/prometheus
aws ecr create-repository --repository-name=prom/pushgateway
aws ecr create-repository --repository-name=consul
aws ecr create-repository --repository-name=nats-streaming
aws ecr create-repository --repository-name=vora/hello-world
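The long list of create-repository calls can also be scripted in a loop (a sketch; the repository list shown here is abbreviated and must match the full image list for your Data Hub version):

```shell
# Hypothetical helper: create the required ECR repositories in one loop
for repo in \
    com.sap.datahub.linuxx86_64/vora-dqp \
    com.sap.datahub.linuxx86_64/vsystem \
    com.sap.hana.container/base-opensuse42.3-amd64 \
    nginx consul nats-streaming vora/hello-world
do
    aws ecr create-repository --repository-name="$repo"
done
```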
Go to the ECR page from the AWS management console:
As we run through the SAP Data Hub install steps we will see these repositories populate with images.
Set up environment variables for NAMESPACE and the container registry:
Check the URI of the ECR registry from the step above and configure the environment variables below. NAMESPACE is the namespace under which all our SAP Data Hub pods/containers will be deployed. We will be able to visualize this in a better way with the K8S dashboard.
[ec2-user@<> linux-amd64]$ export NAMESPACE=datahub24
[ec2-user@<> linux-amd64]$ export DOCKER_REGISTRY=3*******.dkr.ecr.ap-northeast-2.amazonaws.com
Deploy the K8S dashboard and access it by opening a tunnel from your local laptop. It's quite important to have it running before deploying Data Hub, as it's very useful for troubleshooting.
The K8S dashboard is deployed using the AWS documentation: https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html
Here is how to run it :
[ec2-user@<> linux-amd64]$ kubectl proxy
Starting to serve on 127.0.0.1:8001
On your laptop, open a tunnel:
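The tunnel can be opened with SSH local port forwarding (a sketch; the key file and host name are placeholders for your own EC2 key pair and instance address):

```shell
# Forward local port 8001 to the kubectl proxy running on the EC2 host
ssh -i my-key.pem -L 8001:127.0.0.1:8001 ec2-user@<ec2-public-dns>
```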
Now access the dashboard : http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/lo...
Generate a Token to login:
[ec2-user@<> ~]$ kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
Name:         eks-admin-token-cmljn
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=eks-admin
              kubernetes.io/service-account.uid=dca7931a-23a6-11e9-808f-0ae755ba91d4

Type:  kubernetes.io/service-account-token

Data
====
token:      <use the token from here>
ca.crt:     1025 bytes
namespace:  11 bytes
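The eks-admin service account referenced above comes from the AWS dashboard tutorial; if it does not exist yet, it can be created like this (a sketch following that tutorial):

```shell
# Create the eks-admin service account and bind it to the cluster-admin role
kubectl -n kube-system create serviceaccount eks-admin
kubectl create clusterrolebinding eks-admin \
    --clusterrole=cluster-admin \
    --serviceaccount=kube-system:eks-admin
```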
Create a storage class: If the EKS cluster was provisioned without a storage class, add one as per: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
Make sure you have the default annotation if, for example, gp2 is the default storage class:
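A gp2 storage class with the default annotation can be created like this (a sketch based on the AWS EKS storage-class guide linked above):

```shell
# Create an EBS gp2 storage class and mark it as the cluster default
kubectl apply -f - <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
EOF
```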
[ec2-user@<> ~]$ kubectl get storageclass
NAME             PROVISIONER             AGE
eks-cluster-sc   kubernetes.io/aws-ebs   24d
gp2 (default)    kubernetes.io/aws-ebs   23d
vrep-datahub24   sap.com/vrep            23d
vrep-datahub24: this storage class was created automatically during the install and is basically used for the different pipelines.
SAP Data Hub installation: I'm using the manual install procedure (the alternative install procedure).
Download the SAP Data Hub software archive from the SAP marketplace for Data Hub 2.4: DHFOUNDATION04_0-80004015.ZIP
[ec2-user@<> ~]$ ls -lrt|grep SAPData
drwxrwxr-x 8 ec2-user ec2-user 189 Jan 30 04:43 SAPDataHub-2.4.63-Foundation
[ec2-user@<> ~]$ cd SAPDataHub-2.4.63-Foundation
[ec2-user@<> SAPDataHub-2.4.63-Foundation]$ ls -lrt
total 128
-rw-r--r-- 1 ec2-user ec2-user   1717 Dec  3 19:05 license_agreement
-rwxr-xr-x 1 ec2-user ec2-user 116588 Dec  3 19:05 install.sh
-rw-r--r-- 1 ec2-user ec2-user   2953 Dec  3 19:05 dev-config.sh.tpl
drwxrwxr-x 6 ec2-user ec2-user     93 Jan 29 09:09 slplugin
drwxrwxr-x 7 ec2-user ec2-user    106 Jan 29 09:09 validation
drwxrwxr-x 2 ec2-user ec2-user    279 Jan 29 09:09 tools
drwxrwxr-x 9 ec2-user ec2-user    167 Jan 30 04:19 logs
drwxrwxr-x 6 ec2-user ec2-user     74 Jan 30 04:25 deployment
I had no special options to pass to the install prompt, but depending on your case please refer to the installation guide. A single install command does it all; I was prompted to enter my SAP OSS credentials to authorize the download of the software.
[root@<> SAPDataHub-2.4.63-Foundation]# ./install.sh
2019-01-30T04:19:41+0000 [INFO] Running pre-flight checks
2019-01-30T04:19:41+0000 [INFO] Checking if python2.7 is installed...OK!
2019-01-30T04:19:41+0000 [INFO] Checking if pyyaml is installed...OK!
2019-01-30T04:19:41+0000 [INFO] Checking kubectl configuration...OK!
2019-01-30T04:19:41+0000 [INFO] Checking kubernetes client is 1.9.x or 1.10.x or 1.11.x...OK!
2019-01-30T04:19:41+0000 [INFO] Checking kubernetes version is 1.9.x or 1.10.x or 1.11.x...OK!
2019-01-30T04:19:41+0000 [INFO] Checking whether namespace exists or possible to create...OK!
2019-01-30T04:19:42+0000 [INFO] Trying to access SAP Container Artifactory by pulling an image...
Error response from daemon: Get https://73554900100900002861.dockersrv.repositories.sap.ondemand.com/v2/com.sap.datahub.linuxx86_64/hello-sap/manifests/1.1: no basic auth credentials
You need to login to access SAP Container Artifactory.
1) Login with Technical User
2) Login with S-User
Please select one of the login options: 2
Please provide username: I338450
Please provide password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
2019-01-30T04:20:01+0000 [INFO] Checking docker.io access...OK!
2019-01-30T04:20:06+0000 [INFO] Checking push to internal docker registry...OK!
2019-01-30T04:20:06+0000 [INFO] Checking pull from internal docker registry...OK!
2019-01-30T04:20:07+0000 [INFO] Checking helm client is 2.8.x, 2.9.x, 2.10.x or 2.11.x...OK!
2019-01-30T04:20:07+0000 [INFO] Checking if helm installed and initialized...OK!
2019-01-30T04:20:10+0000 [INFO] Wait until tiller is ready...
2019-01-30T04:20:10+0000 [INFO] Checking if helm is ready...OK!
2019-01-30T04:20:10+0000 [INFO] Checking if there is no failed helm chart in the namespace...OK!
2019-01-30T04:20:10+0000 [INFO] Checking if there are any existing persistent volume claims...OK!
2019-01-30T04:20:11+0000 [INFO] End of pre-flight checks
The install will progress from here on and download the images into the ECR registry set via the environment variables. Inputs needed:
By running the SAP Data Hub installer and by using built-in operators of SAP Data Hub, Docker images will be built by automatically downloading and installing: (a) Docker images from SAP Docker Registry; (b) Docker images from third party registries; and (c) open source prerequisites from third party open source repositories. The Docker images from the SAP Docker Registry are part of the SAP Data Hub product. Use of these images is governed by the terms of your commercial agreement with SAP for the SAP Data Hub. The Docker images from third party registries and open source prerequisites from third party open source repositories (collectively, the “Third Party Prerequisites”) are prerequisites of SAP Data Hub that usually would have to be downloaded and installed by customers from third party repositories before deploying the SAP Data Hub. For the customers' convenience, the SAP Data Hub installer and built-in operators automatically download and install the Third Party Prerequisites on behalf of the customer. The Third Party Prerequisites are NOT part of the SAP Data Hub and SAP does not accept any responsibility for the Third Party Prerequisites, including, providing support. Use of the Third Party Prerequisites is solely at customers’ risk and subject to any third party licenses applicable to the use of such prerequisites. Customers are responsible for keeping the Third Party Prerequisites up-to-date, and are asked to make use of the respective community support and / or to consider commercial support offerings. The Third Party Prerequisites and associated license information are listed in the Release Note for SAP Data Hub that is published at the download site for SAP Data Hub. I authorize the download and installation of Docker images from the SAP Docker Registry and Third Party Prerequisites from third party repositories, and acknowledge the foregoing disclaimer. (yes/no) yes .... ...... ....... 
No SSL certificate has been provided via the --provide-certs parameter. The SAP Data Hub installer will generate a self-signed certificate for TLS and JWT.
Please enter the SAN (Subject Alternative Name) for the certificate, which must match the fully qualified domain name (FQDN) of the Kubernetes node to be accessed externally: ap-northeast-2.compute.amazonaws.com

SAP Data Hub System Tenant Administrator Credentials
Provide a password for the "system" user of "system" tenant.
The password must have 8-255 characters and must contain lower case, upper case, numerical and special characters . @ # $ %% * - + _ ? !
Please enter a password for "system" user of "system" tenant:
Please reenter your password:

SAP Data Hub Initial Tenant Administrator Credentials
Provide a username and password for administrator user of "default" tenant.
The username must have at least 4 and at most 60 characters
Allowed characters: alphabetic(only lowercase), digits and hyphens
Username is not allowed to begin/end with hyphens and cannot contain multiple consecutive hyphens
Please enter a username for default tenant: vora
Do you want to use the same "system" user password for "vora" user of "default" tenant? (yes/no) yes
Do you want to configure security contexts for Hadoop/Kerberized Hadoop? (yes/no) no
2019-01-30T04:24:32+0000 [INFO] Configuring contexts with: python2.7 configure_contexts.py -a -n --set Vora_JWT_Issuer_NI.default --set Vora_Default_TLS_Configuration_NI.default
secret "vora.conf.secop.contexts" created
secret "vora.conf.secop.contexts" labeled
2019-01-30T04:24:33+0000 [INFO] Vora streaming tables require Vora's checkpoint store
Enable Vora checkpoint store? (yes/no) no

###### Configuration Summary #######
installer:
  AUDITLOG_MODE: production
  CERT_DOMAIN: ap-northeast-2.compute.amazonaws.com
  CHECKPOINT_STORE_TYPE: ''
  CHECKPOINT_STORE_TYPE_RAW: ''
  CLUSTER_HTTPS_PROXY: ''
  CLUSTER_HTTP_PROXY: ''
  CLUSTER_NO_PROXY: ''
  CONSUL_STORAGE_CLASS: ''
  CUSTOM_DOCKER_LOG_PATH: ''
  DIAGNOSTIC_STORAGE_CLASS: ''
  DISABLE_INSTALLER_LOGGING: ''
  DISK_STORAGE_CLASS: ''
  DLOG_STORAGE_CLASS: ''
  DOCKER_REGISTRY: 361176622288.dkr.ecr.ap-northeast-2.amazonaws.com
  ENABLE_CHECKPOINT_STORE: 'false'
  ENABLE_DIAGNOSTIC_PERSISTENCY: 'no'
  ENABLE_NETWORK_POLICIES: 'no'
  ENABLE_RBAC: 'yes'
  ENABLE_UAA: 'true'
  HANA_STORAGE_CLASS: ''
  IMAGE_PULL_SECRET: ''
  PACKAGE_VERSION: 2.4.63
  PV_STORAGE_CLASS: ''
  TILLER_NAMESPACE: ''
  USE_K8S_DISCOVERY: 'yes'
  VALIDATE_CHECKPOINT_STORE: ''
  VFLOW_IMAGE_PULL_SECRET: ''
  VFLOW_REGISTRY: ''
  VORA_ADMIN_USERNAME: vora
  VORA_FLAVOR: ''
  VORA_VSYSTEM_DEFAULT_TENANT_NAME: default
  VSYSTEM_LOAD_NFS_MODULES: 'yes'
  VSYSTEM_STORAGE_CLASS: ''
######################################
Do you want to start the installation with this configuration (yes/no) yes
2019-01-30T04:24:43+0000 [INFO] Updating installer configuration...
secret "installer-config" created
secret "installer-config" labeled
OK!
2019-01-30T04:24:44+0000 [INFO] Image already exists on the target registry "361176622288.dkr.ecr.ap-northeast-2.amazonaws.com" as "com.sap.hana.container/base-opensuse42.3-amd64:2.03.031.00-3.1.0", will not mirror
2019-01-30T04:24:44+0000 [INFO] Image already exists on the target registry "361176622288.dkr.ecr.ap-northeast-2.amazonaws.com" as "consul:0.9.0", will not mirror
2019-01-30T04:24:45+0000 [INFO] Image already exists on the target registry "361176622288.dkr.ecr.ap-northeast-2.amazonaws.com" as "com.sap.datahub.linuxx86_64/vora-dqp:2.4.46", will not mirror
2019-01-30T04:24:45+0000 [INFO] Image already exists on the target registry "361176622288.dkr.ecr.ap-northeast-2.amazonaws.com" as "com.sap.datahub.linuxx86_64/vora-deployment-operator:2.4.46", will not mirror
...
Note in the output above that I had already run the install once before, hence the Docker images were already mirrored.
Keep monitoring the pods spinning up in the K8S dashboard and check, for example, whether the persistent volume claims on the storage volumes are going through, or whether certain pods are failing with errors. If so, click on LOGS to see the error messages for each failing pod. Example below - see the HANA pod:
Launch a shell inside a pod: click on the pod, then from the top right select "EXEC":
At the end of the installation, a validation check is run on vora-cluster, vora-sparkonk8s, etc.
2019-01-30T04:36:03+0000 [INFO] Labelling persistent volume claims...
persistentvolumeclaim "data-log-hana-0" labeled
persistentvolumeclaim "data-vora-disk-0" labeled
persistentvolumeclaim "data-vora-dlog-0" labeled
persistentvolumeclaim "data-vora-dlog-1" labeled
persistentvolumeclaim "data-vora-dlog-2" labeled
persistentvolumeclaim "datadir-vora-consul-0" labeled
persistentvolumeclaim "datadir-vora-consul-1" labeled
persistentvolumeclaim "datadir-vora-consul-2" labeled
persistentvolumeclaim "trace-hana-0" labeled
persistentvolumeclaim "layers-volume-vsystem-vrep-0" labeled
2019-01-30T04:36:05+0000 [INFO] Validating...
2019-01-30T04:36:05+0000 [INFO] Running validation for vora-cluster...OK!
2019-01-30T04:37:59+0000 [INFO] Running validation for vora-sparkonk8s...OK!
2019-01-30T04:38:41+0000 [INFO] Running validation for vora-vsystem...OK!
2019-01-30T04:38:48+0000 [INFO] Running validation for vora-diagnostic...OK!
2019-01-30T04:43:54+0000 [INFO] Running validation for datahub-app-base-db...OK!
Once the install finishes, verify that the various pods are running under the datahub24 namespace:
[ec2-user@<> ~]$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                                          READY   STATUS             RESTARTS   AGE
datahub24       audit-log-viewer-5sd29-97d8b6dc7-tjkg4                        3/3     Running            0          2d
datahub24       audit-log-viewer-pkqjv-85bdcf9867-s7vwm                       3/3     Running            0          2d
datahub24       auditlog-787bfc46f8-w8j9j                                     1/1     Running            0          23d
datahub24       connection-management-2klfm-b5d8b9d4c-8gcjg                   3/3     Running            0          12d
datahub24       connection-management-mtknw-5bc4cf5c84-ftjpx                  3/3     Running            0          22d
datahub24       connection-management-xws79-5b44778855-hqb6m                  3/3     Running            0          12d
datahub24       datahub-app-db-f2xp5-5bd9dd64bb-6ll8f                         3/3     Running            0          23d
datahub24       datahub-app-db-kl85c-7dc757c7d5-n4cb4                         3/3     Running            0          23d
datahub24       diagnostics-elasticsearch-0                                   0/1     CrashLoopBackOff   6390       23d
datahub24       diagnostics-elasticsearch-retention-policy-6498488546-8st5n   1/1     Running            0          23d
datahub24       diagnostics-fluentd-76jvh                                     1/1     Running            0          6d
datahub24       diagnostics-fluentd-bh7zx                                     1/1     Running            0          23d
datahub24       diagnostics-fluentd-qdr77                                     1/1     Running            0          23d
datahub24       diagnostics-grafana-6cc75dfdd-gh5hp                           2/2     Running            0          23d
datahub24       diagnostics-kibana-6d789c9789-nd5bj                           2/2     Running            0          23d
datahub24       diagnostics-prometheus-kube-state-metrics-6c4b796879-kbkbg    1/1     Running            0          23d
datahub24       diagnostics-prometheus-node-exporter-dqhxb                    1/1     Running            0          23d
datahub24       diagnostics-prometheus-node-exporter-hp8n6                    1/1     Running            0          23d
datahub24       diagnostics-prometheus-node-exporter-lqns7                    1/1     Running            0          23d
datahub24       diagnostics-prometheus-pushgateway-58bb88c4d6-c6qxl           2/2     Running            0          6d
datahub24       diagnostics-prometheus-server-0                               1/1     Running            0          2d
datahub24       flowagent-5cs77-77b848f7b6-tl9lh                              3/3     Running            0          2d
datahub24       flowagent-nzp85-cd7855484-p762j                               3/3     Running            0          2d
datahub24       hana-0                                                        1/1     Running            0          23d
datahub24       internal-comm-secret-gen-8wgpd                                0/1     Completed          0          23d
datahub24       launchpad-ddppk-55c669755c-2rgzt                              3/3     Running            0          12d
datahub24       launchpad-h9gqt-7cdb6dbcd8-fdpqt                              3/3     Running            0          12d
datahub24       launchpad-hpvfp-64fc44dc89-9n465                              3/3     Running            0          22d
datahub24       launchpad-js56k-865f9bbff8-wlbvk                              3/3     Running            0          12d
datahub24       launchpad-jxhsq-787d74cfd4-v8ccs                              3/3     Running            0          12d
datahub24       license-management-bdtq6-85bd9b88b7-k88rc                     3/3     Running            0          23d
datahub24       license-management-nbpt9-64bbc799c-zkl87                      3/3     Running            0          22d
datahub24       metadata-explorer-mnxlg-69768799f-kjkvf                       3/3     Running            1          12d
datahub24       metadata-explorer-pq6mr-5ddb5dd8d8-x2wmh                      3/3     Running            0          22d
datahub24       metadata-explorer-rrf5s-84f9686975-qnvzc                      3/3     Running            1          12d
datahub24       metadata-explorer-z995p-64c485856c-kk8jd                      3/3     Running            0          8d
datahub24       monitoring-29qx9-8544fb8594-d248s                             3/3     Running            0          10d
datahub24       monitoring-6tccm-65cdc5555c-t5hsw                             3/3     Running            0          2d
datahub24       monitoring-pm4hr-84c749f85b-bs9qk                             3/3     Running            2          12d
datahub24       monitoring-r5pnh-9856594fc-btbk7                              3/3     Running            0          22d
datahub24       scheduler-glwjq-676bb758fb-ljpd4                              3/3     Running            0          22d
datahub24       shared-2nj2x-679666c7fd-mnr6q                                 3/3     Running            0          22d
datahub24       system-management-8p7fp-6bb84c4cd-scbsq                       3/3     Running            0          12d
datahub24       system-management-9kw5g-6787955c57-mktlw                      3/3     Running            0          12d
datahub24       system-management-g2k7d-9f8894996-lxnsb                       3/3     Running            0          22d
datahub24       system-management-lm29n-65d6cdd67c-jv8k6                      3/3     Running            0          12d
datahub24       system-management-pjsts-754dff5c59-9lhl2                      3/3     Running            0          12d
datahub24       task-application-6rgnp-5d6fb96d8f-fszgz                       3/3     Running            0          2d
datahub24       uaa-5db6c9d965-kvpzf                                          2/2     Running            1          9d
datahub24       vora-catalog-54b6c76cb9-flw5b                                 2/2     Running            2          23d
datahub24       vora-config-init-jrwh5                                        0/2     Completed          0          23d
datahub24       vora-consul-0                                                 1/1     Running            0          23d
datahub24       vora-consul-1                                                 1/1     Running            0          23d
datahub24       vora-consul-2                                                 1/1     Running            0          23d
datahub24       vora-deployment-operator-c8c848767-ghklm                      1/1     Running            0          23d
datahub24       vora-disk-0                                                   2/2     Running            0          6d
datahub24       vora-dlog-0                                                   2/2     Running            0          23d
datahub24       vora-dlog-1                                                   2/2     Running            0          23d
datahub24       vora-dlog-2                                                   2/2     Running            0          23d
datahub24       vora-dlog-admin-t7znm                                         0/2     Completed          0          23d
datahub24       vora-doc-store-755689c9d4-k77b6                               2/2     Running            2          23d
datahub24       vora-graph-7bbdcbff78-8g4hd                                   2/2     Running            2          23d
datahub24       vora-landscape-59664c66fc-cqbfd                               2/2     Running            1          23d
datahub24       vora-nats-streaming-9d44586f8-2ckqr                           1/1     Running            0          23d
datahub24       vora-relational-fd7fb7d9-qnqmz                                2/2     Running            2          23d
datahub24       vora-security-operator-946cbf44f-bw8q9                        1/1     Running            0          23d
datahub24       vora-spark-resource-staging-server-7548d94c9b-kkkgn           1/1     Running            0          23d
datahub24       vora-textanalysis-6fc6db7f9b-7qb27                            1/1     Running            0          23d
datahub24       vora-time-series-98c95d7-8m9gl                                2/2     Running            1          23d
datahub24       vora-tools-225cq-7dc4c9cfb5-sq5s6                             3/3     Running            0          10d
datahub24       vora-tools-kmvwr-559dbb64c4-b29xw                             3/3     Running            0          22d
datahub24       vora-tools-rhkb9-575dbdfc85-brv52                             3/3     Running            0          12d
datahub24       vora-tx-broker-6864547954-6lm2d                               2/2     Running            2          23d
datahub24       vora-tx-coordinator-6544b9bfd6-pcsjg                          2/2     Running            1          23d
datahub24       vora-tx-lock-manager-5c8699cb6-mhmk9                          2/2     Running            2          23d
datahub24       voraadapter-gq4dt-669f4b4944-r54f8                            3/3     Running            0          23d
datahub24       voraadapter-pdkgm-56fb9f466d-kh5lv                            3/3     Running            0          23d
datahub24       vsystem-fdcf96cbf-mbhn8                                       2/2     Running            0          23d
datahub24       vsystem-module-loader-bbprd                                   1/1     Running            0          23d
datahub24       vsystem-module-loader-c57x6                                   1/1     Running            0          23d
datahub24       vsystem-module-loader-v5fqt                                   1/1     Running            0          23d
datahub24       vsystem-vrep-0                                                1/1     Running            0          23d
default         busybox                                                       1/1     Running            575        23d
ingress-nginx   nginx-ingress-controller-75cd585bbc-sjxf4                     1/1     Running            0          13d
kube-system     aws-node-292gw                                                1/1     Running            2          25d
kube-system     aws-node-5d8wt                                                1/1     Running            2          25d
kube-system     aws-node-whz2n                                                1/1     Running            1          23d
kube-system     heapster-7ff8d6bf9f-lsk5x                                     1/1     Running            0          23d
kube-system     kube-dns-5579bfd44-6qpxd                                      3/3     Running            0          23d
kube-system     kube-proxy-4g6qf                                              1/1     Running            0          23d
kube-system     kube-proxy-chmgh                                              1/1     Running            1          25d
kube-system     kube-proxy-sdq7k                                              1/1     Running            2          25d
kube-system     kubernetes-dashboard-669f9bbd46-ccp8m                         1/1     Running            0          23d
kube-system     monitoring-influxdb-cc95575b9-24qxg                           1/1     Running            0          23d
kube-system     prodding-walrus-kube-lego-85467b8dd4-vqxm7                    1/1     Running            0          12d
kube-system     tiller-deploy-f9b8476d-wnmgj                                  1/1     Running            0          23d
Access the Datahub application: The Datahub application is exposed by the vsystem service. Open the K8s dashboard, select your namespace (datahub24 in my case), go to "Discovery and Load Balancing" --> Services --> find vsystem and note its port.
Open an SSH tunnel from your laptop to this port (31855 in the above case):
```
C:\Users\i338450>ssh -L 31855:localhost:31855 -i EMT_KEY.pem ec2-user@13.125.144.154
```
Access the application at: https://localhost:31855
I had registered a domain on AWS beforehand (datahubpoc.com), so I wanted to expose the Datahub vsystem app endpoint as https://poc1.<..>.com. First create a hosted zone for the domain in Route 53; once this is done we will be able to add a CNAME record poc1.<...>.com pointing to the ingress address (follow the steps below).
As per the installation guide, we first generate a self-signed OpenSSL certificate and expose it as a secret in K8s:
```
[root@<> ec2-user]# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=poc1.datahubpoc.com"
Generating a 2048 bit RSA private key
..+++
.........................................................................................+++
writing new private key to '/tmp/tls.key'
-----
[root@<> ec2-user]# kubectl -n datahub24 create secret tls vsystem-tls-certs --key /tmp/tls.key --cert /tmp/tls.crt
secret "vsystem-tls-certs" created
[root@<> ec2-user]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
[root@<> ec2-user]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/service-l4.yaml
[root@<> ec2-user]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/patch-configmap-l4.yaml
namespace "ingress-nginx" created
configmap "nginx-configuration" created
configmap "tcp-services" created
configmap "udp-services" created
serviceaccount "nginx-ingress-serviceaccount" created
clusterrole.rbac.authorization.k8s.io "nginx-ingress-clusterrole" created
role.rbac.authorization.k8s.io "nginx-ingress-role" created
rolebinding.rbac.authorization.k8s.io "nginx-ingress-role-nisa-binding" created
clusterrolebinding.rbac.authorization.k8s.io "nginx-ingress-clusterrole-nisa-binding" created
deployment.apps "nginx-ingress-controller" created
```
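Before loading a key pair into a TLS secret it's worth sanity-checking what OpenSSL actually produced. A minimal sketch, run here against a throwaway key pair (the CN poc1.example.com and the /tmp/check.* paths are illustrative, not the real files from above):

```shell
# Generate a throwaway self-signed pair with the same flags as above (illustrative CN)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /tmp/check.key -out /tmp/check.crt \
  -subj "/CN=poc1.example.com" 2>/dev/null

# Show the subject and expiry date of the certificate
openssl x509 -in /tmp/check.crt -noout -subject -enddate

# The key belongs to the certificate if their RSA moduli are identical
[ "$(openssl x509 -in /tmp/check.crt -noout -modulus)" = \
  "$(openssl rsa -in /tmp/check.key -noout -modulus)" ] && echo "key matches cert"
```

This mirrors the classic modulus check for RSA key pairs; a CN or expiry mistake caught here saves a debugging round-trip later at the ingress.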
We will eventually need TLS/SSL on our endpoint https://poc1.<..>.com. This can easily be done with the free certificate authority Let's Encrypt; more details: https://docs.bitnami.com/kubernetes/how-to/secure-kubernetes-services-with-ingress-tls-letsencrypt/
For HTTPS we have to install the kube-lego chart (an email address is mandatory). kube-lego will pick up the change to the Ingress object, request the certificate from Let's Encrypt, and store it in the "vsystem-tls-certs" secret (which we will be creating subsequently). In turn, the NGINX Ingress Controller will read the TLS configuration and load the certificate from the secret. Once the NGINX server is updated, a visit to the domain in the browser should present the Datahub app over a secure TLS connection.
```
[ec2-user@<> tmp]$ helm install stable/kube-lego --namespace kube-system --set config.LEGO_EMAIL=<Enter Your Email>,config.LEGO_URL=https://acme-v01.api.letsencrypt.org/directory
NAME:   prodding-walrus
LAST DEPLOYED: Sat Feb  9 20:29:47 2019
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/Deployment
NAME                       DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
prodding-walrus-kube-lego  1        0        0           0          0s

==> v1/Pod(related)
NAME                                        READY  STATUS   RESTARTS  AGE
prodding-walrus-kube-lego-85467b8dd4-vqxm7  0/1    Pending  0         0s

NOTES:
This chart installs kube-lego to generate TLS certs for Ingresses.

NOTE:
> kube-lego is in maintenance mode only. There is no plan to support any new
> features. The officially endorsed successor is cert-manager
> https://hub.kubeapps.com/charts/stable/cert-manager

EXAMPLE INGRESS YAML:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example
  namespace: foo
  annotations:
    kubernetes.io/ingress.class: nginx
    # Add to generate certificates for this ingress
    kubernetes.io/tls-acme: 'true'
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - backend:
          serviceName: exampleService
          servicePort: 80
        path: /
  tls:
  # With this configuration kube-lego will generate a secret in namespace foo called `example-tls`
  # for the URL `www.example.com`
  - hosts:
    - "www.example.com"
    secretName: example-tls
```
We are now ready to deploy the Ingress:
My ingress YAML file:

```
[ec2-user@<> ~]$ cat ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/force-ssl-redirect: "true"
    ingress.kubernetes.io/proxy-body-size: 500m
    ingress.kubernetes.io/secure-backends: "true"
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/proxy-body-size: 500m
    nginx.ingress.kubernetes.io/proxy-buffer-size: 16k
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
  name: vsystem
spec:
  rules:
  - host: <DNS_REPLACE>
    http:
      paths:
      - backend:
          serviceName: vsystem
          servicePort: 8797
        path: /
  tls:
  - hosts:
    - <DNS_REPLACE>
    secretName: vsystem-tls-certs
```
Note: carefully replace <DNS_REPLACE> as shown below:
```
[ec2-user@<> ~]$ export dns_domain=poc1.<....>.com
[ec2-user@<> ~]$ cat ingress.yaml | sed "s/<DNS_REPLACE>/${dns_domain}/g" | kubectl -n datahub24 apply -f -
ingress.extensions "vsystem" created
```
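The `sed` pipe is a handy poor-man's templating trick: the manifest stays generic, and the environment-specific value is injected at apply time while the file on disk remains untouched. A minimal, cluster-free sketch of the same substitution (the domain poc1.example.com and the /tmp/demo.yaml file are illustrative):

```shell
# A generic manifest fragment containing the placeholder (illustrative file/domain)
dns_domain="poc1.example.com"
cat > /tmp/demo.yaml <<'EOF'
rules:
- host: <DNS_REPLACE>
tls:
- hosts:
  - <DNS_REPLACE>
EOF

# Substitute every occurrence; in the real command the result is piped
# straight into `kubectl apply -f -` instead of printed
sed "s/<DNS_REPLACE>/${dns_domain}/g" /tmp/demo.yaml
```

One caveat: the sed program uses `/` as its delimiter, so a substituted value containing a slash would break it. That is safe here because hostnames never contain slashes.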
Describe your Ingress:
```
[ec2-user@<> ~]$ kubectl -n datahub24 describe ingress
Name:             vsystem
Namespace:        datahub24
Address:          adb19<.......>.ap-northeast-2.elb.amazonaws.com
Default backend:  default-http-backend:80 (<none>)
TLS:
  vsystem-tls-certs terminates poc1.datahubpoc.com
Rules:
  Host                 Path  Backends
  ----                 ----  --------
  poc1.datahubpoc.com  /     vsystem:8797 (<none>)
Annotations:
  ingress.kubernetes.io/proxy-body-size:              500m
  nginx.ingress.kubernetes.io/backend-protocol:       HTTPS
  nginx.ingress.kubernetes.io/proxy-connect-timeout:  30
  nginx.ingress.kubernetes.io/ssl-passthrough:        true
  nginx.ingress.kubernetes.io/proxy-buffer-size:      16k
  nginx.ingress.kubernetes.io/proxy-read-timeout:     1800
  ingress.kubernetes.io/force-ssl-redirect:           true
  ingress.kubernetes.io/secure-backends:              true
  kubectl.kubernetes.io/last-applied-configuration:   {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"ingress.kubernetes.io/force-ssl-redirect":"true","ingress.kubernetes.io/proxy-body-size":"500m","ingress.kubernetes.io/secure-backends":"true","kubernetes.io/ingress.class":"nginx","kubernetes.io/tls-acme":"true","nginx.ingress.kubernetes.io/backend-protocol":"HTTPS","nginx.ingress.kubernetes.io/proxy-body-size":"500m","nginx.ingress.kubernetes.io/proxy-buffer-size":"16k","nginx.ingress.kubernetes.io/proxy-connect-timeout":"30","nginx.ingress.kubernetes.io/proxy-read-timeout":"1800","nginx.ingress.kubernetes.io/proxy-send-timeout":"1800","nginx.ingress.kubernetes.io/ssl-passthrough":"true"},"name":"vsystem","namespace":"datahub24"},"spec":{"rules":[{"host":"poc1.datahubpoc.com","http":{"paths":[{"backend":{"serviceName":"vsystem","servicePort":8797},"path":"/"}]}}],"tls":[{"hosts":["poc1.datahubpoc.com"],"secretName":"vsystem-tls-certs"}]}}
  kubernetes.io/ingress.class:                        nginx
  kubernetes.io/tls-acme:                             true
  nginx.ingress.kubernetes.io/proxy-body-size:        500m
  nginx.ingress.kubernetes.io/proxy-send-timeout:     1800
Events:
  Type    Reason  Age  From                      Message
  ----    ------  ---  ----                      -------
  Normal  CREATE  26m  nginx-ingress-controller  Ingress datahub24/vsystem
  Normal  UPDATE  26m  nginx-ingress-controller  Ingress datahub24/vsystem
```
Note that AWS has automatically provisioned a Classic Load Balancer with the DNS name (A record): adb<......>.ap-northeast-2.elb.amazonaws.com
Check this in the AWS console under Load Balancers:
Let's now create a CNAME record in Route 53 and point poc1.<...>.com at the load balancer's DNS name (A record):
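The same record can be created from the command line instead of the console, via a Route 53 change batch. A hedged sketch: the hosted-zone ID `Z1EXAMPLE` and the load-balancer DNS name below are placeholders for your own values, and the final `aws route53` call is left as a comment since it needs real credentials and a real zone:

```shell
# Build the Route 53 change batch (zone ID and ELB name are placeholders)
cat > /tmp/cname-change.json <<'EOF'
{
  "Comment": "Point poc1.datahubpoc.com at the ingress ELB",
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "poc1.datahubpoc.com",
      "Type": "CNAME",
      "TTL": 60,
      "ResourceRecords": [{ "Value": "my-elb-1234567890.ap-northeast-2.elb.amazonaws.com" }]
    }
  }]
}
EOF

# Validate the change batch locally before submitting it
python3 -m json.tool < /tmp/cname-change.json > /dev/null && echo "change batch OK"

# Then submit it (requires AWS credentials and your real hosted-zone ID):
# aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE \
#   --change-batch file:///tmp/cname-change.json
```

UPSERT creates the record if it is missing and overwrites it otherwise, and the TTL of 60 matches the low-TTL tip below for faster propagation while testing.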
Wait a few minutes for the DNS change to propagate, or if you are too eager, change the TTL (seconds) to 60 in the record set. Then access the application at https://poc1.<...>.com 🙂
Note that the URL is already HTTPS-secured, since kube-lego in the steps above called Let's Encrypt to SSL-secure the DNS name defined in the Ingress YAML.
Including one more screenshot, as this is my favorite screen:
Overall it's fairly easy to install SAP Datahub; the complexity often arises from the surrounding technologies such as Docker and Kubernetes, but that's more of a learning curve.