Thorsten Schneider

Installing SAP Data Hub

In my last blog post I wrote about what it means that SAP Data Hub is a containerized application. Today I want to talk about the installation of SAP Data Hub. All explanations relate to SAP Data Hub 2.3 or newer.

This has a prelude: For SAP TechEd I had submitted a MeetUp about installing SAP Data Hub. I wanted to demonstrate the installation during this MeetUp with a live demo… and unfortunately realized a few hours before that the MeetUp room did not come with a monitor ☹. My improvisation consisted of a few hastily prepared whiteboard drawings. I took these drawings as a basis for this blog post.

The official documentation covers the installation of SAP Data Hub in detail (in particular here and here). This blog post is clearly not meant to enable you to install SAP Data Hub without consulting the documentation. Instead, I would like to complement the documentation by looking a bit behind the scenes.

Overview of an installation

Simply put, every installation of SAP Data Hub, independent of where you install the software, consists of three phases: preparation, installation and post-installation.

The installation phase consists of four sub-phases: pre-flight checks, mirroring, deployment and validation. That’s how it looks:

During the installation you interact with several different “things”. I have tried to depict them in the following diagram:

Preparation

During the preparation phase, you need to set up an installation host (1) to run the installation procedure, as well as a Kubernetes cluster (2) and a “local” container registry (3), which the installation procedure uses to install SAP Data Hub.

Remark: If SAP Data Hub is operated as part of an SAP system landscape, the recommended way to install it is SAP Maintenance Planner (see here). An alternative is the command-line tool (install.sh) delivered with SAP Data Hub. Today, I will use the command-line tool. I might write a separate blog post about using SAP Maintenance Planner… or maybe a colleague will. Behind the scenes, install.sh always runs, so what you learn today remains valid.

The installation host (1) is a Linux computer / virtual machine. It must meet certain requirements. For example, it needs to have Docker, Python, the Kubernetes command-line tool (kubectl) and the Kubernetes package manager (helm) installed.
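
A quick way to see whether the tools listed above are present on the installation host is to look them up on the PATH. The following is a minimal sketch, not part of the SAP tooling; the executable names (in particular "python3") are assumptions, and it only checks presence, not the versions the documentation requires.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the paragraph above. The executable names are assumptions;
# some distributions ship Python as "python" rather than "python3".
REQUIRED_TOOLS = ("docker", "python3", "kubectl", "helm")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on installation host:", ", ".join(missing))
    else:
        print("All required tools found on PATH (versions not checked).")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is used.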

The installation procedure for SAP Data Hub assumes that you have a running Kubernetes cluster (2) as well as a “local” container registry (3). Just like the installation host, both must meet certain requirements. The Kubernetes cluster (2) needs to consist of at least three nodes (all details can be found here).
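
To check the node count before starting, you could read the node list that `kubectl get nodes -o json` returns and ignore control-plane nodes (in the comments below, the author clarifies that the three nodes are workers and do not include the master). This is a sketch under assumptions: it treats any node carrying one of the two common control-plane labels as a master, and on managed offerings the control plane is usually not listed at all, so every listed node counts as a worker.

```python
import json
import subprocess
from typing import Any, Dict, List

MIN_WORKER_NODES = 3  # minimum stated in the paragraph above

# Labels that mark control-plane nodes; which one is set depends on the Kubernetes version.
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/master",
    "node-role.kubernetes.io/control-plane",
)


def worker_nodes(node_list: Dict[str, Any]) -> List[str]:
    """Return names of nodes that carry no control-plane label."""
    names = []
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        if not any(label in labels for label in CONTROL_PLANE_LABELS):
            names.append(node["metadata"]["name"])
    return names


def load_nodes() -> Dict[str, Any]:
    """Read the node list from the current kubectl context."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    workers = worker_nodes(load_nodes())
    print(f"{len(workers)} worker node(s): {', '.join(workers)}")
    if len(workers) < MIN_WORKER_NODES:
        print(f"Fewer than {MIN_WORKER_NODES} worker nodes.")
```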

Depending on whether you install SAP Data Hub on-premise or in the cloud (and on which cloud provider), the steps to spin up the cluster and registry differ. I do not want to bloat this blog post by listing the individual commands.

Installation

After you have prepared for the installation, you install SAP Data Hub. To do so, download the software archive from the SAP Software Download Center (4) to the installation host (1):

After unpacking the software archive (in this example DHFOUNDATION03_3-80004015.zip, i.e. SAP Data Hub 2.3 patch 3), you find the following folder structure on the installation host (1):

Now the fun begins. You start the installation by running the command line tool (install.sh). The command line tool has two mandatory parameters: the Kubernetes namespace used to deploy SAP Data Hub and the “local” container registry (3). You can start the installation like this:
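
The screenshot of the actual call is not reproduced here, so the following only sketches the shape of such a call with the two mandatory parameters. The flag spellings `--namespace` and `--registry`, the namespace "datahub" and the registry "my-registry:5000" are assumptions for illustration; check the exact parameter names against the documentation of your release before using them.

```python
import shlex
from typing import List, Sequence


def install_command(namespace: str, registry: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for install.sh with its two mandatory parameters.

    ASSUMPTION: the flag names --namespace and --registry are not confirmed
    by this post; verify them for your SAP Data Hub version.
    """
    return ["./install.sh", f"--namespace={namespace}", f"--registry={registry}", *extra_args]


if __name__ == "__main__":
    # Hypothetical namespace and registry values.
    print(shlex.join(install_command("datahub", "my-registry:5000")))
```

The same helper can append further parameters, for example the `--validate` flag that the author recommends in the comments for validating an existing installation.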

The command line tool accepts many additional parameters. If you do not pass them, it will prompt for (some of) them later.

Pre-Flight Checks

After you have started the installation, the command line tool performs a couple of checks to ensure that the necessary prerequisites to install SAP Data Hub are fulfilled. These checks are supposed to ensure that the installation does not break halfway. They are comparable to the checks a pilot performs prior to takeoff to minimize the risk of a plane crash. Hence SAP’s developers called them “pre-flight checks”:

After the pre-flight checks, the command line tool prompts for the additional parameters that you did not pass when calling install.sh. Finally, it asks you to confirm the parameters (aka “configuration”) for the installation:

Mirroring

After you have confirmed the parameters, the command line tool first mirrors the container images for SAP Data Hub. They will later be used to run the different components of SAP Data Hub on the Kubernetes cluster (2).

Mirroring means that the command line tool pulls the necessary container images from the (private) SAP container registry (5) as well as from (public) third-party container registries (6) to the installation host (1). Afterwards, it tags the container images for the “local” container registry (3) and pushes them there. “Local” means the container registry used by the Kubernetes cluster (2).
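
The pull, tag and push steps can be sketched for a single image as follows. This only builds the `docker` commands and does not run them; it is an illustration of the idea, not what install.sh literally executes. The assumption that the repository path is kept unchanged and only the registry host is swapped matches the image name in Marcus Schiffer's pull error in the comments (registry.azurecr.io/com.sap.hana.container/...), but may not hold for every image.

```python
from typing import List, Tuple


def split_registry(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, repository path with tag).

    Docker treats the first path component as a registry host only if it
    contains '.' or ':' or is 'localhost'; otherwise the image is on Docker Hub.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    if not sep:
        # Official Docker Hub images live under "library/".
        return "docker.io", "library/" + image
    return "docker.io", image


def mirror_commands(image: str, local_registry: str) -> List[List[str]]:
    """Return docker pull/tag/push commands that copy `image` to `local_registry`."""
    _, path = split_registry(image)
    target = f"{local_registry}/{path}"
    return [
        ["docker", "pull", image],          # fetch from the source registry
        ["docker", "tag", image, target],   # add a second name for the local registry
        ["docker", "push", target],         # upload to the local registry
    ]
```

Running each returned list with `subprocess.run(cmd, check=True)` would perform the copy. The `docker tag` step is also why each image appears twice in `docker images`, once per registry name.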

The following screenshot shows some of the container images on the installation host (1):

You can see that each container image is listed twice:

  • Once the container image is tagged with the container registry it was pulled from, e.g. the SAP container registry (73554900100200008830.dockersrv.repositories.sap.ondemand.com) or Docker Hub (docker.io).
  • Once it is tagged with the container registry it was pushed to, i.e. the “local” container registry (3) used by the Kubernetes cluster (2). In this example this is the container registry eu.gcr.io/…234664.

The SAP container registry (5) includes the container images for all versions (support packages, patches) of SAP Data Hub. The command line tool is “bound” to one version of SAP Data Hub (in this example SAP Data Hub 2.3 patch 3). All relevant container images are listed in the ./tools/images.sh file inside the software archive downloaded from the SAP Software Download Center (4).

Deployment

After all necessary container images have been mirrored, the command line tool deploys the different components of SAP Data Hub. For this it uses the Kubernetes package manager (helm). At the end of the deployment, all containers needed by SAP Data Hub will run on the Kubernetes cluster (2). The cluster will look similar to this now (the screenshot shows all running containers):
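
If the deployment hangs while waiting for pods (as several commenters below experienced with hana-0), it helps to list the pods that are not ready yet. A minimal sketch, assuming you pass the namespace used for the installation as the first argument; it reads `kubectl get pods -o json` and skips pods in phase Succeeded, such as finished jobs, which are not expected to stay ready.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def pods_not_ready(pod_list: Dict[str, Any]) -> List[str]:
    """Return names of pods whose Ready condition is not True."""
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue  # completed jobs terminate instead of becoming ready
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if not ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready


if __name__ == "__main__":
    namespace = sys.argv[1]  # namespace passed to install.sh
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    pending = pods_not_ready(json.loads(out))
    print("All pods ready." if not pending else "Not ready: " + ", ".join(pending))
```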

The files that helm needs are stored in the ./deployment directory inside the software archive downloaded from the SAP Software Download Center (4).

Validation

Finally, install.sh runs a couple of validations to ensure that SAP Data Hub is functional. The following screenshot shows the output in case all validations are successful:

The validations include:

  • Creation of tables in SAP Vora, execution of several queries (vora-cluster)
  • Execution of smoke tests for Spark (vora-sparkkonk8s)
    Remark: Certain features of SAP Data Hub make use of Spark and run Spark workloads on the Kubernetes cluster.
  • Connection to SAP Data Hub System Management and verification of installed applications, e.g. Connection Management, Metadata Explorer, Vora Tools (vora-vsystem)
  • Verification of the SAP HANA database used by applications like the aforementioned ones (datahub-app-base-db)

You can find the detailed results of the validations in the ./logs folder:

Post-Installation

After you have successfully installed the software, additional post-installation steps can be necessary. Again (just like for the preparation), the steps depend on whether you install SAP Data Hub on-premise or in the cloud (and on which cloud provider). And again, I do not want to bloat this blog post by listing the individual commands. If you would like to know the details, take a look at the official documentation.

Hooray. SAP Data Hub is running. You can log on with the user and password passed to the command line tool earlier:

That’s all for now. I hope you found this blog post interesting. Next time I will most likely write something about data pipelines and workflows…

      34 Comments
      Miguel Schaller

      Hello Thorsten

      Maybe you can help me.

      I want to install Data Hub on a Kubernetes Cluster on a Centos 7 VM.

      But my hana-0 and vora-*** pods won't start.

      The only error I get from the installation script is: waiting for these pods to become ready.

      And after that the installation is canceled.

      I hope you can help me.

      Thorsten Schneider
      Blog Post Author

      Hi Miguel,

      how big is your cluster? Is it one VM/node? The hana-0 pod alone requests 20GB of memory. My assumption is that it does not get this memory and the pod hence never gets ready. Do you have a chance to use a bigger cluster?

      Cheers
      Thorsten

      Miguel Schaller

      Hello Thorsten

      Thanks for your fast answer.

      I have 3 VMs in my cluster. I upgraded my master server to 64 GB of memory, a 64 GB hard drive and 6 cores, but the pods still won't start. The other two servers each have 16 GB of memory, a 32 GB hard drive and 4 cores.

      How much memory did you give to your VM?

      Cheers

      Miguel

      Thorsten Schneider
      Blog Post Author

      Hi Miguel,

      The minimum requirement is 3 nodes with 32 GB RAM each (the 3 nodes are the workers only and do not include the master).

      https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.3.latest/en-US/79724de552db4b2b81c4a893f2c7ed18.html

      I am not saying that it is completely impossible to "tweak" things. But I do not want to recommend things we don't support as per the official documentation. Hope that helps...

      Cheers

      Thorsten
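
To compare cluster nodes against this minimum, one could read each node's memory capacity from `kubectl get nodes -o json` and convert the Kubernetes quantity string into bytes. A sketch under two assumptions: "32 GB" is read as decimal gigabytes (the reported capacity of a 32 GB VM is typically a bit below 32 GiB), and master nodes are not filtered out here, although the requirement applies to worker nodes only.

```python
import json
import subprocess
from typing import Any, Dict, List

# Kubernetes quantity suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...).
_SUFFIXES = [
    ("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
    ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12),
]

# "32 GB RAM" from the reply above, read as decimal gigabytes (an assumption).
MIN_NODE_MEMORY = 32 * 10**9


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '32884352Ki' or '16Gi' into bytes."""
    for suffix, factor in _SUFFIXES:
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # plain number of bytes


def nodes_below_memory(node_list: Dict[str, Any], minimum: int = MIN_NODE_MEMORY) -> List[str]:
    """Return names of nodes whose reported memory capacity is below `minimum`."""
    small = []
    for node in node_list.get("items", []):
        capacity = node.get("status", {}).get("capacity", {})
        if memory_bytes(capacity.get("memory", "0")) < minimum:
            small.append(node["metadata"]["name"])
    return small


if __name__ == "__main__":
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in nodes_below_memory(json.loads(out)):
        print(f"{name}: less than 32 GB of memory")
```

With the cluster described earlier in this thread (a 64 GB master and two 16 GB workers), both workers would be reported.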

      Marcus Schiffer

      Hi,

       

      I have the same issue: hana-0 will not start. My cluster is on Azure with 4 nodes, each with 8 vCPUs and 32 GB RAM.

      What am I missing ?

      Thorsten Schneider
      Blog Post Author

      Hi,

      Hard to say without further information. Please check what the logs for the pod say (run kubectl logs and/or kubectl describe...).

      Cheers
      Thorsten

      Marcus Schiffer

      Hi,

       

      here is some log from kubectl describe.

      Seems to be an authorization issue. I am however not sure which user would cause this error.

      The pre flight checks showed pull / push are ok.

      So what am I missing ?

      Failed to pull image “registry.azurecr.io/com.sap.hana.container/base-opensuse42.3-amd64:2.03.031.00-3.1.0”: rpc error: code = Unknown desc = Error response from daemon: Get https://registry.azurecr.io/v2/com.sap.hana.container/base-opensuse42.3-amd64/manifests/2.03.031.00-3.1.0: unauthorized: authentication required

      Thorsten Schneider
      Blog Post Author

      Hi Marcus,

      I am not sure whether the pre-flight checks also check the pull from the Azure registry or only from the SAP registry.

      To me it seems either your service principal does not have the necessary roles or the image pull secret is missing.

      Bartosz Jarkowski has written an excellent post how to install Data Hub on Azure: https://blogs.sap.com/2019/01/10/your-sap-on-azure-part-13-install-sap-data-hub-on-azure-kubernetes-service/

      Have you checked this?

      Cheers

      Thorsten
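
For the image pull secret mentioned above, Kubernetes expects a Secret of type `kubernetes.io/dockerconfigjson` whose payload lists credentials per registry. The sketch below builds such a manifest as a Python dict; the secret name "acr-pull", the namespace and the credentials are hypothetical, and which credentials Azure accepts (a service principal or the registry's access keys) depends on your setup. The standard CLI equivalent is `kubectl create secret docker-registry <name> --docker-server=<registry> --docker-username=<user> --docker-password=<password> -n <namespace>`. The secret still has to be referenced by the pods (via `imagePullSecrets` or the service account) to take effect.

```python
import base64
import json
from typing import Any, Dict


def docker_config_json(server: str, username: str, password: str) -> str:
    """Build the .dockerconfigjson payload with credentials for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({"auths": {server: {"username": username, "password": password, "auth": auth}}})


def pull_secret_manifest(name: str, namespace: str, server: str,
                         username: str, password: str) -> Dict[str, Any]:
    """Return a Secret manifest of type kubernetes.io/dockerconfigjson."""
    payload = docker_config_json(server, username, password)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Secret data values must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(payload.encode()).decode()},
    }
```

Dumping the result with `json.dumps` and piping it to `kubectl apply -f -` would create the secret.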

      Bartosz Jarkowski

      Thanks Thorsten Schneider for a mention and your kind words about my blog.

      Marcus Schiffer, I also think this is an issue with the Service Principal authorization. In my blog there is a script that should fix it. I don't want to copy it here, but you can easily identify it by looking for "Modify for your environment. The ACR_NAME is the name of your Azure Container".

      Marcus Schiffer

      Hi,

      thanks for the reply.

      I solved this by adding the application ID of the repository to the role contributor in the ACR.

      Now it works and the installation finished.

      Marcus Schiffer

      Now I get a new issue: The datahub runs, but in the pipeline modeller no pipeline runs.

      The error is always:

      cannot connect to docker registry https://registry.azurecr.io: Get https://registry.azurecr.io/v2/: http: non-successful response (status=401 body="{\"errors\":[{\"code\":\"UNAUTHORIZED\",\"message\":\"authentication required\",\"detail\":null}]}\n")
      So there must be some other authorization problem.
      Any help is appreciated.
      Bartosz Jarkowski

      I can't remember exactly, but I think that's the solution:

      https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.4.0.archive02/en-US/db861eb7aeac41d4be8c60fae7992a8c.html

      Marcus Schiffer

      Hi Bartosz,

       

      It seems to be the solution; however, I am a bit lost with all this authorization stuff.

      The help tells me to create the secret file with user (that should be the name of the registry)  and PW.

      But I do not see a way to create this user and PW in Azure ACR.

      So where would I get the password from to put it in the secret file?

       

      Bartosz Jarkowski

      In Azure Container Registry blade there is an entry in the menu called Access Keys. I think I have used that.

      Former Member

      Hello Marcus,

      The link mentioned by Bartosz is not working anymore. I would appreciate it if you could point me to the solution.

      https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.4.0.archive02/en-US/db861eb7aeac41d4be8c60fae7992a8c.html

      Regards

      Satish

       

       

      Tobias Gorhan

      Hi Satish,

      I found this and it works

       

      https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.6.latest/en-US/a1cbbc0acc834c0cbbe443f2e0d63ab9.html

       

      Cheers

       

      Emir

      Emir Novokmet

      Hi Marcus,

      how did you solve this authorization issue ?

      I don't have the rights to see the link from sap help desk.

       

      Cheers

      Emir

      Anton Petrov

      Hi Thorsten,

       

      Thank you for the helpful blog.

      I am trying to install SAP Data Hub and the installer asked me for login credentials to access SAP Docker Artifactory.

      I have tried with my SAP S-Users (including with my SAP Logon to user DMZ stores) unsuccessfully - I got error: [ERROR] Could not login with the provided credentials!

      How can I request credentials for SAP Docker Artifactory access (technical user or S-user)?

       

      Cheers,

      Anton

      Thorsten Schneider
      Blog Post Author

      Hi Anton,

      Do you (or does your company) have a license for SAP Data Hub? Otherwise the download will not work.

      Best regards

      Thorsten

      Shivani Misra

      Hello Thorsten,

      I am installing SAP Data Hub 2.3. During the installation, I am stuck at the mirroring process. Mirroring one of the images throws the error below:

      Can you please let me know where on the file system these images are downloaded and extracted? As the error says "no space left on device", we need to find out whether the directory is on the installation host or on the Kubernetes cluster.

      Please advise.

      Thanks,
      Shivani

      Thorsten Schneider
      Blog Post Author

      Hi Shivani,

      I looked at your screenshot and also checked with our team developing the installer. It seems your installation host does not have enough space. You need 10 GB of disk space and 20 GB for container images.

      https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.4.latest/en-US/40cc1c6cd72546378182f0de584ced05.html

      Hope that helps.

      Br

      Thorsten
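
A free-space check on the installation host could look like the sketch below. Mapping the two figures to paths is an assumption: the general 10 GB to "/" and the 20 GB for container images to /var/lib/docker, Docker's default data root on Linux (it differs if Docker is configured with another data root). Paths on the same filesystem share free space, so their requirements are added up before comparing.

```python
import os
import shutil
from typing import Dict, List

GB = 10**9

# Assumed mapping of the figures in the reply above to filesystem paths.
REQUIREMENTS = {"/": 10 * GB, "/var/lib/docker": 20 * GB}


def space_shortfalls(requirements: Dict[str, int]) -> List[str]:
    """Report filesystems whose free space is below the summed requirements."""
    needed: Dict[int, int] = {}
    sample_path: Dict[int, str] = {}
    for path, required in requirements.items():
        dev = os.stat(path).st_dev  # same device means same filesystem
        needed[dev] = needed.get(dev, 0) + required
        sample_path.setdefault(dev, path)
    problems = []
    for dev, required in needed.items():
        free = shutil.disk_usage(sample_path[dev]).free
        if free < required:
            problems.append(
                f"{sample_path[dev]}: {free / GB:.1f} GB free, {required / GB:.1f} GB needed"
            )
    return problems


if __name__ == "__main__":
    existing = {p: r for p, r in REQUIREMENTS.items() if os.path.exists(p)}
    for line in space_shortfalls(existing) or ["Enough free space."]:
        print(line)
```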

      Shivani Misra

      Hello ,

      I proceeded with the above error, but I am getting an error in the validation phase.

       

      Do you have any idea ?

      Thanks,

      Shivani

      Thorsten Schneider
      Blog Post Author

      Hi,

      have you taken a look at the logs of dqp-validator-job as suggested in the validation log? What do the logs "say"?

      Cheers
      Thorsten

      Shivani Misra

      Hello Thorsten,

      Below is the logs for the dqp-validator-job

      Running query: SELECT USER_NAME FROM SYS.USERS WHERE USER_NAME='default\vora-admin'; connecting to 10.35.255.244 at port 10002 (10.35.255.244:10002) ... switched to existing session "" query send... error on server response: "could not handle api call, failure reason : :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped. invalid user default\vora-adminNext level: invalid user default\vora-admin " cease to connect to 10.35.255.244:10002... Validating schemas have been created in Vora ... Running query: SELECT * FROM SYS.SCHEMAS connecting to 10.35.255.244 at port 10002 (10.35.255.244:10002) ... switched to existing session "" query send... error on server response: "could not handle api call, failure reason : :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped. invalid user default\vora-adminNext level: invalid user default\vora-admin " cease to connect to 10.35.255.244:10002...

      I do not understand where this user default\vora-admin is created.

      Thanks

      Shivani

      Thorsten Schneider
      Blog Post Author

      Hi,

      can you please use the “--validate” flag with install.sh to validate the installation.

      I have talked to the development team. Hard to know the root cause w/o diving into more details of the installation log.

      We have recently fixed an error in 2.4.1 which sporadically led to this problem. When install.sh with “--validate” flag passes, your cluster is good.

      If not, the best is to open a ticket and let development support check this.

      Best regards
      Thorsten

      William Thayer

      Hello!

      Thank you for this blog, it was helpful while setting up DH on GCP.  I have a service account attached to my Kubernetes cluster and nodes.  This service account has GCS Storage Admin capabilities.  But I always see this WARNING in my Trace logs for pipelines.

       

      "3/21/2019, 2:55:10 PM","WARN ","Scope 'devstorage.read_write' missing. Unable to push images to GCR",vflow,container,1,newGoogleContainerRegistry

       

      Any suggestions?

       

      Thank You,

      Will

      Thorsten Schneider
      Blog Post Author

      Hey Will,

      Can you run any graphs from the modeling environment (like the simple data generator sample that is delivered)?

      If that works, I currently would not worry about the warning. If it does not work, then something is wrong with the permissions.

      Can you check what happens when you run the data generator sample?

      Cheers
      Thorsten

      Roland Kramer

      Hello Thorsten,
      I also did a Data Hub 2.6 installation recently, using the Maintenance Planner and the SL Container Bridge. Another blog also describes the setup of the jump server for the SLC Bridge.

      Best Regards Roland

      Rahul Pant

      Which authorization method should we select if we install 2.6 using SAP CAL: the standard one or the extended one for Kubernetes?

       

      Roland Kramer

      Hello Rahul

      From the Microsoft Azure Help - https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-prepare-acr the Basis Authorisation method should be sufficient ...

      Best Regards Roland

      R. van Es

      Hello Thorsten

      Maybe you can help me,

      I was trying to install SAP Data Hub 2.6 on AWS EKS, and the installation got stuck with the error below. In case it is a space issue: I have already given 150 GB of space to my EC2 instance. If it really is a space issue, can you please suggest how we can increase the space for hana-0?

      hana-0 shows Pending status.

       

      Thanks & Regards,

      R.Van ES.

      Thorsten Schneider
      Blog Post Author

      Hi,

      I expect this to be insufficient memory (or CPU). You can find details when you describe the pod via the kubectl command.

      Cheers
      Thorsten
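
For a pod stuck in Pending, the scheduler records why it could not place the pod in the pod's PodScheduled condition, which is the same reason `kubectl describe` shows. A minimal sketch that extracts those messages from `kubectl get pod <name> -n <namespace> -o json`; the pod and namespace names are supplied by you on the command line.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def scheduling_problems(pod: Dict[str, Any]) -> List[str]:
    """Return the scheduler's messages for a pod that could not be scheduled."""
    return [
        c.get("message", "")
        for c in pod.get("status", {}).get("conditions", [])
        if c.get("type") == "PodScheduled" and c.get("status") == "False"
    ]


if __name__ == "__main__":
    namespace, pod_name = sys.argv[1], sys.argv[2]  # e.g. your namespace and hana-0
    out = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for message in scheduling_problems(json.loads(out)) or ["No scheduling problem reported."]:
        print(message)
```

A message mentioning insufficient memory or CPU points to the node sizing discussed earlier in this thread; an unbound persistent volume claim points to storage.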

      Rushi Ns

      Please check the output of "kubectl get sc" (storage classes). If the storage class is not marked as the default, the HANA pods will fail because their persistent volumes are not bound to a storage class.
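
Kubernetes marks the default storage class with the annotation `storageclass.kubernetes.io/is-default-class: "true"` (older clusters used a beta variant of the same key). A sketch that lists the default classes from `kubectl get storageclass -o json`; an empty result means claims without an explicit storage class are not provisioned automatically.

```python
import json
import subprocess
from typing import Any, Dict, List

DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",  # older clusters
)


def default_storage_classes(sc_list: Dict[str, Any]) -> List[str]:
    """Return names of storage classes annotated as the cluster default."""
    names = []
    for sc in sc_list.get("items", []):
        annotations = sc.get("metadata", {}).get("annotations", {})
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            names.append(sc["metadata"]["name"])
    return names


if __name__ == "__main__":
    out = subprocess.run(
        ["kubectl", "get", "storageclass", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    defaults = default_storage_classes(json.loads(out))
    print("Default storage class: " + ", ".join(defaults) if defaults else "No default storage class set.")
```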

       

       

      Rushi Ns