Installing SAP Data Hub

In my last blog post I wrote about what it means that SAP Data Hub is a containerized application. Today I want to talk about the installation of SAP Data Hub. All explanations relate to SAP Data Hub 2.3 or newer.

This has a prelude: For SAP TechEd I had submitted a MeetUp about installing SAP Data Hub. I wanted to demonstrate the installation during this MeetUp with a live demo… and unfortunately realized a few hours before that the MeetUp room did not come with a monitor ☹. My improvisation consisted of a few hastily prepared whiteboard drawings. I took these drawings as a basis for this blog post.

There is a lot written about installing SAP Data Hub in the official documentation (in particular here and here). My intention behind this blog post is clearly not to enable you to install SAP Data Hub without consulting the documentation. Instead, I would like to complement the documentation by looking a bit behind the scenes.

Overview of an installation

Every installation of SAP Data Hub, independent of where you install the software, consists (simply put) of three phases: preparation, installation and post-installation.

The installation phase consists of four sub-phases: pre-flight checks, mirroring, deployment and validation. That’s how it looks:

During the installation you get in touch with different “things”. I have tried to depict them in the following diagram:

Preparation

During the preparation phase you need to set up an installation host (1) to run the installation procedure as well as a Kubernetes cluster (2) and a “local” container registry (3) to install SAP Data Hub through the installation procedure.

Remark: If SAP Data Hub is operated as part of an SAP system landscape, the recommended way to install it is SAP Maintenance Planner (see here). An alternative is a command-line tool (install.sh) delivered with SAP Data Hub. For today, I will use the command line tool. I might write a separate blog post about using SAP Maintenance Planner… or maybe a colleague will. Behind the scenes, install.sh always runs. So, what you learn today will stay valid.

The installation host (1) is a Linux computer / virtual machine. It must meet certain requirements. For example, it needs to have Docker, Python, the Kubernetes command-line tool (kubectl) and the Kubernetes package manager (helm) installed.
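As a rough illustration of this requirement, the following sketch checks whether the four tools named above can be found on the PATH of the installation host. The executable names are assumptions, and the sketch does not check versions, so it is no substitute for the requirements listed in the official documentation.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the text; the exact executable names are assumptions.
REQUIRED_TOOLS = ("docker", "python", "kubectl", "helm")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on this host: " + ", ".join(missing))
    else:
        print("All required tools were found on the PATH.")
```

The `which` parameter only exists so that the lookup can be replaced in a test; on a real host the default (`shutil.which`) is what you want.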

The installation procedure for SAP Data Hub assumes that you have a running Kubernetes cluster (2) as well as a “local” container registry (3). Just like the installation host, both must meet certain requirements. The Kubernetes cluster (2) needs to consist of at least three nodes (all details can be found here).
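The three-node minimum is easy to check yourself before you start. Here is a minimal sketch, assuming that kubectl is already configured for the target cluster. It counts the entries in the output of `kubectl get nodes -o json`; the function names are my own and not part of SAP Data Hub.

```python
import json
import subprocess

MIN_NODES = 3  # minimum number of nodes stated in the text


def count_nodes(kubectl_json: str) -> int:
    """Count the nodes in the output of `kubectl get nodes -o json`."""
    return len(json.loads(kubectl_json).get("items", []))


def has_enough_nodes(kubectl_json: str, minimum: int = MIN_NODES) -> bool:
    """Return True if the cluster has at least `minimum` nodes."""
    return count_nodes(kubectl_json) >= minimum


def fetch_nodes_json() -> str:
    """Call kubectl; requires a working kubeconfig on this host."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On the installation host you would call `has_enough_nodes(fetch_nodes_json())`. The node count is only one of the requirements; the others are described in the documentation.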

Depending on whether you want to install SAP Data Hub on-premise or in the cloud (and on which cloud provider), the steps to spin up the cluster and registry differ. I do not want to bloat this blog post by listing the individual commands.

Installation

After you have prepared for the installation, you install SAP Data Hub. To do so, you download the software archive from the SAP Software Download Center (4) to the installation host (1):

After unpacking the software archive (in this example DHFOUNDATION03_3-80004015.zip, i.e. SAP Data Hub 2.3 patch 3), you find the following folder structure on the installation host (1):

Now the fun begins. You start the installation by running the command line tool (install.sh). The command line tool has two mandatory parameters: the Kubernetes namespace used to deploy SAP Data Hub and the “local” container registry (3). You can start the installation like this:
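To make the two mandatory parameters concrete, here is a small sketch that only assembles the command line and prints it. The flag names `--namespace` and `--registry` are assumptions on my side (check the usage output of your install.sh), and the namespace and registry values are made up.

```python
import shlex
from typing import List


def install_command(namespace: str, registry: str) -> List[str]:
    """Build the argument list for install.sh with its two mandatory parameters.

    The flag names --namespace and --registry are assumptions; check the
    usage output of your install.sh for the exact spelling.
    """
    if not namespace or not registry:
        raise ValueError("namespace and registry are both mandatory")
    return ["./install.sh", f"--namespace={namespace}", f"--registry={registry}"]


# Hypothetical values, for illustration only.
cmd = install_command("datahub", "registry.example.com:5000")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The sketch does not start anything. It prints the command so that you can review it before running it on the installation host.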

The command line tool has many more parameters. If you do not pass them, it will later prompt for (some of) them.

Pre-Flight Checks

After you have started the installation, the command line tool performs a couple of checks to ensure that the necessary prerequisites to install SAP Data Hub are fulfilled. These checks are supposed to ensure that the installation does not break halfway. They are comparable to the checks a pilot performs prior to takeoff to minimize the risk of a plane crash. Hence SAP’s developers called them “pre-flight checks”:

After the pre-flight checks, the command line tool prompts for additional parameters (which you did not pass when calling install.sh). Finally, it asks you to confirm the parameters (aka “configuration”) for the installation:

Mirroring

After you have confirmed the parameters, the command line tool first mirrors the container images for SAP Data Hub. They will later be used to run the different components of SAP Data Hub on the Kubernetes cluster (2).

Mirroring means that the command line tool pulls the necessary container images from the (private) SAP container registry (5) as well as from (public) third-party container registries (6) to the installation host (1). Afterwards it tags the container images for the “local” container registry (3) and pushes them there. “Local” means the container registry which is used by the Kubernetes cluster (2).
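The pull, tag and push sequence can be sketched for a single image as follows. This is a dry run that only builds the docker commands. It is not how install.sh is implemented, and the rule for the target name (keep repository path and tag, replace only the registry) is an assumption.

```python
from typing import List, Tuple


def split_registry(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, remainder).

    Follows the usual Docker convention: the first path component is a
    registry host only if it contains "." or ":" or equals "localhost".
    Otherwise Docker Hub (docker.io) is implied.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", image


def mirror_commands(image: str, local_registry: str) -> List[List[str]]:
    """Return the docker pull, tag and push commands for one image (dry run)."""
    _, remainder = split_registry(image)
    # Assumption: the target keeps repository path and tag unchanged.
    target = f"{local_registry.rstrip('/')}/{remainder}"
    return [
        ["docker", "pull", image],
        ["docker", "tag", image, target],
        ["docker", "push", target],
    ]


# Hypothetical image and registry names, for illustration only.
for command in mirror_commands("registry.example.com/vora/hello:1.0", "eu.gcr.io/my-project"):
    print(" ".join(command))
```

Because every image is first pulled and then tagged a second time, each one shows up under two names on the installation host.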

The following screenshot shows some of the container images on the installation host (1):

You can see that each container image is listed twice:

  • Once the container image is tagged with the container registry it was pulled from, e.g. the SAP container registry (73554900100200008830.dockersrv.repositories.sap.ondemand.com) or Docker Hub (docker.io).
  • Once it is tagged with the container registry it was pushed to, i.e. the “local” container registry (3) used by the Kubernetes cluster (2). In this example this is the container registry eu.gcr.io/…234664.

The SAP container registry (5) includes the container images for all versions (support packages, patches) of SAP Data Hub. The command line tool is “bound” to one version of SAP Data Hub (in this example SAP Data Hub 2.3 patch 3). All relevant container images are listed in the ./tools/images.sh file inside the software archive downloaded from the SAP Software Download Center (4).

Deployment

After all necessary container images have been mirrored, the command line tool deploys the different components of SAP Data Hub. For this it uses the Kubernetes package manager (helm). At the end of the deployment, all containers needed by SAP Data Hub will run on the Kubernetes cluster (2). The cluster will look similar to this now (the screenshot shows all running containers):
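If you want to check the state of the cluster yourself, you can look at the pod phases in the namespace you installed into. The following sketch assumes that you have saved the output of `kubectl get pods -n <namespace> -o json`. It lists every pod that is neither Running nor Succeeded. The function name is my own.

```python
import json
from typing import List, Tuple

# Phases that indicate a pod is running or has finished successfully.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(kubectl_json: str) -> List[Tuple[str, str]]:
    """Return (pod name, phase) for pods that are neither Running nor Succeeded.

    Expects the output of `kubectl get pods -n <namespace> -o json`.
    """
    result = []
    for pod in json.loads(kubectl_json).get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            result.append((name, phase))
    return result
```

Keep in mind that the phase is a coarse signal. A pod in phase Running can still have containers that are not ready, so an empty result does not replace the validations that install.sh runs.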

The files that helm needs are stored in the ./deployment directory inside the software archive downloaded from the SAP Software Download Center (4).

Validation

Finally, install.sh runs a couple of validations to ensure that SAP Data Hub is functional. The following screenshot shows the output in case all validations are successful:

The validations include:

  • Creation of tables in SAP Vora, execution of several queries (vora-cluster)
  • Execution of smoke tests for Spark (vora-sparkkonk8s)
    Remark: Certain features of SAP Data Hub make use of Spark and run Spark workloads on the Kubernetes cluster.
  • Connection to SAP Data Hub System Management and verification of installed applications, e.g. Connection Management, Metadata Explorer, Vora Tools (vora-vsystem)
  • Verification of the SAP HANA database used by applications like the aforementioned ones (datahub-app-base-db)

You can find the detailed results of the validations in the ./logs folder:

Post-Installation

After you have successfully installed the software, additional post-installation steps can be necessary. Again (just like for the preparation), the steps depend on whether you install SAP Data Hub on-premise or in the cloud (and on which cloud provider). And again, I do not want to bloat this blog post by listing the individual commands. If you would like to know the details, you can take a look at the official documentation.

Hooray. SAP Data Hub is running. You can log on with the user / password passed to the command line tool earlier:

That’s all for now. I hope you found this blog post interesting. Next time I will most likely write something about data pipelines and workflows…

28 Comments
  • Hello Thorsten

    Maybe you can help me.

    I want to install Data Hub on a Kubernetes cluster on a CentOS 7 VM.

    But my Hana-0 and vora-*** pods won't start.

    The only error I get from the installation script is: waiting for these pods to become ready.

    And after that the installation is canceled.

    I hope you can help me.

  • Hi Thorsten,

     

    Thank you for the helpful blog.

    I am trying to install SAP Data Hub and the installer asked me for login credentials to access SAP Docker Artifactory.

    I have tried with my SAP S-Users (including with my SAP Logon to user DMZ stores) unsuccessfully – I got error: [ERROR] Could not login with the provided credentials!

    How can I request credentials for SAP Docker Artifactory access (Technical or S-User)?

     

    Cheers,

    Anton

  • Hello Thorsten,

    I am installing SAP Data Hub 2.3. While installing, I am stuck at the mirroring process. Mirroring of one of the images throws the error below:

    Can you please let me know where at the file system level this downloading and extraction of images takes place? As it says no space left on device, we need to find out the directory, and whether it is on the installation host or the Kubernetes cluster.

    Please advise.

    Thanks,
    Shivani

  • Hello Thorsten,

    Below are the logs for the dqp-validator-job:

    Running query: SELECT USER_NAME FROM SYS.USERS WHERE USER_NAME=’default\vora-admin’;
    connecting to 10.35.255.244 at port 10002 (10.35.255.244:10002) …
    switched to existing session “”
    query send…
    error on server response: “could not handle api call, failure reason : :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped. invalid user default\vora-admin
    Next level: invalid user default\vora-admin ”
    cease to connect to 10.35.255.244:10002…
    Validating schemas have been created in Vora …
    Running query: SELECT * FROM SYS.SCHEMAS
    connecting to 10.35.255.244 at port 10002 (10.35.255.244:10002) …
    switched to existing session “”
    query send…
    error on server response: “could not handle api call, failure reason : :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped. invalid user default\vora-admin
    Next level: invalid user default\vora-admin ”
    cease to connect to 10.35.255.244:10002…

    I do not understand where this user default\vora-admin is created.

    Thanks

    Shivani

    • Hi,

      Can you please use the “--validate” flag with install.sh to validate the installation?

      I have talked to the development team. It is hard to know the root cause without diving into more details of the installation log.

      We have recently fixed an error in 2.4.1 which sporadically led to this problem. When install.sh with the “--validate” flag passes, your cluster is good.

      If not, the best is to open a ticket and let development support check this.

      Best regards
      Thorsten

  • Hello!

    Thank you for this blog, it was helpful while setting up DH on GCP.  I have a service account attached to my Kubernetes cluster and nodes.  This service account has GCS Storage Admin capabilities.  But I always see this WARNING in my Trace logs for pipelines.

     

    “3/21/2019, 2:55:10 PM”,”WARN “,”Scope ‘devstorage.read_write’ missing. Unable to push images to GCR”,vflow,container,1,newGoogleContainerRegistry

     

    Any suggestions?

     

    Thank You,

    Will

    • Hey Will,

      Can you run any graphs from the modeling environment (like the simple data generator sample which is delivered).

      If that works, I currently would not worry about the warning. If that does not work, then something with the permissions is not right.

      Can you check what happens when you run the data generator sample?

      Cheers
      Thorsten