
At SAP TechEd in September, we announced the release of SAP Data Hub. Today we are delivering SAP Data Hub, developer edition.

SAP Data Hub is a data sharing, pipelining, and orchestration solution that helps companies accelerate and expand the flow of data across their modern, diverse data landscapes (for more details, take a look at Marc’s excellent FAQ blog post). Put simply, it includes features for:

  • Data governance (metadata management, discovery, profiling…)
  • Data pipelines (flow-based applications)
  • Workflows (orchestration of processes across the data landscape)

The architecture of SAP Data Hub leverages modern container technology. Put simply, its main (technical) components are:

  • Application based on SAP HANA, XS Advanced Model
  • Distributed Runtime leveraging Kubernetes
    • SAP Vora (to run distributed queries on “Big Data”)
    • SAP Data Hub Pipelines (to run flow-based applications)
  • SAP Data Hub Adapter (central communication endpoint for operations performed from the application on Kubernetes and Hadoop)
  • SAP Vora Spark Extensions (extensions for the Spark execution framework to access data in SAP Vora and SAP HANA)

SAP Data Hub, developer edition

For the developer edition, we looked for a way to run SAP Data Hub on your local computer. While there are possibilities involving SAP HANA, express edition and Kubernetes (Minikube) on your local computer, we decided on a different approach.

We took the parts of SAP Data Hub that are, in our opinion, of most interest to developers (SAP Vora, SAP Data Hub Pipelines) and packaged them together with Hadoop and Zeppelin into a single Docker image / container. This approach is similar to what we have done for SAP Vora in the past.

Now, what are the advantages of this approach?

  • You can easily run SAP Data Hub, developer edition on your local computer (be it Windows, Linux or macOS).
  • Building the Docker image locally takes between 30 and 60 minutes. During this time, you need a stable internet connection. Once the Docker image is built, you can start a container based on the image within a few minutes and without network connectivity.
  • You can build powerful data pipelines (and they can interact with all kinds of other technologies, e.g. SAP HANA, SAP API Business Hub, Kafka, any web service).
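
As a rough sketch of the build-once, run-offline workflow described above (not the exact commands from the tutorials): the image name datahub and the network dev-net follow the commands quoted in the comments below, while the build context ".", the port 8090 and the helper functions are assumptions made purely for illustration. Any arguments the image expects after its name are left out; the tutorials remain the reference for those. The script only prints the commands, so it is safe to run without Docker installed.

```shell
#!/bin/sh
# Dry-run sketch: prints the Docker commands instead of executing them.

IMAGE="datahub"     # image name as used in the comment thread
NETWORK="dev-net"   # network name as used in the comment thread
PORT="8090"         # hypothetical port; check the tutorials for the real ones

# Address the published port is bound to on the host.
# Default is the loopback address; set BIND_ADDR="" to publish on all interfaces.
BIND_ADDR="${BIND_ADDR-127.0.0.1}"

# Build the value passed to --publish: [host-address:]host-port:container-port
publish_arg() {
  if [ -n "$1" ]; then
    printf '%s:%s:%s' "$1" "$2" "$2"
  else
    printf '%s:%s' "$2" "$2"
  fi
}

# Command that builds the image (needs internet access, done once).
build_cmd() {
  printf 'docker build --tag %s .\n' "$IMAGE"
}

# Command that starts a container from the already built image.
run_cmd() {
  printf 'docker run --net %s --publish %s %s\n' \
    "$NETWORK" "$(publish_arg "$1" "$PORT")" "$IMAGE"
}

build_cmd
run_cmd "$BIND_ADDR"
```

Binding to 127.0.0.1 keeps the published port reachable from the local machine only. One commenter below reports that removing this prefix from the publish statements made the ports reachable from other machines, which corresponds to running the sketch with BIND_ADDR="".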

Of course, there are also some drawbacks:

  • SAP Data Hub, developer edition currently does not allow you to use the data governance and workflow features of SAP Data Hub.
  • Unfortunately, you cannot observe how the SAP Data Hub usually containerizes and deploys data-driven applications onto Kubernetes.
  • Some of the data pipeline operators (i.e., the reusable and configurable components that you can combine to build data pipelines) will not work inside the container. Most notably, the operators related to machine learning (leveraging TensorFlow) and image processing (leveraging OpenCV) currently cannot be used, at least not “out-of-the-box”.

How to get started?

To give SAP Data Hub, developer edition a try, visit our Tutorial Navigator for the currently available tutorials.

The tutorials give you a first idea of how to build data-driven applications with SAP Data Hub. You will learn how to create your first pipeline, and you will use a message broker, HDFS and SAP Vora.

Next steps

We know this is just a starting point for enabling developers to work with SAP Data Hub. There are several things we have planned for the future:

  • We continue to work to remove some of the aforementioned drawbacks (in particular, the operators the developer edition currently cannot run).
  • We plan to publish additional tutorials (e.g. for machine learning, how to create your own operators).
  • We have started working on an openSAP course for SAP Data Hub.
  • We have begun to work on a cloud-based trial environment for SAP Data Hub. This will run on Kubernetes and offer additional possibilities, including access to more of the SAP Data Hub functionality.

Stay tuned! If you have questions, problems or proposals in the meantime, feel free to post them as comments to this blog, or to the SAP Community. We will try to answer them in a timely manner and collect frequently asked questions here.

 


18 Comments


  1. Frank Schuler

    Hello Thorsten,

    The SAP Data Hub, developer edition is brilliant. I got my first Data Pipeline running in about an hour.

    However, when I add port forwarding for port 5050 and try to connect the SAP Data Hub, developer edition to my SAP Data Hub Cockpit, it validates fine but then throws the internal error “Cannot connect to agent”.

    Is this because the SAP Data Hub Adapter is not installed on the SAP Data Hub, developer edition, or does it listen on a different port? If it has not been included in the current SAP Data Hub, developer edition, could it be added?

    Best regards and many thanks in advance

    Frank

    1. Thorsten Schneider Post author

      Hi Frank,

      the adapter is currently not installed. We decided against installing it, since it is only useful when you have the XSA part of SAP Data Hub running (which is not available as part of the developer edition at the moment). But let me try to install it and connect to it. If there are no (unsolvable) problems, I will try to get it added.

      Cheers

      Thorsten

      PS: congratulations on your blog post series around SAP Data Hub. Very nice read!

      1. Frank Schuler

        Many thanks in advance, Thorsten.

        By the way, currently, I am stuck with connecting my SAP Data Hub Cockpit to my VORA Data Pipeline with the following error:

        I already discussed this with Axel Schuller, and he seems to remember a similar problem when he verified the SAP Data Hub installation, but suggests that this would need SAP development to look into.

        If this is in fact a known issue, could someone give me a hint on how to overcome it? The trace does not show much more detail either.

        Very best regards

        Frank

        1. Thorsten Schneider Post author

           

          Hi Frank,

          does this happen when you connect to the VORA Pipeline inside the container or to one outside it?
          I assume the latter.

          Indeed, this message also seems familiar to me, but I don’t recall exactly what the problem was. Can you mail me a screenshot of the connection (firstname dot last name at sap dot com)?

          Thanks

          Thorsten

  2. Douglas Maltby

    Many thanks, Thorsten! Excellent blog and set of tutorials. The Data Hub Dev Edition provides a great environment for experimentation with SDH and Vora 2.0. Thank you!

    The only issues I ran into were seemingly related to using Docker Toolbox, rather than the more current Docker. I have a 2008 Mac Pro with 64GB RAM, but the pre-2010 Xeon CPUs don’t have the VT-x instruction set, so I must use the older Docker Toolbox. Even with Docker Toolbox, I was able to get Data Hub running, using the VirtualBox VM’s IP address rather than localhost, but without Zeppelin (it wouldn’t start and get to the status loop, so I just removed --zeppelin when starting). I also couldn’t start the spotify/kafka container (error waiting for the container, timeout) to go through the later exercises, but Data Hub and Vora both worked fine in Docker Toolbox. I just wanted to post this for others who may have the same issues using Docker Toolbox.

    I’m impatiently awaiting a new Mac Pro in 2018, so I “borrowed” my wife’s slightly more current 2011 MacBook Pro w/16GB RAM, installed Docker for Mac and everything in your SDH Dev Edition  tutorials worked flawlessly.

    I’m looking forward to future dev editions (incl Vora 2.1) covering the data governance and workflow use cases, and plan to connect it with HXE using Frank’s blogs. Thanks to you both for all your insight!

    Doug

     

    1. Thorsten Schneider Post author

      Hi Doug,

      thanks for the feedback and happy new year. Hard to say why you had trouble with Zeppelin / Kafka. We tested both successfully with Docker Toolbox (on Windows though). The only immediate thought I have: did you give enough resources to the VM running Docker (see our FAQs)? We observed that the initial sizing of the VM caused us trouble. I believe on the 16GB Windows system we used for testing, it was set to 1GB only.

      Cheers

      Thorsten

  3. Thorsten Schneider Post author

    The SAP Data Hub, developer edition 1.2 is available. Same procedure as before: to get it, follow the tutorials.

    There are not too many changes. SAP Vora tables do not have to be recreated after restarting the Docker container now.

    1. Former Member

      Hello Thorsten,

       

      is it possible to update the old version of the developer edition to the new one?

      Or do I have to install the new version again from scratch?

       

      Best Regards,

      Fabian

  4. Tony Maas

    Hi Thorsten,

    I have Data Hub Dev Edition 1.4 running on Ubuntu Linux, with Docker 18.06.0-ce in a VMware environment. I can access the Data Hub Pipeline Modeler and Vora Tools UIs just fine within the Ubuntu system, but port forwarding outside the VM does not appear to be happening.

    I can ping the IP of the VM from another system on the same network, and the VM is set to bridge to the network. But I cannot telnet to any of the ports exposed through the docker run command’s --publish parameters.

    Is there another step I’m missing?

    Also, is there a timeline for adding some of the other features discussed, like the workflow capabilities and the operators currently not supported? I would really like to be able to tie this in to Data Services, for example, and demonstrate the use of ML and Data Services together.

    Thanks,

    Tony

    1. Tony Maas

      Hi Thorsten,

      I played with the ‘docker run’ command and removed the 127.0.0.1 prefix from the publish statements, and now I’m able to access the ports outside the Ubuntu VM.  Further testing to go, but that’s a good start!

      Would still love to hear more info on future plans for the Data Hub Dev Edition if you can share them.

      Tony

       

    2. Thorsten Schneider Post author

      Hi Tony,

      we are currently discussing which features to support for the next version of the developer edition, but we have not yet reached a final conclusion.

      One thing you can already do today is use the trial edition: https://blogs.sap.com/2018/04/26/sap-data-hub-trial-edition/. You need an account on Google Cloud Platform to give it a try. It includes all features of SAP Data Hub.

      Best regards
      Thorsten

  5. Thorsten Lüdtke

    Hi Thorsten,

    do you have an updated readme file available? I followed the ‘Adding Apache Zeppelin‘ instructions but received the message ‘livy interpreter does not exist at zeppelin server. terminating‘ after running

    docker run --net dev-net datahub zeppelin [ZEPPELIN_URL]

    with the Zeppelin URL pointing to the Apache livy server on port 8998. What am I missing?

    Regards,

    Thorsten

