Thorsten Schneider

SAP Data Hub, developer edition 2.4

At the end of 2017, we delivered SAP Data Hub, developer edition. This week we updated it to our newest release: SAP Data Hub, developer edition 2.4.

SAP Data Hub is a data sharing, pipelining, and orchestration solution that helps companies accelerate and expand the flow of data across their modern, diverse data landscapes (for more details take a look at Marc’s excellent FAQ blog post).

The architecture of SAP Data Hub leverages modern container technology and, put simply, looks like this:

The main (technical) components of SAP Data Hub are:

  • SAP Data Hub Foundation (mandatory component, installed on Kubernetes)
  • SAP Data Hub Spark Extensions (optional component, installed on Hadoop)

SAP Data Hub, developer edition

For the developer edition, we looked for a way to run SAP Data Hub on your local computer. We took the parts of SAP Data Hub that are, in our opinion, of most interest to developers and packaged them together with HDFS, Spark, and Livy into a single Docker container image. This container image can be used with different start options. Depending on the start option, it runs either SAP Vora Database, SAP Vora Tools, and SAP Data Hub Modeler, or HDFS, Spark, and Livy (which are required for some example pipelines and tutorials).

 

Now, what are the advantages of this approach?

  • You can easily run SAP Data Hub, developer edition on your local computer (be it Windows, Linux, or macOS).
  • Building the container image locally typically takes a few minutes. During this time, you need a stable internet connection. Once the container image is built, you can start a container based on the container image within less than a minute and without network connectivity.
  • You can build powerful data pipelines (and they can interact with all kinds of other technologies, e.g. SAP HANA, SAP API Business Hub, Kafka, any web service).

Of course, there are also some drawbacks:

  • The SAP Data Hub, developer edition currently does not allow you to use data governance and workflow features of SAP Data Hub.
  • Unfortunately, you cannot observe how the SAP Data Hub usually containerizes and deploys data-driven applications onto Kubernetes.
  • Some of the data pipeline operators (i.e., the reusable and configurable components which you can combine to build data pipelines) will not work inside the container. Most notably, the operators related to machine learning (leveraging TensorFlow) and image processing (leveraging OpenCV) currently cannot be used, at least not “out-of-the-box”.

How to get started?

To give the SAP Data Hub, developer edition a try, visit our Tutorial Navigator. Currently the following tutorials are available:

The tutorials give you a first idea of how to build data-driven applications with SAP Data Hub. You will learn how to create your first pipeline. You will use a message broker, HDFS, and SAP Vora.

If you have questions, problems or proposals in the meantime, feel free to post them as comments to this blog, or to the SAP Community. We will try to answer them in a timely manner and collect frequently asked questions here.

      38 Comments
      Thorsten Schneider
      Blog Post Author

      I added a direct link to the download of SAP Data Hub, developer edition in SAP Store. Earlier we only had the link inside the tutorials.

      Javier Andrés Cáceres Moreno

      Good news

      Frank Schuler

      Hello Thorsten,

      The SAP Data Hub, developer edition is brilliant. I got my first Data Pipeline running in about an hour.

      However, when I add port forwarding for port 5050 and try to connect the SAP Data Hub, developer edition to my SAP Data Hub Cockpit, it validates fine, but then throws an internal error: Cannot connect to agent.

      Is this because the SAP Data Hub Adapter is not installed on the SAP Data Hub, developer edition, or does it listen on a different port? If it has not been included in the current SAP Data Hub, developer edition, could it be added?

      Best regards and many thanks in advance

      Frank

      Thorsten Schneider
      Blog Post Author

      Hi Frank,

      the adapter is currently not installed. We decided against installing it, since it is only useful when you have the XSA part of SAP Data Hub running (which is not available as part of the developer edition at the moment). But let me try to install it and connect to it. If there are no (unsolvable) problems, I will try to get it added.

      Cheers

      Thorsten

      PS: congratulations on your blog post series around SAP Data Hub. Very nice read!

      Frank Schuler

      Many thanks in advance, Thorsten.

      By the way, currently, I am stuck with connecting my SAP Data Hub Cockpit to my VORA Data Pipeline with the following error:

      I already discussed this with Axel Schuller, and he seems to remember a similar problem when he verified the SAP Data Hub installation, but suggests that this would need SAP development to look into.

      If this was in fact a known issue, might someone give me a hint how to overcome it? The trace does not show much more detail either.

      Very best regards

      Frank

      Thorsten Schneider
      Blog Post Author

       

      Hi Frank,

      does this happen when you connect to the VORA Pipeline inside the container or to one outside it?
      I assume the latter.

      Indeed, this message also seems familiar to me, but I don’t recall exactly what the problem was. Can you mail me a screenshot of the connection (firstname dot lastname at sap dot com)?

      Thanks

      Thorsten

      Frank Schuler

      Hello Thorsten,

      Is the SAP VORA vFlow API available on the SAP Data Hub, developer edition and, if so, which port does it listen on?

      Best regards

      Frank

      Thorsten Schneider
      Blog Post Author

      Hi Frank,

      that should be port 8090.

      Best regards

      Thorsten

      Douglas Maltby

      Many thanks, Thorsten! Excellent blog and set of tutorials. The Data Hub Dev Edition provides a great environment for experimentation with SDH and Vora 2.0. Thank you!

      The only issues I ran into were seemingly related to using Docker Toolbox rather than the more current Docker. I have a 2008 Mac Pro with 64GB RAM, but the pre-2010 Xeon CPUs don't have the VT-X instruction set, so I must use the older Docker Toolbox. Even with Docker Toolbox, I was able to get Data Hub running, using the VirtualBox VM's IP address rather than localhost, but without Zeppelin (it wouldn't start and get to the status loop, so I just removed --zeppelin when starting). I also couldn't start the spotify/kafka container (error waiting for the container, timeout) to go through the later exercises, but Data Hub and Vora both worked fine in Docker Toolbox. I just wanted to post this for others who may have the same issues using Docker Toolbox.

      I'm impatiently awaiting a new Mac Pro in 2018, so I "borrowed" my wife's slightly more current 2011 MacBook Pro w/16GB RAM, installed Docker for Mac and everything in your SDH Dev Edition  tutorials worked flawlessly.

      I'm looking forward to future dev editions (incl Vora 2.1) covering the data governance and workflow use cases, and plan to connect it with HXE using Frank's blogs. Thanks to you both for all your insight!

      Doug

       

      Thorsten Schneider
      Blog Post Author

      Hi Doug,

      thanks for the feedback and happy new year. Hard to say why you had trouble with Zeppelin / Kafka. We tested both successfully with Docker Toolbox (on Windows though). The only immediate thought I have: did you give enough resources to the VM running Docker (see our FAQs)? We observed that the initial sizing of the VM caused us trouble. I believe on the 16GB Windows system we used for testing, it was set to 1GB only.

      Cheers

      Thorsten

      Thorsten Schneider
      Blog Post Author

      The SAP Data Hub, developer edition 1.2 is available. Same procedure as before... to get it follow the tutorials.

      There are not too many changes. SAP Vora tables do not have to be recreated after restarting the Docker container now.

      Former Member

      Hello Thorsten,

       

      is it possible to update the old version of the developer edition to the new one?

      Or do I have to install the new version again from scratch?

       

      Best Regards,

      Fabian

      Thorsten Schneider
      Blog Post Author

      Hi Fabian,

      you have to do a new installation, i.e. build the new container image.

      Cheers

      Thorsten

      Former Member

      Great stuff! Which group within SAP would help with implementing Data Hub solutions for clients?

       

      Tony Maas

      Hi Thorsten,

      I have Data Hub Dev Edition 1.4 running on Ubuntu Linux, with Docker 18.06.0-ce in a VMware environment. I can access the Data Hub Pipeline Modeler and Vora Tools UIs just fine within the Ubuntu system, but port forwarding outside the VM does not appear to be happening.

      I can ping the IP of the VM from another system on the same network, and the VM is set to bridge to the network. But I cannot telnet to any of the ports exposed through the docker run command's --publish parameters.

      Is there another step I'm missing?

      Also, is there a timeline for adding some of the other features discussed, like the workflow capabilities and other operators currently not supported? I would really like to be able to tie this in to Data Services, for example, and be able to demonstrate use for ML and Data Services together.

      Thanks,

      Tony

      Tony Maas

      Hi Thorsten,

      I played with the ‘docker run’ command and removed the 127.0.0.1 prefix from the publish statements, and now I’m able to access the ports outside the Ubuntu VM.  Further testing to go, but that’s a good start!

      Would still love to hear more info on future plans for the Data Hub Dev Edition if you can share them.

      Tony
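The fix Tony describes comes down to how Docker's `--publish` option binds ports: with a host IP prefix such as `127.0.0.1:`, the port is bound only on that interface of the Docker host, while without a prefix it is bound on all interfaces. A minimal sketch of the two forms follows. The helper name `publish_arg` is hypothetical, port 8090 is the vFlow API port mentioned earlier in this thread, and the actual list of published ports in the tutorial's `docker run` command may differ.

```shell
# Hypothetical helper: build one --publish argument for `docker run`.
#   $1 = port (same number on the host and inside the container)
#   $2 = "local" to bind only to 127.0.0.1; anything else binds all interfaces
publish_arg() {
  port="$1"
  scope="${2:-all}"
  if [ "$scope" = "local" ]; then
    # Reachable only from the Docker host itself
    printf '%s\n' "--publish=127.0.0.1:${port}:${port}"
  else
    # Reachable from other machines that can route to the Docker host
    printf '%s\n' "--publish=${port}:${port}"
  fi
}

publish_arg 8090 local   # prints --publish=127.0.0.1:8090:8090
publish_arg 8090 all     # prints --publish=8090:8090
```

Dropping the prefix exposes the port to the whole network the host is attached to, so it is worth doing only on networks you trust.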

       

      Thorsten Schneider
      Blog Post Author

      Hi Tony,

      we are currently discussing which features to support for the next version of the developer edition, but we have not yet reached a final conclusion.

      One thing you can already do today is using the trial edition - https://blogs.sap.com/2018/04/26/sap-data-hub-trial-edition/. You need an account on Google Cloud Platform to give it a try. It includes all features of SAP Data Hub.

      Best regards
      Thorsten

      Thorsten Lüdtke

      Hi Thorsten,

      do you have an updated readme file available? I followed the 'Adding Apache Zeppelin' instructions but received the message 'livy interpreter does not exist at zeppelin server. terminating' after running

      docker run --net dev-net datahub zeppelin [ZEPPELIN_URL]

      with the Zeppelin URL pointing to the Apache livy server on port 8998. What am I missing?

      Regards,

      Thorsten

      Thorsten Schneider
      Blog Post Author

      Updated blog post for SAP Data Hub, developer edition 2.3.

      Mitch-Benjamin Ditmar

      Hi,

      I tested the datahub and everything worked out fine (examples etc.).

      At the moment we are gathering use cases. Is the datahub able to draw data directly from an SAP R3 system, transform it, and store it in the R3 system again?

      Thanks in advance!

       

      Regards

       

      Mitch

      Thorsten Schneider
      Blog Post Author

      Hi Mitch,

      to be honest, I don't quite understand what you would like to do in detail. You are rather flexible in the modeling environment. But what you describe does not sound like a good match for SAP Data Hub at the moment. I also find “transforming” data in a transactional system a bit dangerous.

      I do understand that one might want to extract data from R3, then analyze / use it and “in return” write some “other” data back. But “transforming” the “original” data… I am not so sure about the exact requirement you would like to address.

      Cheers
      Thorsten

      Anilkumar Vippagunta

       

      Hello Thorsten,

       

      Thanks for the Blog!

      I'm facing an issue while starting the separate HDFS container. Below is the log:

       

      2019-01-31T06:13:22+00:00 TLS is set to false
      2019-01-31T06:13:22+00:00 -------- executing status_loop --------
      LIVY is down. restart triggered
      2019-01-31T06:13:29+00:00 -------- executing LIVY_start --------
      livy-server running as process 944.  Stop it first.
      2019-01-31T06:13:40+00:00 TLS is set to false
      2019-01-31T06:13:40+00:00 HDFS Namenode:                 tcp://172.21.0.3:9000 http://172.21.0.3:50070
      2019-01-31T06:13:40+00:00 HDFS Datanode:                 tcp://172.21.0.3:50010
      2019-01-31T06:13:40+00:00 HTTPFS:                        webhdfs://172.21.0.3:14000
      2019-01-31T06:13:40+00:00 Apache livy:                   http://172.21.0.3:8998

       

      In addition, the Datahub tools and VORA URLs are not accessible from the browser.

       

      Regards,

      Anil
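When reading a startup log like the one above, it can help to reduce it to the endpoints the container reports. The following is a minimal sketch that assumes the log format shown in Anil's comment (a timestamp, a service name ending in a colon, then one or more URLs). The function name `list_endpoints` is hypothetical and not part of the developer edition.

```shell
# Hypothetical helper: read a startup log on stdin and print one
# "<service> -> <url>" line per advertised endpoint.
list_endpoints() {
  awk '/:\/\// {
    line = $0
    sub(/^[ \t]+/, "", line)           # drop leading indentation
    sub(/^[^ ]+ +/, "", line)          # drop the leading timestamp
    split_at = index(line, ": ")       # service name ends at the first ": "
    if (split_at == 0) next
    name = substr(line, 1, split_at - 1)
    rest = substr(line, split_at + 2)
    n = split(rest, urls, /[ \t]+/)
    for (i = 1; i <= n; i++)
      if (urls[i] ~ /:\/\//) print name " -> " urls[i]
  }'
}

# Possible usage (the container name is a placeholder):
#   docker logs <container> 2>&1 | list_endpoints
```

For the log above, this lists only HDFS, HTTPFS, and Livy endpoints, all on 172.21.0.3, which is the address of the auxiliary container on the Docker network rather than an address of the host.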

       

      Sylvain Garneau

      Hello  Anilkumar Vippagunta

       

      I'm facing the same problem when trying to start HDFS. It looks like a permission issue!?

       

      Let me know if you found the solution please!!!

       

      Good luck!!!!

      Sylvain Garneau

      I found it!!!

      I just replaced port 50070 with 1050!

       

       

      Anilkumar Vippagunta

      Hi,

       

      No luck.

       

      Regards,

      Anil

      Witalij Rudnicki

      Hi Anilkumar. Do you face the same issue with the new 2.4 developer edition?

      And do you face "the Datahub tools and VORA URLs are not accessible from browser." when opening them with 'localhost' or with '172.21.0.2' in the URL?
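One way to answer the localhost-versus-container-IP question before opening a browser is to probe the TCP port directly. The sketch below assumes bash (for its `/dev/tcp` pseudo-device) and the GNU coreutils `timeout` command are available on the host. The function name `port_open` is hypothetical.

```shell
# Hypothetical helper: return 0 if a TCP connection to host:port
# can be opened within 2 seconds, non-zero otherwise.
port_open() {
  host="$1"
  port="$2"
  # /dev/tcp/<host>/<port> is a bash feature, so bash is invoked explicitly
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `port_open localhost 8090 && echo reachable` checks the vFlow API port mentioned earlier in this thread. The ports of the Modeler and the Vora tools are not stated here, so substitute the ones published in your own `docker run` command.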

      Harro Dittmar

      Hello Thorsten,

      Thank you for the informative article. I am currently trying to implement a state-of-the-art data architecture in a data processing tool. I am running the DataHub developer edition 2.4 on Windows 10 in a Docker (2.0.0.3) container with a Firefox frontend. Unfortunately, I fail to get the R Client operator working. It does not recognize the R and Rscript executables that I installed with apt-get in a Dockerfile that I built using the DataHub frontend. The error message reads

      .R Client: Rserve process died before expected: failed to automatically select port and start Rserve: rserve process returned an error: exec: "Rscript": executable file not found in $PATH:

      Somehow, DataHub seems to look for the executable in the wrong place. Can you give me a hint on how to modify $PATH when I build the Docker container?

       

      Jeremy Ma

      Thanks Thorsten for the excellent write up!

       

      I too want to develop a custom operator using Python, but I run into an error when I define a new Dockerfile, as described below, to prepare an env/lib. Is the following supported in the dev edition? The script below works fine in a full DH install, but running the dev edition saves me AWS $ 🙂

      -----------

      FROM debian:9.2

      RUN apt-get update && \
          apt-get install -y python && \
          apt-get install -y python-pip && \
          apt-get install -y python-pandas && \
          pip install pyfpgrowth


       

      Error message:

      error building docker image. Docker daemon error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

      Thorsten Schneider
      Blog Post Author

      Hi Jeremy,

      that will not work. The developer edition runs all operators inside the existing container of the developer edition. It does not spin up any (additional) container at all when running a pipeline.

      Two options:

      1. use the trial edition
      2. try to install the additional software (pandas in your case) into the container of the developer edition (e.g. by executing bash and then running apt-get). I can't guarantee that this will work (so 1. is the safer choice), but I did similar things before...

      Cheers

      Thorsten
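Option 2 can be sketched with `docker exec`, which runs a command inside an already running container. This is only an illustration under several assumptions: the container name is a placeholder, the container's default user must be allowed to run apt-get, and the package names are the ones from Jeremy's Dockerfile. The function names are hypothetical.

```shell
# Build the shell snippet that should run inside the container.
#   $@ = apt package names
apt_install_script() {
  printf 'apt-get update && apt-get install -y %s' "$*"
}

# Run that snippet in a running container via `docker exec`.
#   $1 = container name or ID, remaining arguments = apt package names
install_in_container() {
  container="$1"
  shift
  docker exec "$container" bash -c "$(apt_install_script "$@")"
}

# Possible usage (replace <container> with the name shown by `docker ps`):
#   install_in_container <container> python-pip python-pandas
```

Packages installed this way live only in that container's writable layer, so they are gone once the container is removed and a new one is started from the image.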

      Henry Jones

      Hi.

      Can I use the SAP Data Hub developer edition (Vora, in particular) against our non-production systems (off-premise HANA (CAL), on-premise Hadoop, etc.) as a proof of concept?

      Thanks.

      Thorsten Schneider
      Blog Post Author

      Hi Henry,

      in general: yes. The functions of the developer edition are limited though compared to a "real" system and the trial edition.

      For POCs I would rather recommend using the trial edition. Since you seem to use CAL for HANA already anyway, I propose you do the same for Data Hub.

      Cheers

      Thorsten

      Jude Regy

      Hello There,

      I appreciate all your efforts in creating this tutorial.

      I am running Data Hub successfully on my Windows PC. I am able to access the modeler, HDFS, and Kafka as well from the same container.

      How would I access the other screens like connection management, metadata explorer and other important screens in the developer edition?

      Thanks,

      Jude

      Thorsten Schneider
      Blog Post Author

      Hi Jude,

      they are not available in the developer edition. It only has the modeler and the Vora tools included.

      If you need more, you can consider the trial edition (running on either Google Cloud Platform or Amazon Web Services).

      Cheers
      Thorsten

      Jacob ZITTOUN

      Hi,

      I’m trying to finish the SAP DH 2.4 setup tutorial.

      I’m stuck at step 4 when running the auxiliary HDFS container from Windows PowerShell.

      Any idea why it cannot install Hadoop? All I can tell is that the problem occurs in the function HADOOP_install of the dev-edition-helper.sh file (see below).

       

      function HADOOP_install(){
        HADOOP_set_env
        if [ ! -d ${HADOOP_PREFIX}/bin ]; then
          HADOOP_download
          log "install hadoop"
          mkdir -p ${HADOOP_PREFIX}
          tar -xzf ${HADOOP_DOWNLOAD_PATH} -C ${HADOOP_PREFIX} --strip=1
          if [[ $? != 0 ]]; then
            rmdir ${HADOOP_PREFIX}
            die "Couldn't extract Hadoop"

      Thanks

       

      Thorsten Schneider
      Blog Post Author

      Hi Jacob,

      have you retried, or can you retry? We tested this today on Windows, including in PowerShell, and it worked without problems.

      Cheers

      Thorsten

      Jacob ZITTOUN

      Hi Thorsten,

      Thanks for your prompt answer !!!

      I've just tried again from scratch by first removing the containers and images (using the commands docker ps -a -q | % { docker rm $_ } and docker images -q | % { docker rmi $_ }) and the network (using the command docker network rm dev-net), and step 4 passed successfully, without my knowing what the root cause was... 🙂

      In fact, I had followed the steps from the readme file (located in the folder \DatahubDevEdition\readme), and those steps are slightly different: for example, the image name is not the same (e.g. sapdatahub/dev-edition:2.4), but I doubt that this is the root cause.

      Anyway, thanks for testing with the same environment, I appreciate it.

      Jacob

      Thorsten Schneider
      Blog Post Author

      Hi Jacob,

      I am (also) doubtful about the root cause being the difference compared to the readme file (I think we simply forgot to update it... we can fix this with the next update of the developer edition, which is not scheduled yet). To me the error message looks more like something that can happen if the container cannot download certain things from internet locations.

      Good to hear that it works now. Enjoy the developer edition.

      Cheers
      Thorsten
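If a failed or partial download is the suspected cause, one way to check it is to test the downloaded archive before extraction. The sketch below is not part of dev-edition-helper.sh. The function name `archive_ok` is hypothetical, and the only thing taken from the excerpt above is that the download is a gzip-compressed tar file (it is extracted with `tar -xzf`).

```shell
# Hypothetical check: return 0 if the given file exists, is non-empty,
# and can be listed as a gzip-compressed tar archive.
archive_ok() {
  [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Possible usage (the path is a placeholder for the downloaded Hadoop archive):
#   archive_ok /path/to/hadoop.tar.gz || echo "download looks incomplete, retry"
```

An interrupted download usually leaves a truncated file, which fails this listing in the same way it fails the extraction step.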

      Prothoma Sinha

      Hi Thorsten,

      I am new to Data Hub. I have followed the tutorials mentioned in this blog and I am able to set up the developer edition (2.4) using the docker image in my VM. I do not find the SAP Data Hub System management functionality. Can we import any existing solutions (containing the graphs, operators) without this functionality?

      I am able to import graphs, but I do not find any functionality for importing the dependent artifacts (say, custom operators).

      Kind regards,

      Prothoma Sinha