
The new CodeJam topic: data engineering with SAP Data Hub

On May 17 – the day before the 5th SAP Inside Track in Wrocław – Capgemini hosted two SAP CodeJams for developers:

  1. For software engineers: Cloud Application Programming Model with Node.js – by my great colleagues DJ Adams and Maximilian Streifeneder, and
  2. For data engineers: data pipeline modelling with SAP Data Hub – by yours truly.

As you can imagine, I am super excited about the second one. It was the very first SAP CodeJam on this topic, based on the content I prepared. And it was truly a world premiere, as I had people coming not only from Wrocław and Warsaw, but also from Madrid and Istanbul!

And with that, we now accept requests for this topic, if you want to host the next such CodeJam!

So what is included?

Development in focus

SAP Data Hub is the all-in-one data orchestration solution that discovers, refines, enriches, and governs any type, variety, and volume of data across your entire distributed data landscape.

During the CodeJam we focus specifically on where you – as a developer (e.g. a data engineer) – bring value to the table by using the Modeler and Vora Tools to build data pipelines, including programming custom operators.

Bring your own laptop

It is a bring-your-own-laptop event. You are going to set up SAP Data Hub, developer edition, on it, and you can continue working with it after the event too.

The current developer edition (version 2.4 of Data Hub, as of today) is a Docker container. We do a quick overview and exercises to understand the aspects of Docker needed to use the product.

Pipeline #1: Online chat with sentiment analysis

In the first pipeline you’re going to build, we focus on the basics of building graphs (data pipelines), on using the Terminal operator, and on coding processing operators (specifically using Python and JavaScript).
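To give a feel for the kind of logic that goes into a Python processing step for this pipeline, here is a minimal sketch of scoring a chat message. It is a stand-in, not the code used in the CodeJam exercises and not the Data Hub operator API: the tiny word lists and the function names `score_sentiment` and `annotate` are made up for illustration, and a real pipeline would more likely rely on a proper sentiment library.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad", "broken"}


def score_sentiment(message: str) -> float:
    """Return a polarity in [-1.0, 1.0]; 0.0 means neutral or no known words."""
    words = re.findall(r"[a-z']+", message.lower())
    # +1 for every positive word, -1 for every negative word, others ignored.
    hits = [1 if w in POSITIVE else -1
            for w in words if w in POSITIVE or w in NEGATIVE]
    if not hits:
        return 0.0
    return sum(hits) / len(hits)


def annotate(message: str) -> str:
    """Append the polarity to a chat message, as a processing step might do."""
    return f"{message} [sentiment: {score_sentiment(message):+.2f}]"


if __name__ == "__main__":
    for line in ["I love this CodeJam", "The wifi is terrible", "See you at lunch"]:
        print(annotate(line))
```

In a graph, a function like `annotate` would sit between the operator that receives chat messages and the one that displays them; how it is wired to ports depends on the operator API, which is not shown here.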

Pipeline #2: IoT data processing

Now that we have the basics from pipeline #1, it’s time to process IoT data. You will build your own device simulator, a real-time dashboard for data received via an MQTT consumer from all simulators in the room, and charts for analysis of the same data persisted in HDFS.
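A device simulator can be very small. The sketch below only generates JSON readings; it does not publish them, since the MQTT client and the broker address depend on your setup. The field names (`device`, `ts`, `temperature`, `humidity`), the value ranges, and the function names are assumptions made for illustration, not the format used in the CodeJam exercises.

```python
import json
import random
import time


def make_reading(device_id, rng, timestamp=None):
    """Build one simulated sensor reading as a plain dict."""
    return {
        "device": device_id,
        "ts": time.time() if timestamp is None else timestamp,
        # Values drawn around an arbitrary room temperature and humidity.
        "temperature": round(rng.gauss(22.0, 1.5), 2),
        "humidity": round(min(100.0, max(0.0, rng.gauss(45.0, 5.0))), 2),
    }


def simulate(device_id, count, seed=None):
    """Yield `count` readings as JSON strings, ready to hand to a publisher."""
    rng = random.Random(seed)
    for _ in range(count):
        yield json.dumps(make_reading(device_id, rng))


if __name__ == "__main__":
    for payload in simulate("laptop-01", count=5):
        print(payload)
        time.sleep(0.5)  # pace the messages like a slow sensor
```

Passing a `seed` makes the sequence of values repeatable, which helps when you compare what the dashboard shows with what the simulator sent.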

During this hands-on exercise we focus on going deeper into the Python operator and its APIs, on the HTML5 operator for real-time results, and on the basics of Vora for modeling, querying, and analyzing data.

The next step is building your own reusable operators (and Dockerfiles, if needed) that satisfy the required dependencies.

Bonus: try the Trial edition

Should there be any time left, you will also try the Trial edition of SAP Data Hub, hosted by SAP. This is also the time for me to explain the differences between the editions and how to get your own trial system.

Bonus for SAP: your feedback

This very first SAP Data Hub CodeJam last week was a learning experience not just for participants, but for me as well. It gave us first-hand feedback about the product, its developer edition, and the tutorials that we have online. This is very valuable information to pass back to our product engineering team!


So, is this something you would like to try as well? Then get together data engineers and other developers interested in these topics, and request your local SAP CodeJam! I’ll be happy to come and help you build all this 🙂

 

-Vitaliy, aka @Sygyzmundovych
