Cameron Swift

SAP Data Intelligence Cloud – Pipeline Debugging

This blog post was written in collaboration with the SAP HANA Database & Analytics, Cross Product Management team.

The latest release of SAP Data Intelligence Cloud (DI:2107) provides a number of enhancements. One new feature relevant to work in the Modeler is Pipeline (Graph) Debugging using Breakpoints.

How can we use Breakpoints to better understand the data flow when working with pipelines? Let’s walk through an example.

Our Scenario

For this example, we’re going to use a simplified pipeline based on the Tankerkönig Data & Analytics Showcase. Tankerkönig publishes gasoline prices from across Germany, as well as master data for German gas stations.

We have the Stations Master Data CSV file in our DI Data Lake. The pipeline reads this data using the Read File operator, converts it to a string using the ToString Converter operator, then uses the HANA Client operator to store the master data records in a HANA table.

Our basic pipeline

 

In our scenario, we’ve been handed the finished pipeline and will be responsible for maintaining it in the future, so we want to better understand how data flows through it.

Understanding the Data Flow

If we want to understand what data is being fed into our HANA Client operator, we may be tempted to add a Wiretap or Terminal operator between our existing operators to view the data. While this will show us the data, it also requires us to rework the existing connections and add the new operators while we’re learning. Then, once we understand the flow, we have to remove them and restore the original connections.

Our pipeline with Wiretap Operators

 

While this may be doable with a simple pipeline, it becomes more cumbersome as complexity increases. Additionally, if the pipeline is currently working as intended, we want to avoid making unnecessary changes.

Setting Breakpoints

With the latest release (DI:2107), we can now set breakpoints to view the data flow without editing the pipeline itself. To set a breakpoint, we can either hover over the start of a connection and click the circle that appears, or right-click the connection and select Add Breakpoint.

Two ways to set a breakpoint

 

Once a breakpoint has been set, it is shown as a solid circle on the connection. We want to set two breakpoints, one on either side of the ToString Converter operator.

We have set two breakpoints

 

Debugging using Breakpoints

If we run our pipeline as normal, nothing will have changed. To trigger the breakpoints we’ve just set, we need to choose Debug from the Run menu instead.

Running the pipeline for Debugging

 

Once the pipeline has hit our first breakpoint, we can select the pipeline name under Status to open it for debugging.

Select the pipeline name under Status

 

Our pipeline is now displayed for debugging. In our example, we can see three different symbols on the connections of our pipeline.

Debugging our pipeline

We can see the two breakpoints we set earlier, except now the first breakpoint has an outline. This outline indicates which breakpoint has been reached, i.e. where execution of our pipeline has been paused for debugging.

When we right-click the active breakpoint (the one with the outline), we see three options: Inspect Data, Resume, and Streaming.

We have three options

 

Inspect Data gives us a look at the data being passed through the connection. In this example, we can check the location of the source file, as well as the encoding and body of the data in the connection.

Inspect Data tells us the source of the file, as well as the content

Blob isn’t the most helpful format for us as humans, so we close the Inspect Data window, then select Resume from the right-click menu to let the pipeline continue executing until it hits the next breakpoint. Once the next breakpoint is reached (the one we placed to the right of the ToString Converter operator), we can use Inspect Data again to view the data after it has been converted to a string.

The string format is far more readable

From this window we can tell that our converted CSV is being sent to our HANA Client, with a header row containing the column names.
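As a rough illustration of what the two breakpoints show, the following Python sketch mimics the blob-to-string conversion outside of Data Intelligence and reads the header row from the result. It is not the ToString Converter's implementation. The function names, the UTF-8 encoding, and the sample columns and values are assumptions made up for this example, not the actual Tankerkönig file.

```python
import csv
import io
from typing import List


def blob_to_string(blob: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes (the 'blob') into readable text."""
    return blob.decode(encoding)


def header_columns(csv_text: str) -> List[str]:
    """Return the column names from the first row of a CSV string."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


# Hypothetical sample data; the real Stations Master Data file has its own columns.
sample_blob = b"uuid,name,brand\n0001,Example Station,Example Brand\n"

text = blob_to_string(sample_blob)   # what we would inspect at the second breakpoint
columns = header_columns(text)       # the header row with the column names
print(columns)
```

Seeing the header row at the second breakpoint is a quick way to confirm which columns the HANA Client is about to receive.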

 

Streaming Connections

You may recall a third symbol in our debugging pipeline: a circle with two arrows. This indicates a Streaming Connection. In contrast to a breakpoint, a streaming connection will not stop execution of our pipeline.

Streaming connection

While the pipeline is paused for debugging, we can right-click the Streaming Connection symbol for two options: Open Streaming UI and Breakpoint. Opening the Streaming UI allows us to see data as it flows through the connection.

Data flowing through our connection lets us know that the HANA Client has completed without error

 

The second option, Breakpoint, converts our streaming connection into a breakpoint. Similarly, we can right-click a breakpoint while debugging and select Streaming to convert it into a streaming connection.

You can swap between Breakpoints and Streaming Connections
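To make the difference between the two modes concrete, here is a purely conceptual Python sketch. It is a toy model and not how Data Intelligence implements debugging. The class DebugConnection and its methods are hypothetical names invented for this illustration. A connection in breakpoint mode holds a message until we resume, while a connection in streaming mode lets messages pass straight through.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DebugConnection:
    """Toy model of a connection in debug mode (not SAP's implementation)."""
    mode: str = "streaming"  # either "breakpoint" or "streaming"
    held: List[str] = field(default_factory=list)       # paused at a breakpoint
    delivered: List[str] = field(default_factory=list)  # passed downstream

    def send(self, message: str) -> None:
        """A breakpoint holds the message; streaming passes it on."""
        if self.mode == "breakpoint":
            self.held.append(message)
        else:
            self.delivered.append(message)

    def resume(self) -> None:
        """Release everything currently held at the breakpoint."""
        self.delivered.extend(self.held)
        self.held.clear()

    def toggle(self) -> None:
        """Swap between breakpoint and streaming mode."""
        self.mode = "streaming" if self.mode == "breakpoint" else "breakpoint"


conn = DebugConnection(mode="breakpoint")
conn.send("row 1")   # held at the breakpoint until we resume
conn.resume()        # "row 1" continues downstream
conn.toggle()        # the connection is now streaming
conn.send("row 2")   # passes through without pausing
print(conn.delivered)
```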

 

Conclusion

We’re now familiar with setting breakpoints, debugging our pipelines, and using breakpoints and streaming connections to understand the flow of data inside our pipelines. Using these techniques, we can better understand our pipelines without having to change them by adding Wiretap or Terminal operators.

I hope this blog post has been helpful, and I welcome any comments or questions below.

Note: While I am an employee of SAP, any views/thoughts are my own, and do not necessarily reflect those of my employer.

      2 Comments
      Leena Gopinath

      Thanks Cameron. Debugging with breakpoints has been a need, and this is a nice blog on how to use it in the new version.

      Regards, Leena

      Cameron Swift
      Blog Post Author

      Agreed, Leena - very useful functionality. Thanks for reading.