By Cameron Swift

SAP Data Intelligence Cloud – Pipeline Simplification Basics

The SAP Data Intelligence Modeler uses a flow-based programming paradigm to create data processing pipelines (also known as graphs).

These pipelines are built from a series of operators connected in sequence. How you model a pipeline affects its performance, so in this blog post we’re going to cover some very basic modeling principles you may want to keep in mind when creating your pipelines.

 

For the purposes of this example, we’re working with a very simplified pipeline.

 

Our basic pipeline

 

Our Data Generator here simulates output from an IoT device, sending values for Temperature, CO2, Humidity, etc. These values are then run through a Multiplexer, which sends them to two custom Go (Golang) operators.

 

The Wiretap operator lets us view the values as they’re passed through.

 

Our Generated data

 

From there, each operator pulls out the value it’s concerned with (in this case, Temperature and CO2) and passes it to a Terminal output. In a real-life scenario, some action would be taken based on the values; for simplicity in this example, we’re using Terminals to monitor them.
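To illustrate the extraction step, here is a minimal standalone sketch of pulling one field out of a comma-separated reading. The helper name `extractField` and the sample reading are hypothetical; only the comma-separated layout with Temperature at index 2 and CO2 at index 4 is taken from the operator script later in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// extractField returns the comma-separated field at position idx.
// It reports an error instead of panicking when the reading is too short.
// (Hypothetical helper, not part of the Data Intelligence API.)
func extractField(reading string, idx int) (string, error) {
	fields := strings.Split(reading, ",")
	if idx < 0 || idx >= len(fields) {
		return "", fmt.Errorf("reading has %d fields, want index %d", len(fields), idx)
	}
	return strings.TrimSpace(fields[idx]), nil
}

func main() {
	// Hypothetical sample reading; the real generator output may differ.
	reading := "device-1,2021-01-01T00:00:00Z,21.5,40.2,415"

	temp, err := extractField(reading, 2)
	if err != nil {
		fmt.Println("skipping reading:", err)
		return
	}
	fmt.Println("The temperature is " + temp)
}
```

Returning an error for short readings is a design choice for this sketch; inside an operator you might prefer to log and drop the message instead.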

 

If you would like to follow along with this blog post, you can find both Before and After pipelines in this repository. When you create a new pipeline in the Data Intelligence Modeler, you can switch between Diagram and JSON view (as shown below) and copy in the contents of the pipeline JSON from the repository.

 

Switch to JSON view to maintain pipeline with code

 

Switching back to the Diagram editor, the first change is to simplify our pipeline by removing the Multiplexer. As the code inside our two Go Operators is mostly identical, we’re going to use Add Port to create a second Output Port on one of them. This means not only that we no longer need the Multiplexer, but also that we only need to process the data once and can get rid of the second Go Operator.

Right click on our Go Operator, then select Add Port

 

From here, we have to define our new port. Enter a name (in this case, CO2), then make sure you select Output Port. Next, we have to define the Port Type. If we were sending just the values, we might choose float64. However, in this case the values are accompanied by text, so we’re using the string type.
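For comparison, this is roughly what preparing a value for a float64 port could look like: the text field is converted to a number before it is sent. This is a standalone sketch; `parseReading` is a hypothetical helper and the sample input is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseReading converts one text field into a float64, the kind of value
// a float64-typed port would carry instead of a formatted string.
// (Hypothetical helper for illustration only.)
func parseReading(field string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSpace(field), 64)
}

func main() {
	// Made-up field value with surrounding whitespace.
	v, err := parseReading(" 415.5 ")
	if err != nil {
		fmt.Println("not a number:", err)
		return
	}
	fmt.Println(v)
}
```

With a numeric port, a downstream operator can compare against thresholds directly, whereas the string port used in this post is convenient for display in a Terminal.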

 

Add CO2 Output Port

 

Next, delete the Multiplexer and the extra Go Operator. Then connect the Output Port of our Wiretap directly to the remaining Go Operator, and connect its CO2 Output Port to our second Terminal. Finally, press the auto-layout button to clean up the layout.

 

A simplified pipeline

 

Next, we’ll need to make the code changes to our Go Operator (renamed for clarity). First, select it, then click on the Script button to access the underlying code.

Click on the Script button to edit

 

You’ll want to add the two lines that deal with our CO2 Output Port, marked below with “ADD”.

 

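package main

import "strings"

// Output ports, one function variable per port
var Temperature func(interface{})
var CO2 func(interface{}) //ADD

func main() {}

// Input handles each message arriving on the input port
func Input(val interface{}) {
    values := strings.Split(val.(string), ",")
    if len(values) < 5 {
        return // Skip short readings instead of panicking on a missing field
    }
    Temperature("The temperature is " + values[2]) //Sends only Temperature
    CO2("The CO2 level is " + values[4]) //Sends only CO2 | ADD
}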

 

Now we can check that both values are output. Save and run your pipeline, then use the Open UI button to check the output on each Terminal.
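If you want to sanity-check the script logic outside the Modeler first, one option is a small local harness that stands in for the two Terminals. The sketch below is an assumption-laden illustration: `runOnce` and the sample reading are hypothetical, the length guard is an addition, and in the real operator the ports are wired up by the Modeler rather than assigned by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// Output ports, declared as in the operator script.
var Temperature func(interface{})
var CO2 func(interface{})

// Input mirrors the operator's handler, with a length guard added.
func Input(val interface{}) {
	values := strings.Split(val.(string), ",")
	if len(values) < 5 {
		return
	}
	Temperature("The temperature is " + values[2])
	CO2("The CO2 level is " + values[4])
}

// runOnce wires both ports to in-memory collectors (standing in for the
// Terminals) and feeds a single reading through Input.
// (Hypothetical test helper, not part of the operator.)
func runOnce(reading string) (temp, co2 []string) {
	Temperature = func(v interface{}) { temp = append(temp, v.(string)) }
	CO2 = func(v interface{}) { co2 = append(co2, v.(string)) }
	Input(reading)
	return temp, co2
}

func main() {
	// Hypothetical sample reading; the real generator output may differ.
	temp, co2 := runOnce("device-1,2021-01-01T00:00:00Z,21.5,40.2,415")
	fmt.Println(temp)
	fmt.Println(co2)
}
```

Because the ports are plain function variables, swapping in collectors like this keeps the handler itself unchanged between the local check and the deployed operator.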

 

 

Check Terminal Output

 

The readings are coming through

 

We’ve now verified that our simplified pipeline is working as expected, so we can get rid of our Wiretap and connect the Data Generator directly to the Go Operator.

 

Our final simplified pipeline

 

By removing the Multiplexer and instead adding an Output Port to our Go Operator, we’ve managed to reduce the complexity of our pipeline and avoid code duplication. Again, if you would like to follow along with this blog post yourself, Before and After pipelines are available in this repository. Special thanks go to my colleagues Bengt Mertens and Wei Han for their assistance with Data Intelligence Cloud.

 

Of course, there are many more things to keep in mind when optimizing your pipelines, and I plan to share more with you in the future. I hope this blog post has been useful, and I welcome any comments or questions below.

      2 Comments
      Yuliya Reich

      Hi Cameron,

      thank you for this blog post. This topic is very important in DI.

      Can't wait for your next posts about pipeline optimization 😉

      Regards,

      Yuliya

      Leena Gopinath

      Thanks Cameron, looking forward to more on this area. Simple hack, but a point to remember in terms of optimisation and best practices.

      Regards, Leena