SAP Data Intelligence Cloud – Pipeline Simplification Basics
The SAP Data Intelligence Modeler uses a flow-based programming paradigm to create data processing pipelines (also known as graphs).
These pipelines are created by connecting a series of operators in sequence. However, the way your pipeline is modeled has an effect on its performance. In this blog post, we're going to cover some very basic modeling principles you may want to keep in mind when creating your pipelines.
For the purposes of this example, we're working with a very simplified pipeline.
Our basic pipeline
Our Data Generator here simulates output from an IoT device, sending values for Temperature, CO2, Humidity, and so on. These values are then run through a Multiplexer, which sends them to two custom Go (Golang) operators.
The Wiretap operator lets us view the values as they pass through.
Our generated data
From there, the operators pull out the value they're concerned with (in this case, Temperature and CO2) and pass these to a Terminal Output. In a real-life scenario, some action would be taken based on the values; however, for simplicity, in this example we're using Terminals to monitor the values.
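For reference, the two original Go Operators contain essentially the same script: each splits the incoming comma-separated string and forwards a single reading. Below is a minimal sketch of the Temperature variant; the sample message in the comment is hypothetical, and only the field positions match the operator code shown later in this post.

package main

import "strings"

// Output port function for the Temperature Output Port
var Temperature func(interface{})

func main() {}

// Input receives each incoming message as a comma-separated string,
// e.g. "sensor-1,12:00:00,21.4,48,417" (hypothetical sample; index 2 holds
// the temperature, index 4 the CO2 level)
func Input(val interface{}) {
	values := strings.Split(val.(string), ",")
	Temperature("The temperature is " + values[2]) // forwards only the Temperature reading
}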
If you would like to follow along with this blog post, you can find both the Before and After pipelines in this repository. When you create a new pipeline in the Data Intelligence Modeler, you can switch between the Diagram and JSON views (as shown below) and copy in the contents of the pipeline JSON from the repository.
Switch to the JSON view to maintain the pipeline as code
Switching back to the Diagram editor, the first thing we're going to do is simplify our pipeline by removing the Multiplexer. As the code inside our Go Operators is mostly identical, we're going to use Add Port to create a second Output Port on one of them. This means not only that we no longer need our Multiplexer, but also that we only need to process the data once and can get rid of our second Go Operator.
Right-click on our Go Operator, then select Add Port
From here, we have to define our new port. Enter a name (in this case, CO2), then make sure you select Output Port. Next, we have to define the Port Type. If we were sending just the values, we might choose float64. However, in this case the values are accompanied by text, so we're using the string type.
Add CO2 Output Port
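As an aside, if we had chosen float64 for the port type, the operator would need to send just the numeric value on that port. The sketch below is purely illustrative and not part of the example pipeline; it assumes the CO2 field in the incoming message is a plain number.

package main

import (
	"strconv"
	"strings"
)

// Hypothetical variant: with the CO2 port typed as float64, we send the
// parsed number rather than a formatted string
var CO2 func(interface{})

func main() {}

func Input(val interface{}) {
	values := strings.Split(val.(string), ",")
	// assumes the fifth field holds the CO2 reading as a plain number
	co2, err := strconv.ParseFloat(strings.TrimSpace(values[4]), 64)
	if err != nil {
		return // ignore malformed readings
	}
	CO2(co2) // sends a float64 value on the float64-typed port
}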
Next, delete the Multiplexer and our extra Go Operator. Then, connect the Output Port of our Wiretap directly to our Go Operator, and connect the CO2 Output Port to our second Terminal. Finally, press the auto-layout button to clean up the layout.
A simplified pipeline
Next, we'll need to make the code changes to our Go Operator (renamed here for clarity). First, select it, then click on the Script button to access the underlying code.
Click on the Script button to edit
You'll want to add the two lines that deal with our CO2 Output Port, marked below with "ADD".
package main

import "strings"

var Temperature func(interface{})
var CO2 func(interface{}) // ADD: output port function for the new CO2 port

func main() {}

// Input is called for each message arriving on the operator's input port.
func Input(val interface{}) {
	values := strings.Split(val.(string), ",")
	Temperature("The temperature is " + values[2]) // Sends only Temperature
	CO2("The CO2 level is " + values[4])           // Sends only CO2 | ADD
}
Now we can check that both values are being output. Save and Run your pipeline, then use the Open UI button to check the output on each Terminal.
Check Terminal Output
The readings are coming through
We've now verified that our simplified pipeline is working as expected, so we can get rid of our Wiretap and connect the Data Generator directly to the Go Operator.
Our final simplified pipeline
By removing the Multiplexer and instead adding an Output Port to our Go Operator, we've managed to reduce the complexity of our pipeline and avoid code duplication. Again, if you would like to follow along yourself, the Before and After pipelines are available in this repository. Special thanks go to my colleagues Bengt Mertens and Wei Han for their assistance with Data Intelligence Cloud.
Of course, there are many more things to keep in mind when optimizing your pipelines, and I plan to share more with you in the future. I hope this blog post has been useful, and I welcome any comments or questions below.
Hi Cameron,
Thank you for this blog post. This topic is very important in DI.
Can't wait for your next posts about pipeline optimization 😉
Regards,
Yuliya
Thanks Cameron, looking forward to more in this area. A simple hack, but a point to remember in terms of optimisation and best practices.
Regards, Leena