Enrich data in SAP Analytics Cloud with Machine Learning using an SAP Data Intelligence pipeline
In a recent project, we wanted to present data in an SAP Analytics Cloud (SAC) story. The use case was to export data from Qualtrics and show it live in the SAC. We also wanted to include information from a Machine Learning model alongside the original data, so that users can see information from different sources at a glance. We solved this by building a data pipeline on SAP Data Intelligence that gathers the data, sends it to a Machine Learning service, and pushes the enriched data to the SAC. In this blog post, I want to show how this works and why it is easy to replicate.
- Step 1: Data access and pipeline structure
- Step 2: Deployment of ML service on BTP
- Step 3: Access ML service from the pipeline
- Step 4: Prepare SAC for connection
- Step 5: Expand the pipeline to push data to SAC
- Step 6: Access Data in SAC and first Dashboard Steps
Prerequisites for replicating this use case are:
- a data source. This can be either a connector querying an API for data (in our case it is a custom Qualtrics connector) or an operator reading static data, for example from a CSV file.
- access to an SAP Data Intelligence instance (DI) with modeler access
- access to an SAP Analytics Cloud (SAC) instance with access to the “Administration” and “App Integration” spaces
- a service serving predictions of a Machine Learning model
- access to SAP Business Technology Platform (BTP) to deploy this service
Step 1: Data access and pipeline structure
We begin our use case on SAP Data Intelligence. Set up a new graph and start by reading the data that we want to enrich and send to the SAC. You can use, for example, a “Read File” or “Read HANA Table” operator to get the data. In my case, I wrote a custom Python operator. You can read here and here how to get started with writing custom Python operators, which are a really handy option for working with DI pipelines.
Within my Python operator, I also run some preprocessing steps to prepare the data for my use case. I end up with a flat CSV file, formatted as a string, which I send to the next operator. As you can see in the screenshot above, for now I’m just sending the CSV string to a Wiretap operator. You can see a glimpse of my demo Qualtrics data below.
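To make the “flat CSV file, formatted as a string” more concrete, here is a minimal sketch of how such a string could be built with the Python standard library. The function name `records_to_csv`, the column names, and the demo rows are made up for illustration and are not taken from my actual operator. The commented-out `api.send` line only hints at how the string might be handed to the next operator; the port name is an assumption.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of flat dicts into one CSV-formatted string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells so that every row has the same columns.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Hypothetical demo rows; the real data would come from the data source.
responses = [
    {"response_id": "R_1", "rating": 4, "comment": "Fast delivery, thanks"},
    {"response_id": "R_2", "rating": 2, "comment": "Too slow"},
]
csv_string = records_to_csv(responses, ["response_id", "rating", "comment"])

# Inside a DI Python operator, the string could then be sent on, e.g.:
# api.send("output", csv_string)  # port name "output" is an assumption
```

Using the `csv` module instead of joining strings by hand means that values containing commas or quotes are escaped correctly.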
Step 2: Deployment of ML service on BTP
Now that we have our data at hand, we can enrich it with Machine Learning insights. For this, we need a service that makes a Machine Learning model accessible. In my case, I developed a Python FastAPI service that loads the model and exposes an endpoint that returns predictions. You can read how to do this in a blog post here. You can of course use any technology to develop the service, or even include an existing external service in your pipeline (in that case, you can skip the next paragraph).
For the next step, make sure that you have the service URL and endpoint at hand. In the overview in the SAP BTP cockpit, I can see the URL of the deployed service directly (under “Application Routes”).
Step 3: Access ML service from the pipeline
Now it is time to access this service from our pipeline. In the simplest setup, you can use an “HTTP Client” operator, which sends data to an API via POST requests and receives the response. Keep in mind that in this case your data should be formatted as JSON rather than CSV; otherwise this probably won’t work. In my case, I wrote another Python operator that receives the CSV from the first operator and sends one POST request per row of my data set to my Machine Learning service. I can then add the returned predictions directly to my data set. The advantage of this approach is that I can adapt the preprocessing directly in this second Python operator if my API changes.
In either case, make sure that the interfaces of your operators and APIs fit together. This might require a different setup than mine!
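The per-row approach can be sketched roughly as follows. This is not my actual operator code: the URL, the function names, and the assumption that the service answers with a JSON object containing a `prediction` key are all hypothetical and need to be adapted to your own API. The HTTP call is passed in as a parameter, so the example can be tried with a stub instead of a live service.

```python
import csv
import io
import json
import urllib.request

# Hypothetical endpoint; replace it with the application route of your service.
SERVICE_URL = "https://my-ml-service.example.com/predict"


def call_service(row, url=SERVICE_URL):
    """POST one row as JSON and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def enrich_csv(csv_string, predict=call_service, column="prediction"):
    """Add one prediction column to a CSV string, calling `predict` per row."""
    rows = list(csv.DictReader(io.StringIO(csv_string)))
    if not rows:
        return csv_string
    for row in rows:
        # Assumes the service replies with {"prediction": ...}.
        row[column] = predict(row)["prediction"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_predict(row):
    """Stand-in for the real service, used only for this example."""
    return {"prediction": "positive" if int(row["rating"]) >= 3 else "negative"}


enriched = enrich_csv("response_id,rating\nR_1,4\nR_2,2\n", predict=fake_predict)
print(enriched)
```

Each row is converted to a dictionary and sent as JSON, which is the same format the “HTTP Client” operator route would need. For larger data sets, one request per row can become slow; whether a batch endpoint is an option depends on your service.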
Step 4: Prepare SAC for connection
Now we will prepare the SAC for the connection. I basically followed this blog post here, which describes the steps really well. In summary: in the SAC, navigate to System, then Administration, then App Integration, and configure an OAuth client that your Data Intelligence instance can authenticate against.
Click on “Add a New OAuth Client”. In the popup window, enter the following information, together with the URL of your Data Intelligence instance (see screenshot below). When you click on “Add”, you are provided with the secrets and access information that you need to copy and paste into the Data Intelligence pipeline.
Step 5: Expand the pipeline to push data to SAC
The next step, following this blog post, is to set up the remaining operators with the information you just created in the SAC. This closes the gap between Data Intelligence and the SAC, so that you can actually send live data to be displayed. You will need:
- a “Decode Table” operator: it transforms the CSV you are sending into a message format. Pay attention to its configuration and make sure that it matches the “Formatter” configuration below.
- (optional) a “1:2 Multiplexer” with “transform message to string”: this helps when debugging your pipeline. You can look at your data set after the message transformation and check whether everything is okay.
- an “SAP Analytics Cloud Formatter”: turns the message into a form suitable for the SAC. Pay particular attention to this setting, and adjust the other settings to your needs.
- an “SAP Analytics Cloud Producer”: finally creates the connection to the SAC and pushes the data.
You might get an error when you run your pipeline for the first time. In that case, open the UI of the last-mentioned operator and “grant permission” to actually push the data. The screenshot below shows the UI; just click on the link and your pipeline should run.
The SAC is pretty picky when it comes to column names. So if you are experiencing a lot of 400 errors, check the column names in your data set for the following characteristics:
- only Latin letters and numbers, plus characters like “-” or “_”; no other special characters
- the first character must be a Latin letter, not a number or special character
Apart from that, check that the data types are consistent within each column. Inconsistent types can also lead to errors. Unfortunately, neither DI nor the SAC will tell you what the actual problem is, so you need to be really careful.
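Since the error messages do not help much here, it can be worth checking the data set yourself before pushing it. The following is a small sketch of such checks, based only on the rules I listed above and not on an official SAC specification. The function names and the `col_` prefix are my own choices for illustration.

```python
import re

# Rules from the text: first character a Latin letter,
# then only Latin letters, digits, "-" or "_".
VALID_COLUMN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_column(name):
    """Check a column name against the rules above."""
    return VALID_COLUMN.fullmatch(name) is not None


def sanitize_column(name):
    """Replace disallowed characters and make sure the name starts with a letter."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", name.strip())
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "col_" + cleaned
    return cleaned


def inconsistent_columns(rows):
    """Return the names of columns whose values do not all share one Python type."""
    types_seen = {}
    for row in rows:
        for name, value in row.items():
            types_seen.setdefault(name, set()).add(type(value).__name__)
    return sorted(name for name, kinds in types_seen.items() if len(kinds) > 1)


# Hypothetical example rows with one mixed-type column.
rows = [
    {"rating": 4, "comment": "ok"},
    {"rating": "n/a", "comment": "slow"},
]
print(sanitize_column("rating (1-5)"))
print(inconsistent_columns(rows))
```

Running such checks in the Python operator before the “Decode Table” step makes it easier to find the offending column than guessing from a 400 error.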
Step 6: Access Data in SAC and first Dashboard Steps
Finally, you will see a message like “10 rows were added to the dataset”, which tells you that you are done! You will find your data set in the Files section of the SAC (see above). Directly from the data set view, you can create a story, configure the first measures, and insert basic visualizations into your story dashboard (see below).
And this brings me to the end. I hope that you liked my blog post about how to include Machine Learning service replies in the data sets of your SAP Data Intelligence pipelines and push everything to the SAP Analytics Cloud. I also hope that you have learned something that will bring you one step further! Custom Python operators give you the greatest flexibility, because they let you control the style and format of your data set most precisely. Feel free to leave questions and remarks in the comments!