SAP DI automation technique with an example
Many of the functionalities available in SAP Data Intelligence are very helpful even for running mundane tasks that would otherwise take much more time to accomplish. Knowing how SAP Data Intelligence can help with some of these simple automation tasks demonstrates the flexibility of the tool, and I hope it triggers better ideas for taking care of more complex requirements in a project.
Problem definition:
In one customer implementation, there was a need to get the structure of the CDS views, along with the data type definitions, from the source S/4HANA systems. The receiving team needed this information so that the receiving tables could be built at the target system before they started consuming the data from the SAP DI instance through the Kafka Producer operator.
With over 200 pipelines in the plan, getting this information for all the CDS view sources of all 200-plus pipelines is not an easy task; doing it manually would take many hours. There had to be an easier way to automate it.
Here, two issues need to be resolved to automate this simple, mundane task:
- Getting the structure of all CDS views automatically (why not let SAP DI itself do it with a graph, since it is going to replicate the source via the Kafka Producer anyway?)
- Building a second graph (for batch job execution) that can trigger multiple instances of the graph from issue #1, each with the proper configuration parameters, so the structure information for every CDS view is retrieved in parallel, meeting the customer's need
While the first issue, getting the structure itself, can be handled in a graph using a custom operator (Python, Go, or JavaScript), the second issue needs an automation technique for restarting the same graph for all 200-plus CDS views.
For the second issue, thanks to the concept of parameterization and Dimitri’s blog (on replicating multiple tables using a single pipeline), it becomes a bit easier to handle as well.
Let’s look at how simply and quickly SAP Data Intelligence accomplishes this.
Getting the structure of all CDS views
The first task is to build a graph that gets the CDS view structure from S/4HANA.
I built a simple graph, shown below, with a custom Python operator along with the ABAP CDS Reader V2 operator to read the S/4HANA CDS views.
For information, the ABAP CDS Reader V2 outputs a message whose header carries metadata (ABAP and Fields), including the column names with their respective data types, while the data itself comes in CSV format. (In comparison, CDS Reader V0 or V1 combined with the ABAP Converter operator gives you the output in one chosen format, CSV, JSON, or XML, but no information about the column names.)
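To make this concrete, here is an illustrative sketch of how such a message header might look from inside a Python operator. The attribute keys and field properties below are assumptions for illustration only; the exact names depend on the operator version and the S/4HANA release:

```python
# Illustrative (assumed) shape of the ABAP CDS Reader V2 message attributes.
# The row data arrives separately in the message body as CSV; the structure
# metadata travels in the header, roughly like this:
example_attributes = {
    "ABAP": {
        "Fields": [
            # one entry per CDS view column (keys assumed for illustration)
            {"Name": "SALESDOCUMENT", "ABAPType": "CHAR", "Length": 10, "Decimals": 0},
            {"Name": "NETAMOUNT", "ABAPType": "CURR", "Length": 15, "Decimals": 2},
        ],
    },
}
```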
Below is the simple pipeline for extracting the CDS view structure
Graph #1: Pipeline for Reading one CDS View Structure
Since the V2 operator provides the metadata from the source structure itself, it is now as simple as reading the ABAP Fields metadata from the header of the output message.
Also, since the need is only to get the CDS view structure information, we fetch just one record (Records per Roundtrip in the CDS Reader configuration); once that record has delivered the required information, the pipeline is stopped, as we do not need the data itself. Also note the ${cdsname} placeholder, which is the configuration parameter that passes the CDS view name into the graph.
CDS Reader configuration to get the ABAP info only
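For orientation, the handful of reader settings that matter here can be summarized as follows. This is a descriptive stand-in only; the actual configuration is maintained in the Modeler UI, and the property names are assumptions:

```python
# Descriptive stand-in for the ABAP CDS Reader V2 settings used in Graph #1
# (maintained in the Modeler UI in practice; names are illustrative):
cds_reader_settings = {
    "ABAP Connection": "S4H_CONNECTION",  # assumed connection ID to the source S/4HANA
    "CDS Name": "${cdsname}",             # substituted per run via the graph parameter
    "Records per Roundtrip": 1,           # one record is enough for the header metadata
}
```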
Below is the simple Python code that extracts the required structure information from the message.
Python code for extracting ABAP Fields
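A minimal sketch of that operator follows, assuming the header keys illustrated earlier, port names of my own choosing (input, output, stop), and a Graph Terminator wired to the stop port:

```python
import json

def on_input(msg):
    # Read the structure metadata from the message header written by the
    # ABAP CDS Reader V2 (attribute keys assumed as illustrated earlier).
    fields = msg.attributes.get("ABAP", {}).get("Fields", [])

    # One entry per column: name, ABAP type, length, decimals.
    structure = [
        {
            "Name": f.get("Name"),
            "Type": f.get("ABAPType"),
            "Length": f.get("Length"),
            "Decimals": f.get("Decimals"),
        }
        for f in fields
    ]

    # Forward the structure (e.g. towards a Write File operator for S3).
    api.send("output", json.dumps(structure, indent=2))

    # One record is all we need; signal the Graph Terminator to stop the graph.
    api.send("stop", api.Message("done"))

api.set_port_callback("input", on_input)
```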
Now, here is an example of the output from this pipeline, used for building the target table structure. (To test, I passed a single value for the ${cdsname} parameter to make sure it produces the expected output.) The output is written to a compatible target; in our case, we send it as a separate file to the S3 target, which the target table build team can use to build the tables.
Output of the structure information (Formatted)
(Note: refer to the ABAP “Types and Objects – Overview” documentation for information on the structure.)
Batch job execution pipeline
The second task is to run the pipeline created in the first task for close to 200 CDS views. This is where the parameterization concept of DI graphs comes in handy.
This new pipeline has another Python operator that reads a delimited string of CDS view names. Each name is picked and passed as a configuration parameter (the substitution value for the ${cdsname} parameter used in Graph #1) to the next operator, which starts the CDS view structure graph (Graph #1) via open API calls.
This approach has already been explained in the blog mentioned earlier, but the Python code needs a minor tweak to handle the CDS views for this specific need; a sketch of that driver logic follows.
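As a hedged sketch of that driver logic: the snippet below starts one Graph #1 instance per CDS view name through the Modeler's runtime REST API, the same approach the referenced blog uses. The host, credentials, and graph name are placeholders you would adapt to your own tenant:

```python
import requests

DI_HOST = "https://<your-di-tenant>"          # placeholder: your DI instance URL
TENANT, USER, PASSWORD = "default", "<user>", "<password>"  # placeholders
GRAPH_SRC = "mypackage.cds_structure_reader"  # placeholder: Graph #1's fully qualified name

def start_structure_graphs(cds_views: str, delimiter: str = ";") -> None:
    """Start one Graph #1 instance per CDS view name in the delimited string."""
    for cds_name in filter(None, (v.strip() for v in cds_views.split(delimiter))):
        payload = {
            "src": GRAPH_SRC,
            "name": f"structure-{cds_name.lower()}",
            # Substitutes the ${cdsname} placeholder declared in Graph #1.
            "configurationSubstitutions": {"cdsname": cds_name},
        }
        response = requests.post(
            f"{DI_HOST}/app/pipeline-modeler/service/v1/runtime/graphs",
            auth=(f"{TENANT}\\{USER}", PASSWORD),
            headers={"X-Requested-With": "XMLHttpRequest"},
            json=payload,
        )
        response.raise_for_status()

start_structure_graphs("I_SALESDOCUMENT;I_CUSTOMER;I_PRODUCT")  # example names
```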
Graph #2: Pipeline to start the CDS View structure pipeline
Using the start graph operator, Graph #2 starts Graph #1 as many times as there are values available for the ${cdsname} parameter.
This launches the 200-plus graphs within a very short time of one another. Although all the CDS view structure graphs are started as separate instances almost simultaneously, only the first few run immediately; the later ones stay in a pending state and move to running only once the required resources become available and are allocated to each of them.
Conclusion
So the required information was made available in just a few minutes, saving the hours it would otherwise have taken to produce. I hope this helps trigger better ideas for automating more complex tasks.