Overcoming Encoding Issues when using the Table Producer Operator in SAP Data Intelligence Graphs
Hello out there 😃
You face some encoding issue with creating a target table in SAP Data Warehouse Cloud when using the SAP Data Intelligence operator “Table Producer”? Or, you just would like to broaden your knowledge about potential pitfalls when trying to create a remote target table using the table producer operator in SAP Data Intelligence? Good! Because in these cases, you are looking at exactly the blog post you should follow on reading. 😋
Motivation & Approach
Last week, I was using the table producer operator of the structured data operators in SAP Data Intelligence to push sales data from SAP Data Intelligence to SAP Data Warehouse Cloud. Unfortunately, I ran into a problem when trying to create the target table in a SAP Data Warehouse Cloud schema that incorporates a hashtag.
In order to prevent you from trying different, possibly timely, solutions in case you encounter the same problem, I would like to equip you with a simple trick I used in order to make the architecture work. This should enable you to solve the issue fast and reliable.
The approach taken for this blog post is simple: first, I will set the stage by explaining the prerequisites, the use case and the SAP Data Intelligence graph. Then, we will encounter, analyze and solve the problem. At then end, you find the lessons learned from this blog post in a summarized form.
In general, I understand this blog post as an enhancement to this blog post written by Yuliya Reich. In her blog post, Yuliya describes how to connect SAP Data Intelligence to SAP Data Warehouse Cloud by using the operator “Table Producer”.
But…is there any preparation needed from your side for understanding or rebuilding the solution of this blog post? Well, there are no prerequisites required to understand and redo this blog post – almost. 😉
To understand this blog post, you should know the basics of a graph in SAP Data Intelligence and have a fundamental understanding of the syntax of the JSON data format. The linked references will be more than enough in order to fully equip you with the required knowledge.
For reworking the use case of this blog post, you should have access to a SAP Data Intelligence tenant (which seems to be obvious, though 😁). In addition, some sort of data uploaded to the Semantic Data Lake of the SAP Data Intelligence tenant are required. You can learn how to upload a dataset into the Semantic Data Lake of SAP Data Intelligence in this article of the SAP Help Portal.
Data & Infrastructure
For this blog post, I use a Data Intelligence Cloud version 2103.32.25 as well as a SAP Data Warehouse Cloud version 2021.17.0.
So, from my local file system, I uploaded the sales data to the Semantic Data Lake of SAP Data Intelligence, profiled and published it to the Metadata Catalog. This blog post will guide through the profiling and publishing process. The sales data is of simple structure, owning five columns as shown in the image below.
Uploaded to the Semantic Data Lake, the profiled dataset’s fact sheet reveals that there are no null, blank or zero values.
A data preview also shows that no data quality issues came up during the upload. The dataset’s quality is hence good to be processed hereafter.
Use Case & Graph Architecture
In order to draw business insights out of the sales data, I would like to transfer it to SAP Data Warehouse Cloud in order to model the data analytically and to visualize the data in the embedded Story Builder of SAP Data Warehouse Cloud to gain further insights into the sales transactions.
The pipeline I used for that purpose is straightforward: A workflow trigger starts the pipeline. The structured file consumer then reads the CSV data from the Semantic Data Lake. After that, the table producer creates a target table in SAP Data Warehouse Cloud. The target table should capture the sales data once they are transferred from SAP Data Intelligence to SAP Data Warehouse Cloud. Finally, the graph is stopped by a workflow terminator. For more visual clarity, the graph is presented in the picture below.
To enable you to grasp some more details about the architecture, the configuration of the structured file consumer and the table producer are exerted below.
With regards to the structured file operator, the service selected is “SDL” since I am retrieving the sales data from the Semantic Data Lake. I select my connection to SAP Data Warehouse Cloud as a “Connection ID”. In the source entry field, I insert the CSV file via the file explorer that pops up.
In order to be able to save the sales data in a table of SAP Data Warehouse Cloud, a connection to it must be established. Details on the connection configuration process can be retrieved here. The name of the obligatory Open SQL Schema is automatically set by SAP Data Warehouse Cloud. It follows the following naming convention:
As you can see, a hashtag is inserted by default. The hashtag concatenates the SAP Data Warehouse Cloud space name, which I connect to from SAP Data Intelligence, with my SAP user ID.
Once the connection is established, I configure the table producer operator. With the table producer operator, I create and configure the target table in which I would like to store the data in SAP Data Warehouse Cloud. In its target configuration, I therefore select “HANA_DB” as a service, choose the connection I created, and open the popup to specify the target table. The corresponding configuration popup opens once the button, which I highlighted in yellow in the picture below, is pressed.
After clicking on the “Add Target” button, another popup appears, The system tells me that any target I enter in the corresponding popup will be created during runtime upon graph execution. In effect, I choose a column name for my to-be-created column. I name it “SALES_DATA” – very creative, huh? 😉.
After I confirm the name by clicking on the “Add” button, the configuration console automatically returns me to the completely filled out target configuration as indicated in the next picture. At first sight, it appears to me that the hashtag in my schema name is automatically replaced by the configuration console with its respective encoding sequence “%23”, shown in yellow.
Encountering the Problem
Because I configured the whole pipeline at that point, it is time to run and test it. …the pipeline is “pending for execution”, “running” … and then, … oh no! … ran into an error. 😱
The stack trace prints out the following text:
Group messages: Group: default; Messages: Graph failure: tableproducer1 failed with the following error: DBS-070407: |Loader LoadMapper_924fe8e7-cab0-42ec-86dd-ea5a4b9cfd7b_tableproducer1_SALES_DATA SQL submitted to database <> resulted in error <General error;362 invalid schema name: AI_TWO%2523D064899: line 1 col 21 (at pos 20)>. The SQL submitted is <CREATE COLUMN TABLE “AI_TWO%2523D064899″.”SALES_DATA” ( “TRANSCATION_ID” INTEGER , “VENDOR_NAME” NVARCHAR (25) , “NO_SOLD_ITEMS” INTEGER , “PRICE” DECIMAL(4, 2) ) >. Process(es) terminated with error(s). restartOnFailure==false
To make my graph run through successfully, I surely must analyze and solve this error! 😏
Analyzing the Problem
The first valuable information I draw out of this stack trace is that it is the table producer operator that provokes this error. So, some configuration must be incorrect.
Second, the stack trace tells me that my schema name would be invalid. This makes me wonder a bit since I selected my schema with the configuration popup displayed in the previous section. The schema name was automatically inserted into the target field by the configuration popup. 🤔
But the stack trace does not let me alone at this point – it prints out the schema name it was looking for. And for sure, that schema name does not exist. 😉
<dwc_space_name>%2523<sap_user_id> (Incorrect Schema Name; taken from Stack Trace)
Based on the original schema name, “<dwc_space_name>#<sap_user_id>”, the configuration console changed the schema name to “<dwc_space_name>%23<sap_user_id>” and at the end, the schema is called with the format “<dwc_space_name>%2523<sap_user_id>”. Because “%25” is the encoding sequence for the percentage sign “%”, it seems to me that the configuration console recognizes the percentage sign of the encoding sequence “%23” (see picture “Table Producer (Configured)”) as a foreign character that must be encoded itself. This is wrong, of course. As a result, the pipeline therefore crashes since the resulting schema name does not exist in SAP Data Warehouse Cloud.
This matter is the crucial point that admires to be solved to make the graph work successfully.
Solving the Problem
The first option that came into my mind was to change the name of the Open SQL Schema in SAP Data Warehouse Cloud and remove the hashtag that is included in my schema name. However, the name of the Open SQL Schema is maintained automatically. The hashtag serving as a concatenation character between the space name and the SAP user ID suffix is immutable. This approach does hence not work, so a solution in SAP Data Intelligence is required.
It is fact that the table producer operator inserts a schema name automatically into the target field. It also seems to be fact that this field is not editable manually – at least in the configuration console. 😁 When I switch from the configuration console to the JSON editor however, I gain write access to the schema name. The picture below illustrates the button that allows me to switch from the configuration console to the JSON editor.
In the JSON editor, it becomes visible that the “double encoding” with “%2523” in the schema name is already captured in the JSON object of the table producer operator upon schema selection.
I thus replace the “%2523” in the qualified name and in the owner attribute in the JSON object “remoteObjectReference” with a hashtag to correspond to the actual schema name.
Another test run reveals that the graph completes normally after the above changes are made.
In addition, I check if the table is created in SAP Data Warehouse Cloud and appropriately filled with all data. To do so, I check if the table exists in the Data Builder. And yes – it is there!
It also holds all data records of the original CSV file stored in the Semantic Data Lake.
Great! 😃 So, now I can continue working on my initial use case in SAP Data Warehouse Cloud, model the data analytically and finally visualize it in the embedded Story Builder.
Lessons Learned & Conclusion
In this blog post, I share an issue with automatically inserted encoding sequences for hashtags in SAP Data Intelligence’s table producer operator. The issue occurs when a schema of a remote target (in my case SAP Data Warehouse Cloud) incorporates a hashtag in its name when it is selected as a target table via the configuration console. The issue can be easily resolved by replacing the encoding sequences back to a hashtag in the JSON editor of the table producer operator.
In a more general form, keep in mind that
- the configuration editor could insert encoding complexity and encoding mistakes which might make a graph fail during execution, and that
- the JSON object of an operator can be changed to modify the operator configuration. The JSON editor is a simple but powerful tool.
With these words, thanks for joining today. Feel free to like and share this blog post, and to follow along to receive recent updates on upcoming blog posts! 🤗
Best wishes & Stay tuned,