Why SDI ABAP for virtual access in SAP Data Warehouse Cloud should be avoided
In the past two years, I noticed one of the common mistakes when using SAP Data Warehouse Cloud is trying to use SDI ABAP connection for federation. Thus I am thinking of some simple examples to make everyone understand why it will not work.
First, it is strongly recommend everyone read SAP Data Warehouse Cloud : First Guidance: Data Integration for ABAP Source Systems. before starting the DWC project. However, the information there is so condense that it might take a while till you realize what it does mean exactly. One of the main points is Smart Data Integration (SDI) ABAP for virtual access in DWC should be avoided. as we know limitation of SDI ABAP: it supports SELECT, WHERE, TOP, or LIMIT but it does not support to push down joins and group by.
In this blog, three examples are demostrated.
|Example 1||Calculate how many times for each customer has booked the flight||Select from single table SBOOK, group by is not pushed down to the source. In a real case, it means one table of millions of records transferred to DWC and the result (No. of customers) are only several ten thousand.|
|Example 2||Calculate total of each airline company||Select from two tables SBOOK, SFLIGHT. Join is not pushed down to the source .In a real case, it means two huge tables of millions of records transferred to DWC to calculate around 10 to 20 records.|
|Example 3||Calculate total only one airline company||Select from two tables SBOOK, SFLIGHT. Filter is pushed down to the source to both tables, In a real case, this could help you reduce the records transferred back to DWC.|
|Example 1, 2 and 3||From all three examples, you will see in ABAP side, all the columns are also selected and read even they are not used. For some big tale like ACDOCA (more than 300 columns), this will also be a performance killer to the source system (S/4 or BW).|
The tables and views used in this blog could be referred as below picture. I use the Sflight mode, basically you could find this demo in each ABAP system those tables. And I am only using two tables Sbook and Sflight.
If you do not know how to configure the SDI ABAP connection. Here is SAP help document how to install, configure and connect Data Provision Agent to DWC.
Here are the versions: DWC 2022.10.61, DP Agent 126.96.36.199. S/4 HANA: S/4H 2020.
In ABAP system transaction SE80, you could search package SAPBC_DATAMODEL. Then I check table SBOOK, it has 24 columns and 5 of them are keys.
Here is Sflight and it has 14 columns, 4 of them are keys.
In order to make the examples are as simple as possible, we just use less than 10 records to illustrate the issue. But of course, if we are talking only about hundreds of records, replication or federation will not make a big difference regards performance. However, in a real business case, when it comes to hudred thousand to millions of records in the S/4 or BW systems. It will make a huge difference.
8 records in SFLIGHT
9 Records in SBooks
|Database Explorer on HANA Cloud||Generate PlanViz, run SQL|
|ST05 Trace in ABAP on S/4 HANA||Track SQL trace triggered by Data Warehouse Cloud in the on premise system|
ABAP Development Tool (Or similar Eclipse Tool) on your PC
|Check the generated PlanViz Plan|
Example 1 Calculate how many times each customer has booked the flight
First, let us have a look at the architecture. Sbook in S/4 HANA has 24 columns and 9 rows. Example 1 is only on this one table. It tries to get only 2 colmns and final result will be aggreaged to 3 rows.
Below are the definition of booking_customerID. It is simply a projection of SBOOK_Demo, which is the technique name I deployed for SBOOK as a remote table in DWC.
SELECT "MANDT", "CARRID", "CONNID", "FLDATE", "BOOKID", "CUSTOMID" FROM "SBOOK_Demo"
Then I tried to aggregate the data to count how many bookings each customer has.
select "CUSTOMID" , count ("BOOKID") as "No of bookings" from "booking_customerID" group by "CUSTOMID"
In an idea case, as this is from only one system and one table, you expect probably everything could be pushed down to the source for calculation then only 3 records are returned. So let us have an insight look via different tools.
Generate PlanViz in HANA Cloud for example 1
If you do not know what is Planviz, please check this blog The HANA PlanVisualizer (PlanViz) – Quick and Easy
To do this, you need to enough authorization to access the backend HANA Cloud system. You can check the database users and groups in the help document.
Here you can run similar SQL in DB explorer as the SQL generated by data preview. Pay attention that you need to add the data schema in DB Explorer (different than the SQL view you create in a space), which is a DWC space technique name.
In above picture, you will see the result is the same as data preview. In case the query comes from SAC, you can trace the Ina query and use HANA built-in stored procedure SYS.EXECUTE_MDS to generate the PlanViz in HANA Cloud. I will not cover how to trace SAC in this blog. But you can follow steps in SAP note 2525191 – SAP Analytics Cloud Best Practice
Save Planviz locally, I tried to give a meaningful prefix of the name.
Now I am trying to open it locally on my PC with ABAP Development Tool in Eclipse. You can use your some Eclipse Tool like HANA Studio or Visual Studio Code (GitHub Repository)
here is how the PlanViz looks like
ST05 SQL Trace on S/4 HANA
Now let us have a look at what kind of SQL has been set to the ABAP system. Now you need to log into the S/4 system at the ABAP side. In case you do not know what is ST05 traces in ABAP, here is a classic blog (The SQL Trace (ST05): Quick and Easy) .
Run ST05 Performance Trace, Turn on the trace with filter. I am tracing the technique user of DWC and all servers. Important: ABAP SDI connect only on one applicaiton server. And it could be a different server than your current user log on. By default, you only trace current application server. If it is a different one than the DWC use, you can not get anything.
If you do not know which application server it use at DWC side, you could check below (In case you do not have authorization in DWC to check below, you need contact your admin) . Then in ABAP system, switch to this application server which you use to connect for DWC in below screen for traces.
So the steps will be
- Turn on ST05 trace in ABAP systems
- Run data preview in DWC or similar SQL in DB explore.
- Make sure you see the result and stop the ST05 trace in ABAP system
- Check the ST05 Trace
You will now see a lot of entries in the traces but you can restrict to only see the SQL which access our table SBOOK or SFLIGHT.
Double click the SQL and you will see columns wise it is also not restricted. All 24 columns are read even the view only tries to read two columns.
Here I can conclude the 1st example, group by and columns are not push down. Imagine if you use this for a very wide and big table. Even the final result are only small, the records returned could be big.
Example 2 Calculate total of each airline company
Here are the data models of example 2. We will see all the records and columns from two tables are transferred from S/4 HANA to DWC. The aggregations and joins cannot be pushed down, this will only happen in DWC after the records are transferred.
Definition of view flight_booking_basic, it is inner join and projection of two tables Sflight and Sbook.
SELECT "SBOOK_Demo"."MANDT", "SBOOK_Demo"."CARRID", "SBOOK_Demo"."CONNID", "SBOOK_Demo"."FLDATE", "SBOOK_Demo"."BOOKID", "SBOOK_Demo"."CUSTOMID", "SBOOK_Demo"."CUSTTYPE", "SBOOK_Demo"."CLASS", "SBOOK_Demo"."ORDER_DATE", "SBOOK_Demo"."COUNTER", "SBOOK_Demo"."AGENCYNUM", "SFLIGHT_Demo"."PRICE", "SFLIGHT_Demo"."CURRENCY", "SFLIGHT_Demo"."PLANETYPE", "SFLIGHT_Demo"."PAYMENTSUM" FROM ("SBOOK_Demo" INNER JOIN "SFLIGHT_Demo" ON "SBOOK_Demo"."MANDT" = "SFLIGHT_Demo"."MANDT" AND "SBOOK_Demo"."CARRID" = "SFLIGHT_Demo"."CARRID" AND "SBOOK_Demo"."CONNID" = "SFLIGHT_Demo"."CONNID" AND "SBOOK_Demo"."FLDATE" = "SFLIGHT_Demo"."FLDATE")
Here is definition of view Total_Company, it calculate the total of all the airline companies.
In a real business case, calculation of earnings fits better to calculate LOCCURAM (price of booking in local currency airline). But just to demonstrate the technique aspect of joins and filters, I use PRICE in Sflight table to make it simple. (Date and currency conversions are not considered)
select "CARRID", sum("PRICE") as "Total" from "flight_booking_basic" GROUP BY "CARRID"
I will follow the same procudure for the traces as example 1
- Turn on ST05 trace in ABAP systems
- Run data preview of view “Total_company” in DWC or similar SQL in DB explorer.
- Make sure you see the result and stop the ST05 trace in ABAP system
- Check the ST05 Trace
Here is the result you see in data preview of DWC. 2 records with 2 columns.
In ABAP side from ST05 traces, you can see records (column: Sätze) selected and how many seconds (column: Dauer) it takes to read for each table.
Double click the SQL column, you will see unfortunately besides all records are scanned, all columns are also read.
Example 3 Calculate total only one airline company
In this example, I just run SQL command in DB Explore to add a filter (where condition) to view Total_Company to demonstrate you that the filter could be pushed down to the source systems and help reduce the records.
Here is how this SQL command and result looks like. I add CARRID as a filter in below SQL. It return in the end only one result.
From the planViz, we can see the filter has been applied to the remote table scan. Instead of returning 8 and 9 records, here it returns 4 and 5 records after the filter happens in remote source.
In S/4 HANA, ABAP ST05 trace, we can again see the filter has been passed here.
Thus adding filters could help you reduce the records as much as possible. But it will also depends on the SQL cases, some complicated SQL filters can also not pushed down. But with above methods, you already learn how to verify in a real business scenario how it works for you.
Another often coming questions are : shall I use the SDI HANA Adapter to connect to HANA DB directly because it has better push down? You can see from the help that select individual columns, join , agrgregations data via group by are supported.
First, You can again do some similar test. The differences are in HANA connection, you need to turn on HANA SQL traces instead of ABAP traces. For HANA connections, you can even have one more possiblity in addition to SDI for federation as below. Cloud Connector can be used both for federation (using Smart Data Access) or replication with data flows in DWC.
Secondly, even the SDI HANA adapter or SDA Cloud Connector has better push down, you can not also simply say that the HANA connection is better for a complete use cases. You need to evaluate your own cases based on different factors like security control, how to reuse authorizations, input parameters in S/4 HANA CDS views, limitations while generation HANA views from BW systems and many others.
Connect HANA on Premise to DWC
SDI ABAP is for replication, not good for federation. If you are still not sure what to use, an assessement workshop/service together with SAP and customers are recommended.
Call for actions:
You can test your own data model with above methods. And test different senarios including
SDI ABAP using Data Provision Agent.
SDI HANA using Data Provision Agent.
SDA for HANA using Cloud Connector.
Your feedback is welcomed!
Hello Zili Zhou
Thank you for a great and in detail explanation.
I noticed you are using a standard ABAP connection with DP Agent.
Could you please tell me what happens when an SAP BW/4HANA Model Transfer connection is used for federation? Does it work the same way, also not pushing down the aggregation?
Up to this point, I understood that DWC will be connected with remote tables of HANA using JDBC (link).
After all, we are providing HANA on-premise user credentials for the connection. If Data Provisioning Agent communicates with ERP, and via JDBC with SAP HANA, I hoped it would support the aggregation.
There is not much in the documentation, and I cannot test it myself. For example, when the remote tables from BW are federated, is the data also accessed through an ABAP adapter of DP Agent or through direct JDBC.
Many thanks in advance,
the BW/4HANA model transfer connection uses a HANA connection to transfer the data and therefore is very well suited for federated access. Apart from the fact that artefacts are automatically generated, the data access via HANA is a major advantage of the model transfer connection.
Thank you, Jens!
Much appreciated and long time no hear 🙂
this article really helps me understand how DWC connects to SAP on-premise systems like SDI, and its technical difficulties, thanks.
many thanks for this informative blog entry. If you as a customer face performance issues at any time while leveraging the ABAP SDI adapter, i encourage you to leverage the RFC streaming feature DWC provides. You find the information about this feature in the documentation.
This comes especially into account when you transfer large amounts of data into DWC.
Please be aware that the SAP Gateway is a mandatory prerequisite for leveraging this feature.