Optimize ABAP CDS Performance – Double-Query Approach
Hello all !
Quite some time ago, I found a nice way to overcome some performance and logical issues when using ABAP CDS views.
I was using standard SADL framework to request the data for my OData services from ABAP CDS views, but I was constantly running into either performance or functional issues.
Regarding performance, I guess, it’s clear – all was slow, even GET_ENTITY.
Regarding functional ones, it’s a bit more complicated. But I will try to explain. OData v2 (especially in SAP’s realization) does not really support complex queries through dependent objects.
For instance you have a document header and then a document item with product, a partner, dates etc, and all with 1:N cardinality to the header. Now imagine you need to search trough those products + partners + dates + some header fields simultaneously. If you never programmed any OData, just believe me, it won’t be so easy, if possible at all…
And so, I came to the idea to split the queries into two:
- First, we read distinct CDS Entity Key, based on the provided search criteria;
- Second, we read the required CDS data, based on the selected keys.
As you may notice, this will allow us to use two different CDS views. One is Search View and another one is Data View. Data View represents CDS Entity and by definition has one line per Entity. Search View is needed to get keys, so that it may have whatever you want, any fields, from any entities (product/partner/dates) and must not be 1:1 to the entity.
Note: this approach may work, but also may not. Or even sometimes it will be working (fast), sometimes not. It depends…
Let’s come to one of my examples….
I have a CDS view, containing 33 very big tables.
After adding date information (marked red box), the performance got just terrible.
So, I’ve changed the key of the entity from Object ID (NUM10) to Object GUID (RAW16), which is then used everywhere as JOIN condition, but also a part of primary database indexes.
Then I had to split the query into two as described above.
How Was It Originally
So, imagine you have your SADL implementation and request one entity (GET_ENTITY).
The following query will be executed:
HANA Studio shows about 15 seconds of runtime. Not bad just for one line…
Result: One query is executed in 12 seconds, 45+ million rows read, 28 GB RAM allocated. Not bad!
P.S. Obviously our keys are not passed to the views, and they are read fully.
Ok, now let’s try to split the queries…
First Query (Getting CDS Key):
Result: Query is executed in 10 milliseconds, 1 row was read, 1,2 MB RAM allocated.
Second Query (Getting Data by CDS Key):
Result: Query is executed in ~100 milliseconds, single rows selected, 43 MB RAM allocated.
Just to sum it up…
|Way to get data||Response time||# Rows||RAM Consumption|
|Single query||~ 12 seconds||~ 50.000.000||~ 29 GB|
|Double query||~ 100 milliseconds||~ 30||~ 45 MB|
My OData Framework
Writing complicated GET_ENTITY or GET_ENTITYSET methods might become boring, so that I’ve developed a framework, which implement my idea. With that the implementation of the GET_ENTITYSET methods looks like below:
Looks quite elegant. Feel the same, give it a LIKE.
All the best!
P.S. Code sample is here.
What happens if you join your two CDS Views? 🙂
Why didn't you use associations instead of outer joins? In your model the data will always be retrieved for the outer joins, if you'd have used associations you could have avoided the retrieval of the data from the associations unless it was explicitly requested.
If you've done the same in the other custom CDS views which you're using in ZS_CV_MY_ACTIVITIES, meaning also using (left outer) joins instead of associations, switching over to associations undoubtedly will achieve performance gains for you.
Hi Mehmet, with LEFT OUTE JOIN, the data is never retrieved unless it's requested, so that in this regard LEFT OUTER JOIN is working exactly the same way as just ASSOCIATION.
"When a CDS view is activated with path expressions, every association specified here is transformed to a join expression." + "the category of the join is determined by where the path expression is used: After FROM, it is an inner join (INNER JOIN), In all other locations, it is a left outer join (LEFT OUTER JOIN)"
I was referring to the "JOIN-ON-DEMAND" aspect of associations meaning the association in a CDS view will be converted to a join at runtime when data originating from the association has been requested by the consumer of the CDS.
The outer joins you have created will always be executed, it doesn't matter whether the data retrieved by these joins have been requested or not.
Had you used an association, those outers joins would not have been executed by default, only in those cases where data is explicitly requested from the association a join will be peformed at runtime. This should lead to performance gains on database layer since you would eliminate "redundant" joins.
Oh, now I’ve got your point. So, if we substitute LOJ by ASSOCIATIONs and then build a dynamic SELECT and WHERE statements through associations and their fields, we can potentially simplify the query (less joins generated) and potentially improve performance. Yes, you are right.
But unfortunately, in my case I need to read lots of fields, usually almost all, from all tables. And such a dynamic generation of SELECT and WHERE statements seems overcomplicated to me. But I will consider your proposal for underlying CDS. Probably there are some use-cases, where it’s beneficial.
Coding update: added GET_EXPANDED_ENTITYSET with optimized performance.