Technical Articles
Code Pushdown
As we know when we move to a HANA database the idea is to go pull on for “code pushdown”.
The bottleneck for performance has traditionally been moving records between the database and the application server.
The more records that get moved, the longer it takes. Once the data has made it to the application server you do all sorts of calculations upon it using ABAP.
The “new” idea is to do all the calculations at the database level, and then only transfer a small amount of records (the result) to the application server. The HANA database is optimised for this, it does all sorts of tricks e.g. paralleled database reads. The end result – in theory – is that if you “push down” data intensive calculations to the database then your application will run one billion times faster.
At this point you are probably saying “oh yes, everyone says this code pushdown thing is so great, no-one ever gives a practical example”.
Well here I go with one. It is not a very good example, but it is realistic – which beats the pants off many other examples I have seen. If you have a better example – and I am sure you do then tell me what it is. However what I am about to describe is something I have dealt with every month for twenty years.
You probably know the VBKD “business data for sales orders” table which contains things like customer purchase order numbers and payment terms, and it works like this – if every single line item in a sales order is identical then there is only one entry for the sales order with an item number (POSNR) of 000000. If the person entering the sales order changes such a value on line item level then a new entry is added to the table with the POSNR of the item being changed. This design was clearly intended to save database space – what is the point of having ten identical entries for ten lines items plus another identical entry for the header?
All well and good, but this means that programs generally read in the entire contents of VBKD into an internal table. Then traditionally you read that internal table with the item number and if that does not work then you do another read on the header level. That used to require two READ TABLE statements in ABAP but in recent releases you can do this in one statement as shown below
DATA(ls_vbkd) = VALUE #( lt_vbkd[ vbeln = ls_vbap-vbeln
posnr = ls_vbap-posnr ]
DEFAULT VALUE #( lt_vbkd[ vbeln = ls_vbap-vbeln
posnr = '000000'] ) ).
That is great – however you would think if we can now do clever things like that on internal tables, surely we can do clever things like that whilst reading the database in the first place? That is, we just want one record transferred to the application server from the database – the item record (if it exists) or the header record – that is we wish to “push down” the logic shown above to the database layer.
As it turns out I could not find a way to do that in an ABAP SQL statement (which does not mean there is not a way to do it, you can tell me how if you want) nor in a CDS view.
In an ADMP however you can in theory create a class that takes in a sales order number and item (or more likely a table of such) and returns the correct VBKD record (or more likely a table of such).
That would be written in SQLSCRIPT syntax which might make your head spin e.g. the idea of FOR / ENDFOR instead of a LOOP / ENDLOOP sends some ABAP people running for the door. That shouldn’t be the case, all programming languages have different syntax, and it’s nothing to be scared of.
Regardless the concept is clear – inside the database the ADMP tries to get the record for the order / item combination and if that does not exist gets the record for the order/header combination.
OK then let us try to create an ADMP to do just this.
I go to my lovely ABAP in the Cloud system which has a HANA database attached. There is no VBKD table there, so I must create my own Z equivalent for this exercise.
What do we often look for in VBKD – the payment terms, which say something like “30 days after month of invoice”. Now when you are buying Monsters from the Baron the variable is not so much how long you have to pay, rather what will happen to you if you don’t pay.
So, I set up a domain which lists the various payment options.
Domain
The last option is by far the worst. People on those payment terms tend to pay in advance.
Next I create a data element where that domain will live. Now I will create the equivalent of VBKD. To keep things as simple as possible this is only going to have tree fields – the Monster sales order number and item and the payment terms. I have to create the VBELN and POSNR data element equivalents as well, as there is no such thing as VBAK and VBAP in ABAP in the Cloud. I will call my new table ZMONSTER_BDATA.
Table
DDIC definitions in ABAP in the Cloud are all source code based, which takes some getting used to, but after all it is not that different to defining internal tables.
Next I need a little program to fill that table with bogus data. In the old days I would write a type 1 executable program in SE38. However, that is so ten minutes ago. In “Steampunk” I need to create a class. If that class implements interface IF_OO_ADT_CLASSRUN then I can invoke the MAIN method directly, just like pressing F8 on the SE38 screen.
Generator
I don’t want anything fancy – in normal ABAP world if I wanted to get the pertinent records of item 10 of Order number 1 and Item 10 of order number 2 then I would have to transfer three records from the database to the application server and then filter out the header level record for order number 2.
Instead I just want two records brought back – the correct record for each order. With just two orders this makes no difference but imagine if you had ten thousand or more, which is often what I get in real life.
To fill up the table I select my class in the explorer view on the left-hand side of the Eclipse screen and take the menu option “Run As => ABAP Application (Console)”. Hooray! It tells me the three records have been saved to my (HANA) database!
Next step is to write a new class which will query the database for my two monster sales orders and bring back one and only one record from the database for each of them.
I need to flag the reading class with the marker interface IF_AMDP_MARKER_HDB so the implementation can be executed as a stored procedure inside the HANA database. In theory (according to the documentation) I should be able to have a RETURNING parameter but the first error came when I tried to get the whole structure back as a RETURNING parameter and the second error came when I tried to get ANYTHING back as a RETURNING parameter so I have fallen back on an EXPORTING parameter.
The definition therefore looks like this: –
Definition
And the draft implementation like this: –
Implementation V1
That is the easy bit, everything in traditional ABAP (more or less) up until now.
The next step is for me to jump right out of my comfort zone and start coding in SQLSCRIPT. That is very scary. So, I do a Google search and it is no help whatsoever. Even the SAP help is no use. So, it is guessing time. That is good, it is the best way to learn. It is like the Agile Manifesto. https://agilemanifesto.org/
“Experimenting Yourself” over “Being told how to do something”
While we believe there is value on the statement on the right we believe here is more value in the statement on the left.
Going back to messing around in SQLSCRIPT Eclipse gives me error messages guiding me how to write in this new language – first up you end lines with a “;” not a period. After a while of looking on SCN to try and work out the syntax and trying assorted things at random I come up with the following: –
Implementation V2
That appears to work just fine. You could do that in ABAP – two SQL statements one after the other, or read the whole table and then do two reads on the internal table to try and get the correct record.
Here all the “smarts” if you can call it that, are on the database level. The obvious question is – what is the difference? Presumably that makes things faster? Does it? Tell me if it does not.
This is the first SQLSCRIPT I have ever written. I am sure I have got it horribly wrong. The very first ABAP I wrote in 1999 was not very good either, the very first BASIC I wrote in 1981 was not up to much.
So, if someone wants to tell me what I SHOULD have done then I am all ears.
Cheersy Cheers
Paul
Hello Paul,
I would do it in ABAP CDS view. CDS is preferred over AMDP. The SQL statement CASE does the trick.
The nicest way is to create first a CDS view which determines the VBKD-POSNR for every VBAP record.
In the second CDS view the VBAP is linked to the record of VBKD based on the VBKD-POSNR of the first CDS view.
First CDS view: --Find VBKD-POSNR--
Second CDS view: -- Link VBAP and VBKD based on first CDS view --
Kind regards,
Alwin.
i just replicated the above two CDS views in my system, and then did a ST05 SQL trace,
As far as I can tell that means only two records are returned from the database to the application server, one for each of my queries, first query where there were two records in VBKD for one item, second query where there was just one. Normally I would have to get all three records.
What I find most fascinating about this is that often on the internet people say that CDS views are not FASTER than open SQL (now ABAP SQL) queries because they have to be transformed into SE11 views and then read.
In this case however the answer seems to be very straightforward - less records are transferred between the database and the application server and that meets a "golden rule".
Can this be true? Can a CDS view be faster than a direct SQL statement in ABAP?
Lars also mentions in the fantastic article below that there is a performance hit when calling an AMDP for a trivial problem like I am trying so solve. So therefore is the CDS view the best option out of all three?
Also kudos to Lars for telling me what a WINDOW is in database world.
I had never even heard of such a thing before.
In the UK, before I was born, George Formby sang:-
"Now I go cleaning windows,
To earn an honest bob,
For a nosey parker,i t's an interesting job,
Now it's a job that just suits me
A window cleaner you would be
If you can see what I can see
When I'm cleanin' windows"
I like this solution for ABAP CDS, and have used it often.
FYI, instead of the case statement in "First CDS view: –Find VBKD-POSNR–", this is a perfect situation for the COALESCE function.
can be replaced by
Coalesce returns the first value if it is NOT NULL, otherwise returns the second value.
https://help.sap.com/doc/abapdocu_751_index_htm/7.51/en-us/abencds_f1_coalesce_expression.htm
That's shorter and good readability too. Nice improvement!
Thanks for this post, Paul. It's a good thing to see how the ABAPers of the world approach HANA, SQL, and DB programming.
The example you picked is a really good one. It's one of the idiosyncratic data access patterns found in ABAP but not so much elsewhere.
Stuff like that can really trip up analytics developers, data scientists, and otherwise SQL-oriented folks that get into projects and should "just grab the data and do something"...
As you've asked, I do have some comments about your solution and also some ideas about how to do it differently. Once I started thinking about it, the whole thing got lengthy so I put it into a separate blog post.
I hope you don't mind the shameless plug here: https://lbreddemann.org/find-the-one-or-the-other/
BTW: thumbs up for the COLES-down down logo. That's Australian familiarity in a picture 🙂 and fits well with the topic of code push down.
I’m not sure about this, and if I’ve got it wrong, then I’d love to be corrected as I'm partly mentioning this to clarify my understanding – but I think your method won’t have any parallelisation. I did quite a lot of AMDP a couple of years ago. As I recall you only take advantage of parallelisation when you have multiple declarative (is that the right term?) statements like:
The selects will be run in parallel and your dessert will be served at the same time as your main course.
Not quite.
By using table variables like in your example (both are not referencing each other and don't reference any shared thing either) you're saying: "main_course" and "dessert" don't depend on each other.
This means, those could be run in parallel and the result would be the same.
Whether or not they actually will be computed in parallel depends on things like available "job-executor threads" at the moment the AMDP gets called.
So it may be that sometimes the statements are executed at the same time and sometimes one after the other.
Code Pushdown works faster when Agreggation comes into play if not… IMO can be even slower. The problem that I found with Code Pushdown is that it goes againt unit testing or not?
This is a real life example.
A report was copied from other system non HANA. A simply
This report was taking 7 hours because there where Millons of records.
Using only OPEN SQL with SUM and CASE I managed to did both logic into a single statement.
The report now it took only 2 min
From 7 hours to 2 min only replacing a LOOP with an IF and an Acumulation.
I was amazed and everyone was very happy…. but now that I’m starting ( and trying ) to really get into Unit testing and TDD I was wandering if this is ok ? I mean if now CodePushdown tells me that I have to put everything that I can into an OPEN SQL query, CDS or AMDP I can’t split the logic into small pieces.
Is it my questioning right ? how do Unit Testing and CodePushdown coexists ?
One note:
The selection of totals directly in SQL should make an improvement, of course,
but the fact that the original code took 7 hours sounds like it was extremely inefficient (don't use sorted/hashed internal tables, etc.).
I don't think there is any ABAP code, no matter how complex it is, which should take 7 hours to be executed.
I cannot be certain of course but it was probably the database read which was taking the time, due to BSEG being a cluster table.
Especially if one of the fields in the SELECT statement was not a primary key of the underlying table, you have the horrible problem of the cluster field having to be "unpacked" as it were.
Unpacking several million records to send the relevant fields back to the application server must take ages....
I can only presume doing the same inside the database using an aggregation is a zillion times faster....
A valid point,
but I don't think there is any reason that selection from BSEG by BKPF key (partial key, but should be sufficient) would take 7 hours. Personally, I have never ran into such issue.
On the other hand, I did see programs which perform a nested loop for internal tables with million records which takes this amount of time.
The questioning is definitively right.
You have a functionality that you programmed and you want to include it into your unit tests.
Commonly unit tests explicitly exclude accesses to external services (like DB, filesystem, web-services, and the like) by not actually performing them. In order to still get a meaningful result of the function under test, the resource-access gets "mocked".
Many unit test systems support mocking of DB accesses. For HANA I'm aware of the mocking option in HANA XS(classic) and the SQLScript unit test framework.
The idea with DB unit tests is twofold:
1. Same input data and same query should give the same result. That's true for deterministic queries and can be handled straight forward by using the same test data for the same test cases.
2. For non-deterministic results (i.e. the current_date is used, or an otherwise changing element) then the result-check must accommodate for that (e.g. perform an equivalent computation in the test code).
Either way, an important aspect is that a unit test is a functional test - not a performance test.
If you have a query that processes millions of records and that takes 2 minutes, that's obviously impractical for a unit test just like the 7 hours wouldn't be practical for a unit test of the ABAP code.
Instead, you would run the query either over mocked data or create a test-data set that allows for very fast result computation and is still meaningful for the functional test.
So, no, code-pushdown does not mean the unit tests can't cover it.
Concerning the remark that you cannot split the logic into small pieces with CDS/AMDP/SQL:
That's not true at all.
It's very much possible to create composable units of logic in SQL (and SAP's tooling for that).
Subqueries can be made into functions, views, or common table expressions.
Views can be created in a layered approach where each layer takes care of a specific aspect of the query.
Table functions can take the results of other table functions as input data to create a "stackable" function.
There's plenty of opportunities to split logic into smaller pieces.
Obviously, though, this doesn't work if you program just a single SELECT statement. If you want to split the logic, you have to, well, split it...
Hello,
Regarding "......As it turns out I could not find a way to do that in an ABAP SQL statement (which does not mean there is not a way to do it, you can tell me how if you want)...."
Maybe this was already given in one of the comments, I guess below Select would bring what you need in 1 statement (i.e. for order '0000000001').
TYPES : BEGIN OF lty_table,
vbeln TYPE vbeln,
posnr TYPE posnr,
vbkd_posnr TYPE posnr,
zterm TYPE dzterm,
END OF lty_table.
DATA lt_table TYPE TABLE OF lty_table.
SELECT vbap~vbeln vbap~posnr vbkd~posnr AS vbkd_posnr vbkd~zterm
INTO CORRESPONDING FIELDS OF TABLE lt_table
FROM vbap
INNER JOIN vbkd ON vbkd~vbeln = vbap~vbeln
WHERE vbap~vbeln = '0000000001'
AND ( ( EXISTS ( SELECT * FROM vbkd WHERE vbkd~vbeln = vbap~vbeln AND posnr = vbap~posnr ) AND vbkd~posnr <> '000000' )
OR ( vbkd~posnr = '000000' AND NOT EXISTS ( SELECT * FROM vbkd WHERE vbkd~vbeln = vbap~vbeln AND posnr = vbap~posnr ) ) ).
I just did a test and that works OK in that it brings back a table with both records, but correctly maps the VBAP-POSNR to the correct VBKD-POSNR and thus the correct VBKD-ZTERM is in each line as well.
The idea though is to also specify the VBAP-POSNR in the SELECT and thus only transfer one record from the database to the application server.
So the final SQL query would be
SELECT vbap~vbeln vbap~posnr vbkd~posnr AS vbkd_posnr vbkd~zterm vbkd~prsdt
INTO CORRESPONDING FIELDS OF TABLE gt_table
FROM vbap
INNER JOIN vbkd ON vbkd~vbeln = vbap~vbeln
WHERE vbap~vbeln = p_vbeln
AND vbap~posnr = p_posnr
AND ( vbkd~posnr NE '000000' AND ( EXISTS ( SELECT * FROM vbkd WHERE vbkd~vbeln = vbap~vbeln AND posnr = vbap~posnr ) )
OR ( vbkd~posnr EQ '000000' AND NOT EXISTS ( SELECT * FROM vbkd WHERE vbkd~vbeln = vbap~vbeln AND posnr = vbap~posnr ) ) ).
In any event I am very glad you gave me this solution. I did not think it could be done!