SAP HANA Developer Center: Our experience with an Oracle Migration: Part 2(CURSOR-ELIMINATION)
ELIMINATING CURSORS IN HANA
INTRODUCTION:
Use of cursors in HANA degrades the performance a lot as it performs operation on single record at a time due to which the read and write on table happens more number of times. An alternate to this is to fetch all the records at one stretch and store it in a temporary table type and perform the calculation together. I would like to share a few complex use cases where we eliminated the use of cursors.
USE CASE 1:Reads data by fetching the most relevant record and does update of the most relevant record based on the condition
PROBLEM STATEMENT: We need to select candidates for air Force. There are 2 levels of exams where a single candidate has 2 attempts to clear each level.
Conditions are:
1. There are 3 measure based on which a candidate is tagged ideal for that level
Level 1: Marks >= 70, Age >= 25, weight >= 70
Level 2: Mark >= 90, Age >= 30, weight >= 75
The order of importance of the measures is (in descending order):
- Marks
- Age
- Weight
2. If the candidate has passed level 2 then his job title should be updated as ‘Pilot’ and should not be considered in level 1 selection. And if he has not passed then check if he has passed in level 1, if so then update his job title as ‘Co-Pilot’.
PERFORMANCE IMPROVEMENTS:
Before :
After:
Table :1 MEASURE [row-count :1,00,000 approx.]
ROLL_NO |
NAME |
WEIGHT |
AGE |
MARKS |
CATEGORY |
LEVEL |
2001 |
Vignesh |
75 |
30 |
60 |
B |
2 |
2002 |
Prachi |
75 |
30 |
90 |
B |
2 |
1001 |
Vignesh |
70 |
25 |
70 |
A |
1 |
1001 |
Srivatsan |
70 |
25 |
80 |
A |
1 |
Table 2 : IDEAL
CATEGORY |
LEVEL |
WEIGHT |
AGE |
MARKS |
B |
2 |
75 |
30 |
90 |
A |
1 |
70 |
25 |
70 |
Table 3 : ELIGIBILITY
NAME |
ELIGIBLE |
Prachi |
Y |
Vignesh |
Y |
Srivatsan |
Y |
Table 4: JOB_TABLE[row-count :1,00,000 approx.]
NAME |
JOB_TITLE |
Prachi |
Pilot |
Vignesh |
Co-Pilot |
Srivatsan |
Co-Pilot |
Bottle necks:
- If the candidate has already been tagged as ‘Pilot’ in category B then he shouldn’t be considered for evaluation in Level 1 which can be done using a NOT IN check but it again degrades the performance.
- The data should come in the sequence with the highest relevant score of the candidate getting updated for that particular level.
——CURSOR ELIMINATED
— TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 1 AND LEVEL 2
— TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 2
— WE DO A OUTER JOIN TO AVOID SELECTION OF CANDIDATE WHO CLEARED LEVEL 2
— SELECT ONLY THE RECORDS WHERE ‘DEL’ FLAG IS NULL SO THAT WE SELECT THE CANDIDATES WHO HAS NOT CLEARED LEVEL 2 BUT ELIGIBLE FOR EVALUATION IN LEVEL 1
CONCLUSION:
- Assign cursor s to table variables and then process the logic of the cursor at once.
- Cartesian product as a result of inner Join can be used to get all the records that are processed through nested loop.
- We can eliminate NOT IN conditions by setting a ‘DEL’ flag in table A and then do a left outer join with table B and then select the only records having flag as NULL which ensures the records are not selected again in table B this method improves the performance drastically when you are performing a NOT IN condition on a large data set.
Good one ...
Regards
S.Srivatsan
Good Job Vignesh! Very useful blog.
Regards,
Prachi Tiwari
Nice work... (Y)
Good job !
Thank you !
Regards,
Vignesh J.
Informative.Looking forward to more such blogs on performance improvement
Good thing to share your work here, well appreciated!
A few remarks though:
- Lars
Hi Lars
We can't Compare Null Values Right and there was a business Requirement where we had to make the values satisfy the condition,that's why IFNULL was used to assign a value on both sides.
Thanks Lars for the Valuable feedback will try to implement most of these points in my Next Developments.
Sorry, but that looks like a cheap killer argument.
What kind of business requirement requires that the IDEAL table allows NULL values?
Exactly: none. Business doesn't care whether non-specified constraints are implemented as NULLs or not.
NULL is not a business concept - it's a very specific computation concept.
Interestingly every time I find bad design decisions, one of the first responses is that it was a business requirement. If that was true, it would be even worse as it would show that the requirements were not properly captured and agreed.
Hi Lars
In the Example that i have Replicated the column can't be Null but in the ideal Business Use case, it can be hence it Will throw an error/Return No values when i compare null values,and also i can't filter out the records as not null.
We had to compare the columns and Process it in a Particular Hierarchy and select the most Desirable record and then update it Based on the given condition. And also the same Record can Satisfy in Both the Levels but if it is Updated in Level 1 then we Don't want to Consider it for Processing in Level 2.
For Example :
I categorize these Fruits Based on the Class and Color and Then update a column of the Table Based on that. If i don't use an IFNULL check i won't even get a record
And the same item from India can have different Varieties and we have to select the most desirable Record and so basically the ideal table is the master data for the type of Varieties Available for the Particular Item in India ,US and London.
That's why we join the Ideal Table and measure Table to find out in which Category the item Location id combination the Fruit Satisfies and Update the Record in the Measure Table Based on that.
guess in this Case we Require a Check Right .
Please Help in Providing a Better solution on ways to avoid the Use of IFNULL in this Example.
Regards
Vignesh J
I'm not going to discuss yet another model here.
In your original posting you are using the IDEAL.WEIGHT column to join in order to create C_MEASURE table variable.
This was done without special NULL handling.
So C_MEASURE cannot contain NULLs for WEIGHT.
Later for the comparison you make a not so straight forward comparison:
- in case M.WEIGHT is equal or larger than 55 then the condition is that IDEAL.WEIGHT should be equal or larger than M.WEIGHT. This means M.WEIGHT >= 55 only matches if it is also >= IDEAL.WEIGHT
- in case M.WEIGHT is smaller than 55, then don't consider M.WEIGHT at all and just compare IDEAL.WEIGHT with IDEAL.WEIGHT - this means M.WEIGHT < 55 always matches.
You actually don't need the conversion for NULL here. To avoid confusion you may simply insert 0 (zero) into IDEAL.WEIGHT if the condition shouldn't play a role.
Same goes for the other conditions.
Besides, the IFNULL statements make strings out of your numeric columns. You want to be careful with applying range conditions to strings as here lexicographic ordering is used.
If you ask me to recommend something here, I'd say cleaning up the selection logic would be a good move.
Something convoluted like the above is really a pain to maintain...
- Lars
Hi Lars
I agree with your point of eliminating IFNULL in conditional check.Maybe i will try to find ways to avoid the use of IFNULL in my next developments.
So can i say that using IFNULL in Where Condition Checks is not advised.
Regards
Vignesh J
Well you can say that, but I wouldn't.
What I am saying is: use the right tool at the right place.
For this example, IFNULL was not required.
- Lars