ELIMINATING CURSORS IN HANA

INTRODUCTION:

Use of cursors in HANA degrades the performance a lot as it performs operation on single record at a time due to which the read and write on table happens more number of times. An alternate to this is to fetch all the records at one stretch and store it in a temporary table type and perform the calculation together. I would like to share a few complex use cases where we eliminated the use of cursors.

USE CASE 1:Reads data by fetching the most relevant record and does update of the most relevant record based on the condition

PROBLEM STATEMENT: We need to select candidates for air Force. There are 2 levels of exams where a single candidate has 2 attempts to clear each level.

Conditions are:

     1.  There are 3 measure based on which a candidate is tagged ideal for that level

          Level 1: Marks  >= 70, Age >= 25, weight >= 70

          Level 2: Mark >= 90, Age >= 30, weight >= 75 

          The order of importance of the measures is (in descending order):

      1. Marks
      2. Age
      3. Weight

     2.   If the candidate has passed level 2 then his job title should be updated as ‘Pilot’ and should not be considered in level 1 selection. And if he has not           passed then check if he has passed in level 1, if so then update his job title as ‘Co-Pilot’.

PERFORMANCE IMPROVEMENTS:

Before :

/wp-content/uploads/2014/11/performance_3_591998.png

After:

/wp-content/uploads/2014/11/performance_1_592004.png

Table :1 MEASURE  [row-count :1,00,000 approx.]

ROLL_NO

NAME

WEIGHT

AGE

MARKS

CATEGORY

LEVEL

2001

Vignesh

75

30

60

B

2

2002

Prachi

75

30

90

B

2

1001

Vignesh

70

25

70

A

1

1001

Srivatsan

70

25

80

A

1

Table 2 : IDEAL

CATEGORY

LEVEL

WEIGHT

AGE

MARKS

B

2

75

30

90

A

1

70

25

70

Table 3 : ELIGIBILITY

NAME

ELIGIBLE

Prachi

Y

Vignesh

Y

Srivatsan

Y

Table 4: JOB_TABLE[row-count :1,00,000 approx.]

NAME

JOB_TITLE

Prachi

Pilot

Vignesh

Co-Pilot

Srivatsan

Co-Pilot

Bottle necks:

  • If the candidate has already been tagged as ‘Pilot’ in category B then he shouldn’t be considered for evaluation in Level 1 which can be done using a NOT IN check but it again degrades the performance.
  • The data should come in the sequence with the highest relevant score of the candidate getting updated for that particular level.    

——CURSOR ELIMINATED     

— TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 1 AND LEVEL 2

blog_code5.PNG

— TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 2

  blog_code6.PNG


— WE DO A OUTER JOIN TO AVOID SELECTION OF CANDIDATE WHO CLEARED LEVEL 2

blog_code7.PNG


— SELECT ONLY THE RECORDS WHERE ‘DEL’ FLAG IS NULL SO THAT WE SELECT THE CANDIDATES WHO HAS NOT CLEARED LEVEL 2 BUT ELIGIBLE FOR EVALUATION IN LEVEL 1


  blog_code8.PNG


CONCLUSION:

  • Assign cursor s to table variables and then process the logic of the cursor at once.
  • Cartesian product as a result of inner Join can be used to get all the records that are processed through nested loop.
  • We can eliminate NOT IN conditions by setting a ‘DEL’ flag in table A and then do a left outer join with table B and then select the only records having flag as NULL which ensures the records are not selected again in table B this method improves the performance drastically when you are performing a NOT IN condition on a large data set.
To report this post you need to login first.

13 Comments

You must be Logged on to comment or reply to a post.

  1. Lars Breddemann

    Good thing to share your work here, well appreciated!

    A few remarks though:

    • Why would you use a JOIN to the IDEAL table? It contains selection – that is filter – criteria. Fetch those into variables first and you can omit the join completely.
    • Instead of NOT IN you may find that NOT EXISTS provides better performance for subsettinng the two sets.
    • You may omit the split up of the total set by filtering based on the max (LEVEL) (e.g. HAVING max(LEVEL) = 1 …)
    • All the IFNULL statements force slow type conversion and I don’t see why they are actually required.
    • Parts of the rules are packed into your CASE statements as magic numbers. The logic there is hard to obtain and not very maintainable.
    • What happened to the 90.000 records that your new implementation does not update, while the original did update them?

    – Lars

    (0) 
    1. Vignesh Jeyabalan Post author

      Hi Lars

      We can’t Compare Null Values Right and there was a business Requirement where we had to make the values satisfy the condition,that’s why IFNULL was used to assign a value on both sides.

      Thanks Lars for the Valuable feedback will try to implement most of these points in my Next Developments.

      (0) 
      1. Lars Breddemann

        Sorry, but that looks like a cheap killer argument.

        What kind of business requirement requires that the IDEAL table allows NULL values?

        Exactly: none. Business doesn’t care whether non-specified constraints are implemented as NULLs or not.

        NULL is not a business concept – it’s a very specific computation concept.

        Interestingly every time I find bad design decisions, one of the first responses is that it was a business requirement. If that was true, it would be even worse as it would show that the requirements were not properly captured and agreed.

        (0) 
        1. Vignesh Jeyabalan Post author

          Hi Lars

          In the Example that i have Replicated the column can’t be Null but in the ideal Business Use case, it can be hence it Will throw an error/Return No values when i compare null values,and also i can’t filter out the records as not null.

          We had to compare the columns and Process it in a Particular Hierarchy and select the most Desirable record and then update it Based on the given condition. And also the same Record can Satisfy in Both the Levels but if it is Updated in Level 1 then we Don’t want to Consider it for Processing in Level 2.

          For Example :

          I categorize these Fruits Based on the Class and Color and Then update a column of the Table Based on that. If i don’t use an IFNULL check i won’t even get a record

          ITEM LOCATION ID FIRST CLASS SECOND CLASS COLOR
          APPLE INDIA Y RED
          APPLE US Y RED
          ORANGE LONDON Y RED
          APPLE US Y GREEN
          ORANGE LONDON Y GREEN

          And the same item  from India can have different Varieties and we have to select the most desirable Record and so basically the ideal table is the master data for the type of Varieties Available for the Particular Item in India ,US and London.

          That’s why we join the Ideal Table and measure Table to find out in which Category the item Location id combination the Fruit Satisfies and Update the Record in the Measure Table Based on that.

          guess in this Case we Require a Check Right .

          Please Help in Providing a Better solution on ways to avoid the Use of IFNULL in this Example.

          Regards

          Vignesh J

          (0) 
          1. Lars Breddemann

            I’m not going to discuss yet another model here.

            In your original posting you are using the IDEAL.WEIGHT column to join in order to create C_MEASURE table variable.

            This was done without special NULL handling.

            So C_MEASURE cannot contain NULLs for WEIGHT.

            Later for the comparison you make a not so straight forward comparison:

            – in case M.WEIGHT is equal or larger  than 55 then the condition is  that IDEAL.WEIGHT should be equal or larger than M.WEIGHT.  This means M.WEIGHT >= 55 only matches if it is also >= IDEAL.WEIGHT

            – in case M.WEIGHT is smaller  than 55, then don’t consider M.WEIGHT at all and just compare IDEAL.WEIGHT with IDEAL.WEIGHT – this means M.WEIGHT < 55 always matches.

            You actually don’t need the conversion for NULL here. To avoid confusion you may simply insert 0 (zero) into IDEAL.WEIGHT if the condition shouldn’t play a role.

            Same goes for the other conditions.

            Besides, the IFNULL statements make strings out of your numeric columns. You want to be careful with applying range conditions to strings as here lexicographic ordering is used.

            If you ask me to recommend something here, I’d say cleaning up the selection logic would be a good move.

            Something convoluted like the above is really a pain to maintain…

            – Lars

            (0) 
            1. Vignesh Jeyabalan Post author

              Hi Lars

              I agree with your point of eliminating IFNULL in conditional check.Maybe i will try to find ways to avoid the use of IFNULL in my next developments.

              So can i say that using IFNULL in Where Condition Checks is not advised.

              Regards

              Vignesh J

              (0) 

Leave a Reply