Introduction

HalilGünal · ‎02-26-2020

Introduction

In this post, I will show you with an example why you should avoid the use of cursors, if there are other ways to fulfill your requirements.

What is a cursor

Cursors are used to fetch single rows from the result set returned by a query and allow the row-by-row iteration through the result set, whereas set based processing can be much faster. Cursors can also cause transactional problems because of the run time.

Set based processing is based on the mathematical concept of set. The operators of the set work on the entire set at a time, while in row-by-row processing every single row is being processed at a time. Figure-1 illustrates the concept of row-by-row processing and set-based processing.

Figure-1: row-based vs set-based

Example

As following you will find an example about row-based and set-based processing.
The question is how to increase the salary of all employees from the department 1.

Data preparation:

Create the employees table with the columns employee_id, department_id, salary and update_date and fill the table with random generated data.

-- drop the table employees

drop table employees;



-- create table with random values

create table employees as (

	select

		 to_int(element_number) as employee_id,

		 to_int(RAND()*3+1) as department_id ,

		 to_int(RAND()*1000 +100) as salary,

		 now() as update_date 

	from SERIES_GENERATE_INTEGER(1, 0, 100)

);

Row-by-row (Cursor):

do

begin

	declare cursor c_emp for 

	select employee_id, department_id, salary from employees;



	for a_row as c_emp do

		if a_row.department_id = 1 then

			update employees set salary = salary *1.1, update_date = now() where employee_id=a_row.employee_id ;

		end if;

	end for;

end;

After the cursor has been defined, we are iterating through the result set, checking if the department_id is 1 and updating the salary in the corresponding row. Yes, we could also use just an update statement to do the same.

Set-based:

update employees set salary = salary *1.1, update_date = now() where department_id = 1 ;

The update statement above is doing the same as the cursor but this time as a set.

In the example above, it is obvious how to avoid the usage of cursor, but let me show you below a real-life example.

Expression vs Cursor

Recently I was analyzing a PlanViz to figure out the root cause of a performance issue and I recognized that a loop was taking more than 50% of the run time. In the loop, a partition key has been calculated, row by row.

With the consent of my colleagues, I will show you how the key is calculated. In the following section you will find the code and corresponding run time of the calculation from the partition key with a cursor and with an expression.

Data preparation

First of all, we need some test data. We are using the function SERIES_GENERATE_INTEGER to generate values in a given range and using SYSUUID to generate unique values.

-- drop table

drop table partition_key_test;



-- create table with random values

create table partition_key_test as (

select to_varchar(SYSUUID) as ext_contract from SERIES_GENERATE_INTEGER(1,0,100)

);

Cursor

The code below is the original code with the cursor to calculate the partition key for the entries from the partition_key_test table.

do begin

declare cursor c1 for select ext_contract, prodtype,

		                                 case prodtype

		                                 when '01' then

		                                 -- financial instrument:

		                                 -- use range between 2^30 and ( 2^30 + 2^26 - 1 )

		                                    left(bitunset(bitset(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 )

		                                 else

		                                 -- only numbers between 0 and 2^30 - 1: set first two bits to 0

		                                    left(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,2), 16 )

		                                 end as h1

					from (select ext_contract, '01' prodtype from partition_key_test);



    -- arrays for result table

    declare contr nvarchar(40) array;

    declare prodt nvarchar(2) array;

    declare hash int array;



    -- scalar variables

    declare idx int := 0;

    DECLARE pos int;

    DECLARE asc int;

    DECLARE h int;

    declare hex nvarchar(8);



    for myrow as c1 do

      idx := idx + 1;

      contr[:idx] := myrow.ext_contract;

      prodt[:idx] := myrow.prodtype;

      -- convert hex to integer

      h := 0;

      for pos in 1..8 do

        asc := ascii(SUBSTR(myrow.h1, :pos, 1));

        IF :asc >= 65 THEN

          h := :h * 16 + :asc - 55;

        ELSE

          h := :h * 16 + :asc - 48;

        END IF;

      END for;

      if h = 0 then

        -- we do not want 0 as hash value

        -- so convert the next 4 bytes to integer

        -- this may happen in 1 of 1 billion cases

        hex := bintohex(bitunset(hextobin(SUBSTR(myrow.h1, 9, 8)),1,2));

        for pos in 1..8 do

          asc := ascii(SUBSTR(:hex, :pos, 1));

          IF :asc >= 65 THEN

            h := :h * 16 + :asc - 55;

          ELSE

            h := :h * 16 + :asc - 48;

          END IF;

        END for;



        -- hash value is still 0, so set it to 1

        -- probability that his happens is 1 / ( 1 billion ) ^ 2

        if h = 0 then h := 1; end if;

      end if;

      hash[:idx] := :h;

    end for;

E_TAB_CONTRACT_PART_KEY = unnest( :CONTR, :PRODT, :HASH) as ( "EXT_CONTRACT", "PRODTYPE", "PART_KEY");

select * from :E_TAB_CONTRACT_PART_KEY;

end

Expression

The same functionality written as a query with an expression to calculate the partition key.

select 

	resultset.ext_contract, 

	resultset.prodtype,

	case when resultset.h = 0 then 1 else resultset.h end PART_KEY 

from (	

	select 

		(((((((((sub_value.ih1_1)) *16 + sub_value.ih1_2)*16 + sub_value.ih1_3)*16 + sub_value.ih1_4)*16 + sub_value.ih1_5)*16 + sub_value.ih1_6)*16 + sub_value.ih1_7)*16 + sub_value.ih1_8) as h, 

		sub_value.h1,

		sub_value.ext_contract, 

		sub_value.prodtype

	from (

		select 

			case when sub_char.h1_1 >= 65 then sub_char.h1_1-55 else sub_char.h1_1-48 end ih1_1,

			case when sub_char.h1_2 >= 65 then sub_char.h1_2-55 else sub_char.h1_2-48 end ih1_2,

			case when sub_char.h1_3 >= 65 then sub_char.h1_3-55 else sub_char.h1_3-48 end ih1_3,

			case when sub_char.h1_4 >= 65 then sub_char.h1_4-55 else sub_char.h1_4-48 end ih1_4,

			case when sub_char.h1_5 >= 65 then sub_char.h1_5-55 else sub_char.h1_5-48 end ih1_5,

			case when sub_char.h1_6 >= 65 then sub_char.h1_6-55 else sub_char.h1_6-48 end ih1_6,

			case when sub_char.h1_7 >= 65 then sub_char.h1_7-55 else sub_char.h1_7-48 end ih1_7,

			case when sub_char.h1_8 >= 65 then sub_char.h1_8-55 else sub_char.h1_8-48 end ih1_8,

			sub_char.h1, 

			sub_char.ext_contract, 

			sub_char.prodtype

		from (	

			select 

				ascii(substr(asci.h1,1,1)) h1_1,

				ascii(substr(asci.h1,2,1)) h1_2,

				ascii(substr(asci.h1,3,1)) h1_3,

				ascii(substr(asci.h1,4,1)) h1_4,

				ascii(substr(asci.h1,5,1)) h1_5,

				ascii(substr(asci.h1,6,1)) h1_6,

				ascii(substr(asci.h1,7,1)) h1_7,

				ascii(substr(asci.h1,8,1)) h1_8,

				h1,ext_contract, prodtype

			from (

				select

					case when hash_dataset.h1 = '00000000' 

						then 

							bintohex(bitunset(hextobin(SUBSTR(hash_dataset.h1, 9, 8)),1,2)) 

						else 

							hash_dataset.h1 

					end h1,

					hash_dataset.ext_contract, 

					hash_dataset.prodtype 

				from (

					select 

						dataset.ext_contract, 

						dataset.prodtype,

						case when dataset.prodtype = '01' 

							then

								left(bitunset(bitset(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 )

							else

								left(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,2), 16 )

						end as h1

					from (select ext_contract, '01' prodtype from partition_key_test) dataset

				) hash_dataset

			) asci

		) sub_char

	) sub_value

) resultset;

Comparison cursor vs expression

The table below represents the run times for cursor and expression, when given different data sizes.

rows	expression	cursor
1	1 ms	20 ms
100	5 ms	30 ms
10000	10 ms	770 ms
100000	20 ms	7600 ms
1000000	140 ms	75 seconds

Conclusion

HANA - databases in general - can process set-based data faster then row-by-row. Keep in mind to check whether the cursor can be replaced with a simple select, update, delete statement or nested cursors with a join or subselect operation. There is almost always another way to rewrite the cursor in a more efficient way. Exceptions prove the rule.

Resources

Avoid Using Cursors

Transactional Problems

Analyzing SQL Execution with the Plan Visualizer

SERIES_GENERATE Function (Series Data)

SYSUUID Function (Miscellaneous)

Performance: Why you should avoid using cursors!

Introduction

What is a cursor

Example

Expression vs Cursor

Data preparation

Cursor

Expression

Comparison cursor vs expression

Conclusion

Resources

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win