Skip to Content
Technical Articles

Performance: Why you should avoid using cursors!

Introduction

In this post, I will show you with an example why you should avoid the use of cursors, if there are other ways to fulfill your requirements.

What is a cursor

Cursors are used to fetch single rows from the result set returned by a query and allow the row-by-row iteration through the result set, whereas set based processing can be much faster. Cursors can also cause transactional problems because of the run time.

Set based processing is based on the mathematical concept of set. The operators of the set work on the entire set at a time, while in row-by-row processing every single row is being processed at a time. Figure-1 illustrates the concept of row-by-row processing and set-based processing.

Figure-1: row-based vs set-based

Example

As following you will find an example about row-based and set-based processing.
The question is how to increase the salary of all employees from the department 1.

Data preparation:

Create the employees table with the columns employee_id, department_id, salary and update_date and fill the table with random generated data.

-- drop the table employees
drop table employees;

-- create table with random values
create table employees as (
	select
		 to_int(element_number) as employee_id,
		 to_int(RAND()*3+1) as department_id ,
		 to_int(RAND()*1000 +100) as salary,
		 now() as update_date 
	from SERIES_GENERATE_INTEGER(1, 0, 100)
);

 

Row-by-row (Cursor):

do
begin
	declare cursor c_emp for 
	select employee_id, department_id, salary from employees;

	for a_row as c_emp do
		if a_row.department_id = 1 then
			update employees set salary = salary *1.1, update_date = now() where employee_id=a_row.employee_id ;
		end if;
	end for;
end;

After the cursor has been defined, we are iterating through the result set, checking if the department_id is 1 and updating the salary in the corresponding row. Yes, we could also use just an update statement to do the same.

Set-based:

update employees set salary = salary *1.1, update_date = now() where department_id = 1 ;

The update statement above is doing the same as the cursor but this time as a set.

In the example above, it is obvious how to avoid the usage of cursor, but let me show you below a real-life example.

Expression vs Cursor

Recently I was analyzing a PlanViz to figure out the root cause of a performance issue and I recognized that a loop was taking more than 50% of the run time. In the loop, a partition key has been calculated, row by row.

With the consent of my colleagues, I will show you how the key is calculated. In the following section you will find the code and corresponding run time of the calculation from the partition key with a cursor and with an expression.

Data preparation

First of all, we need some test data. We are using the function SERIES_GENERATE_INTEGER to generate values in a given range and using SYSUUID to generate unique values.

-- drop table
drop table partition_key_test;

-- create table with random values
create table partition_key_test as (
select to_varchar(SYSUUID) as ext_contract from SERIES_GENERATE_INTEGER(1,0,100)
);

 

Cursor

The code below is the original code with the cursor to calculate the partition key for the entries from the partition_key_test table.

do begin
declare cursor c1 for select ext_contract, prodtype,
		                                 case prodtype
		                                 when '01' then
		                                 -- financial instrument:
		                                 -- use range between 2^30 and ( 2^30 + 2^26 - 1 )
		                                    left(bitunset(bitset(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 )
		                                 else
		                                 -- only numbers between 0 and 2^30 - 1: set first two bits to 0
		                                    left(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,2), 16 )
		                                 end as h1
					from (select ext_contract, '01' prodtype from partition_key_test);

    -- arrays for result table
    declare contr nvarchar(40) array;
    declare prodt nvarchar(2) array;
    declare hash int array;

    -- scalar variables
    declare idx int := 0;
    DECLARE pos int;
    DECLARE asc int;
    DECLARE h int;
    declare hex nvarchar(8);

    for myrow as c1 do
      idx := idx + 1;
      contr[:idx] := myrow.ext_contract;
      prodt[:idx] := myrow.prodtype;
      -- convert hex to integer
      h := 0;
      for pos in 1..8 do
        asc := ascii(SUBSTR(myrow.h1, :pos, 1));
        IF :asc >= 65 THEN
          h := :h * 16 + :asc - 55;
        ELSE
          h := :h * 16 + :asc - 48;
        END IF;
      END for;
      if h = 0 then
        -- we do not want 0 as hash value
        -- so convert the next 4 bytes to integer
        -- this may happen in 1 of 1 billion cases
        hex := bintohex(bitunset(hextobin(SUBSTR(myrow.h1, 9, 8)),1,2));
        for pos in 1..8 do
          asc := ascii(SUBSTR(:hex, :pos, 1));
          IF :asc >= 65 THEN
            h := :h * 16 + :asc - 55;
          ELSE
            h := :h * 16 + :asc - 48;
          END IF;
        END for;

        -- hash value is still 0, so set it to 1
        -- probability that his happens is 1 / ( 1 billion ) ^ 2
        if h = 0 then h := 1; end if;
      end if;
      hash[:idx] := :h;
    end for;
E_TAB_CONTRACT_PART_KEY = unnest( :CONTR, :PRODT, :HASH) as ( "EXT_CONTRACT", "PRODTYPE", "PART_KEY");
select * from :E_TAB_CONTRACT_PART_KEY;
end

 

Expression

The same functionality written as a query with an expression to calculate the partition key.

select 
	resultset.ext_contract, 
	resultset.prodtype,
	case when resultset.h = 0 then 1 else resultset.h end PART_KEY 
from (	
	select 
		(((((((((sub_value.ih1_1)) *16 + sub_value.ih1_2)*16 + sub_value.ih1_3)*16 + sub_value.ih1_4)*16 + sub_value.ih1_5)*16 + sub_value.ih1_6)*16 + sub_value.ih1_7)*16 + sub_value.ih1_8) as h, 
		sub_value.h1,
		sub_value.ext_contract, 
		sub_value.prodtype
	from (
		select 
			case when sub_char.h1_1 >= 65 then sub_char.h1_1-55 else sub_char.h1_1-48 end ih1_1,
			case when sub_char.h1_2 >= 65 then sub_char.h1_2-55 else sub_char.h1_2-48 end ih1_2,
			case when sub_char.h1_3 >= 65 then sub_char.h1_3-55 else sub_char.h1_3-48 end ih1_3,
			case when sub_char.h1_4 >= 65 then sub_char.h1_4-55 else sub_char.h1_4-48 end ih1_4,
			case when sub_char.h1_5 >= 65 then sub_char.h1_5-55 else sub_char.h1_5-48 end ih1_5,
			case when sub_char.h1_6 >= 65 then sub_char.h1_6-55 else sub_char.h1_6-48 end ih1_6,
			case when sub_char.h1_7 >= 65 then sub_char.h1_7-55 else sub_char.h1_7-48 end ih1_7,
			case when sub_char.h1_8 >= 65 then sub_char.h1_8-55 else sub_char.h1_8-48 end ih1_8,
			sub_char.h1, 
			sub_char.ext_contract, 
			sub_char.prodtype
		from (	
			select 
				ascii(substr(asci.h1,1,1)) h1_1,
				ascii(substr(asci.h1,2,1)) h1_2,
				ascii(substr(asci.h1,3,1)) h1_3,
				ascii(substr(asci.h1,4,1)) h1_4,
				ascii(substr(asci.h1,5,1)) h1_5,
				ascii(substr(asci.h1,6,1)) h1_6,
				ascii(substr(asci.h1,7,1)) h1_7,
				ascii(substr(asci.h1,8,1)) h1_8,
				h1,ext_contract, prodtype
			from (
				select
					case when hash_dataset.h1 = '00000000' 
						then 
							bintohex(bitunset(hextobin(SUBSTR(hash_dataset.h1, 9, 8)),1,2)) 
						else 
							hash_dataset.h1 
					end h1,
					hash_dataset.ext_contract, 
					hash_dataset.prodtype 
				from (
					select 
						dataset.ext_contract, 
						dataset.prodtype,
						case when dataset.prodtype = '01' 
							then
								left(bitunset(bitset(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 )
							else
								left(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,2), 16 )
						end as h1
					from (select ext_contract, '01' prodtype from partition_key_test) dataset
				) hash_dataset
			) asci
		) sub_char
	) sub_value
) resultset;

Comparison cursor vs expression

The table below represents the run times for cursor and expression, when given different data sizes.

rows expression cursor
1 1 ms 20 ms
100 5 ms 30 ms
10000 10 ms 770 ms
100000 20 ms 7600 ms
1000000 140 ms 75 seconds

Conclusion

HANA – databases in general – can process set-based data faster then row-by-row. Keep in mind to check whether the cursor can be replaced with a simple select, update, delete statement or nested cursors with a join or subselect operation. There is almost always another way to rewrite the cursor in a more efficient way. Exceptions prove the rule.

Resources

Avoid Using Cursors

Transactional Problems

Analyzing SQL Execution with the Plan Visualizer

SERIES_GENERATE Function (Series Data)

/
Be the first to leave a comment
You must be Logged on to comment or reply to a post.