Performance: Why you should avoid using cursors!
In this post, I will show you with an example why you should avoid the use of cursors, if there are other ways to fulfill your requirements.
What is a cursor
Cursors are used to fetch single rows from the result set returned by a query and allow the row-by-row iteration through the result set, whereas set based processing can be much faster. Cursors can also cause transactional problems because of the run time.
Set based processing is based on the mathematical concept of set. The operators of the set work on the entire set at a time, while in row-by-row processing every single row is being processed at a time. Figure-1 illustrates the concept of row-by-row processing and set-based processing.
Figure-1: row-based vs set-based
As following you will find an example about row-based and set-based processing.
The question is how to increase the salary of all employees from the department 1.
Create the employees table with the columns employee_id, department_id, salary and update_date and fill the table with random generated data.
-- drop the table employees drop table employees; -- create table with random values create table employees as ( select to_int(element_number) as employee_id, to_int(RAND()*3+1) as department_id , to_int(RAND()*1000 +100) as salary, now() as update_date from SERIES_GENERATE_INTEGER(1, 0, 100) );
do begin declare cursor c_emp for select employee_id, department_id, salary from employees; for a_row as c_emp do if a_row.department_id = 1 then update employees set salary = salary *1.1, update_date = now() where employee_id=a_row.employee_id ; end if; end for; end;
After the cursor has been defined, we are iterating through the result set, checking if the department_id is 1 and updating the salary in the corresponding row. Yes, we could also use just an update statement to do the same.
update employees set salary = salary *1.1, update_date = now() where department_id = 1 ;
The update statement above is doing the same as the cursor but this time as a set.
In the example above, it is obvious how to avoid the usage of cursor, but let me show you below a real-life example.
Expression vs Cursor
Recently I was analyzing a PlanViz to figure out the root cause of a performance issue and I recognized that a loop was taking more than 50% of the run time. In the loop, a partition key has been calculated, row by row.
With the consent of my colleagues, I will show you how the key is calculated. In the following section you will find the code and corresponding run time of the calculation from the partition key with a cursor and with an expression.
First of all, we need some test data. We are using the function SERIES_GENERATE_INTEGER to generate values in a given range and using SYSUUID to generate unique values.
-- drop table drop table partition_key_test; -- create table with random values create table partition_key_test as ( select to_varchar(SYSUUID) as ext_contract from SERIES_GENERATE_INTEGER(1,0,100) );
The code below is the original code with the cursor to calculate the partition key for the entries from the partition_key_test table.
do begin declare cursor c1 for select ext_contract, prodtype, case prodtype when '01' then -- financial instrument: -- use range between 2^30 and ( 2^30 + 2^26 - 1 ) left(bitunset(bitset(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 ) else -- only numbers between 0 and 2^30 - 1: set first two bits to 0 left(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,2), 16 ) end as h1 from (select ext_contract, '01' prodtype from partition_key_test); -- arrays for result table declare contr nvarchar(40) array; declare prodt nvarchar(2) array; declare hash int array; -- scalar variables declare idx int := 0; DECLARE pos int; DECLARE asc int; DECLARE h int; declare hex nvarchar(8); for myrow as c1 do idx := idx + 1; contr[:idx] := myrow.ext_contract; prodt[:idx] := myrow.prodtype; -- convert hex to integer h := 0; for pos in 1..8 do asc := ascii(SUBSTR(myrow.h1, :pos, 1)); IF :asc >= 65 THEN h := :h * 16 + :asc - 55; ELSE h := :h * 16 + :asc - 48; END IF; END for; if h = 0 then -- we do not want 0 as hash value -- so convert the next 4 bytes to integer -- this may happen in 1 of 1 billion cases hex := bintohex(bitunset(hextobin(SUBSTR(myrow.h1, 9, 8)),1,2)); for pos in 1..8 do asc := ascii(SUBSTR(:hex, :pos, 1)); IF :asc >= 65 THEN h := :h * 16 + :asc - 55; ELSE h := :h * 16 + :asc - 48; END IF; END for; -- hash value is still 0, so set it to 1 -- probability that his happens is 1 / ( 1 billion ) ^ 2 if h = 0 then h := 1; end if; end if; hash[:idx] := :h; end for; E_TAB_CONTRACT_PART_KEY = unnest( :CONTR, :PRODT, :HASH) as ( "EXT_CONTRACT", "PRODTYPE", "PART_KEY"); select * from :E_TAB_CONTRACT_PART_KEY; end
The same functionality written as a query with an expression to calculate the partition key.
select resultset.ext_contract, resultset.prodtype, case when resultset.h = 0 then 1 else resultset.h end PART_KEY from ( select (((((((((sub_value.ih1_1)) *16 + sub_value.ih1_2)*16 + sub_value.ih1_3)*16 + sub_value.ih1_4)*16 + sub_value.ih1_5)*16 + sub_value.ih1_6)*16 + sub_value.ih1_7)*16 + sub_value.ih1_8) as h, sub_value.h1, sub_value.ext_contract, sub_value.prodtype from ( select case when sub_char.h1_1 >= 65 then sub_char.h1_1-55 else sub_char.h1_1-48 end ih1_1, case when sub_char.h1_2 >= 65 then sub_char.h1_2-55 else sub_char.h1_2-48 end ih1_2, case when sub_char.h1_3 >= 65 then sub_char.h1_3-55 else sub_char.h1_3-48 end ih1_3, case when sub_char.h1_4 >= 65 then sub_char.h1_4-55 else sub_char.h1_4-48 end ih1_4, case when sub_char.h1_5 >= 65 then sub_char.h1_5-55 else sub_char.h1_5-48 end ih1_5, case when sub_char.h1_6 >= 65 then sub_char.h1_6-55 else sub_char.h1_6-48 end ih1_6, case when sub_char.h1_7 >= 65 then sub_char.h1_7-55 else sub_char.h1_7-48 end ih1_7, case when sub_char.h1_8 >= 65 then sub_char.h1_8-55 else sub_char.h1_8-48 end ih1_8, sub_char.h1, sub_char.ext_contract, sub_char.prodtype from ( select ascii(substr(asci.h1,1,1)) h1_1, ascii(substr(asci.h1,2,1)) h1_2, ascii(substr(asci.h1,3,1)) h1_3, ascii(substr(asci.h1,4,1)) h1_4, ascii(substr(asci.h1,5,1)) h1_5, ascii(substr(asci.h1,6,1)) h1_6, ascii(substr(asci.h1,7,1)) h1_7, ascii(substr(asci.h1,8,1)) h1_8, h1,ext_contract, prodtype from ( select case when hash_dataset.h1 = '00000000' then bintohex(bitunset(hextobin(SUBSTR(hash_dataset.h1, 9, 8)),1,2)) else hash_dataset.h1 end h1, hash_dataset.ext_contract, hash_dataset.prodtype from ( select dataset.ext_contract, dataset.prodtype, case when dataset.prodtype = '01' then left(bitunset(bitset(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 ) else left(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,2), 16 ) end as h1 from (select ext_contract, '01' prodtype from partition_key_test) dataset ) hash_dataset ) asci ) sub_char ) sub_value ) resultset;
Comparison cursor vs expression
The table below represents the run times for cursor and expression, when given different data sizes.
|1||1 ms||20 ms|
|100||5 ms||30 ms|
|10000||10 ms||770 ms|
|100000||20 ms||7600 ms|
|1000000||140 ms||75 seconds|
HANA – databases in general – can process set-based data faster then row-by-row. Keep in mind to check whether the cursor can be replaced with a simple select, update, delete statement or nested cursors with a join or subselect operation. There is almost always another way to rewrite the cursor in a more efficient way. Exceptions prove the rule.