Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
HalilGünal
Employee
Employee

Introduction


In this post, I will show you with an example why you should avoid the use of cursors, if there are other ways to fulfill your requirements.

What is a cursor


Cursors are used to fetch single rows from the result set returned by a query and allow the row-by-row iteration through the result set, whereas set based processing can be much faster. Cursors can also cause transactional problems because of the run time.

Set based processing is based on the mathematical concept of set. The operators of the set work on the entire set at a time, while in row-by-row processing every single row is being processed at a time. Figure-1 illustrates the concept of row-by-row processing and set-based processing.


Figure-1: row-based vs set-based



Example


As following you will find an example about row-based and set-based processing.
The question is how to increase the salary of all employees from the department 1.

Data preparation:

Create the employees table with the columns employee_id, department_id, salary and update_date and fill the table with random generated data.
-- drop the table employees
drop table employees;

-- create table with random values
create table employees as (
select
to_int(element_number) as employee_id,
to_int(RAND()*3+1) as department_id ,
to_int(RAND()*1000 +100) as salary,
now() as update_date
from SERIES_GENERATE_INTEGER(1, 0, 100)
);

 

Row-by-row (Cursor):
do
begin
declare cursor c_emp for
select employee_id, department_id, salary from employees;

for a_row as c_emp do
if a_row.department_id = 1 then
update employees set salary = salary *1.1, update_date = now() where employee_id=a_row.employee_id ;
end if;
end for;
end;

After the cursor has been defined, we are iterating through the result set, checking if the department_id is 1 and updating the salary in the corresponding row. Yes, we could also use just an update statement to do the same.

Set-based:
update employees set salary = salary *1.1, update_date = now() where department_id = 1 ;

The update statement above is doing the same as the cursor but this time as a set.

In the example above, it is obvious how to avoid the usage of cursor, but let me show you below a real-life example.

Expression vs Cursor


Recently I was analyzing a PlanViz to figure out the root cause of a performance issue and I recognized that a loop was taking more than 50% of the run time. In the loop, a partition key has been calculated, row by row.

With the consent of my colleagues, I will show you how the key is calculated. In the following section you will find the code and corresponding run time of the calculation from the partition key with a cursor and with an expression.

Data preparation


First of all, we need some test data. We are using the function SERIES_GENERATE_INTEGER to generate values in a given range and using SYSUUID to generate unique values.
-- drop table
drop table partition_key_test;

-- create table with random values
create table partition_key_test as (
select to_varchar(SYSUUID) as ext_contract from SERIES_GENERATE_INTEGER(1,0,100)
);

 

Cursor


The code below is the original code with the cursor to calculate the partition key for the entries from the partition_key_test table.
do begin
declare cursor c1 for select ext_contract, prodtype,
case prodtype
when '01' then
-- financial instrument:
-- use range between 2^30 and ( 2^30 + 2^26 - 1 )
left(bitunset(bitset(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 )
else
-- only numbers between 0 and 2^30 - 1: set first two bits to 0
left(bitunset(hash_sha256( strtobin( ext_contract, 'UTF-8' ) ),1,2), 16 )
end as h1
from (select ext_contract, '01' prodtype from partition_key_test);

-- arrays for result table
declare contr nvarchar(40) array;
declare prodt nvarchar(2) array;
declare hash int array;

-- scalar variables
declare idx int := 0;
DECLARE pos int;
DECLARE asc int;
DECLARE h int;
declare hex nvarchar(8);

for myrow as c1 do
idx := idx + 1;
contr[:idx] := myrow.ext_contract;
prodt[:idx] := myrow.prodtype;
-- convert hex to integer
h := 0;
for pos in 1..8 do
asc := ascii(SUBSTR(myrow.h1, :pos, 1));
IF :asc >= 65 THEN
h := :h * 16 + :asc - 55;
ELSE
h := :h * 16 + :asc - 48;
END IF;
END for;
if h = 0 then
-- we do not want 0 as hash value
-- so convert the next 4 bytes to integer
-- this may happen in 1 of 1 billion cases
hex := bintohex(bitunset(hextobin(SUBSTR(myrow.h1, 9, 8)),1,2));
for pos in 1..8 do
asc := ascii(SUBSTR(:hex, :pos, 1));
IF :asc >= 65 THEN
h := :h * 16 + :asc - 55;
ELSE
h := :h * 16 + :asc - 48;
END IF;
END for;

-- hash value is still 0, so set it to 1
-- probability that his happens is 1 / ( 1 billion ) ^ 2
if h = 0 then h := 1; end if;
end if;
hash[:idx] := :h;
end for;
E_TAB_CONTRACT_PART_KEY = unnest( :CONTR, :PRODT, :HASH) as ( "EXT_CONTRACT", "PRODTYPE", "PART_KEY");
select * from :E_TAB_CONTRACT_PART_KEY;
end

 

Expression


The same functionality written as a query with an expression to calculate the partition key.
select 
resultset.ext_contract,
resultset.prodtype,
case when resultset.h = 0 then 1 else resultset.h end PART_KEY
from (
select
(((((((((sub_value.ih1_1)) *16 + sub_value.ih1_2)*16 + sub_value.ih1_3)*16 + sub_value.ih1_4)*16 + sub_value.ih1_5)*16 + sub_value.ih1_6)*16 + sub_value.ih1_7)*16 + sub_value.ih1_8) as h,
sub_value.h1,
sub_value.ext_contract,
sub_value.prodtype
from (
select
case when sub_char.h1_1 >= 65 then sub_char.h1_1-55 else sub_char.h1_1-48 end ih1_1,
case when sub_char.h1_2 >= 65 then sub_char.h1_2-55 else sub_char.h1_2-48 end ih1_2,
case when sub_char.h1_3 >= 65 then sub_char.h1_3-55 else sub_char.h1_3-48 end ih1_3,
case when sub_char.h1_4 >= 65 then sub_char.h1_4-55 else sub_char.h1_4-48 end ih1_4,
case when sub_char.h1_5 >= 65 then sub_char.h1_5-55 else sub_char.h1_5-48 end ih1_5,
case when sub_char.h1_6 >= 65 then sub_char.h1_6-55 else sub_char.h1_6-48 end ih1_6,
case when sub_char.h1_7 >= 65 then sub_char.h1_7-55 else sub_char.h1_7-48 end ih1_7,
case when sub_char.h1_8 >= 65 then sub_char.h1_8-55 else sub_char.h1_8-48 end ih1_8,
sub_char.h1,
sub_char.ext_contract,
sub_char.prodtype
from (
select
ascii(substr(asci.h1,1,1)) h1_1,
ascii(substr(asci.h1,2,1)) h1_2,
ascii(substr(asci.h1,3,1)) h1_3,
ascii(substr(asci.h1,4,1)) h1_4,
ascii(substr(asci.h1,5,1)) h1_5,
ascii(substr(asci.h1,6,1)) h1_6,
ascii(substr(asci.h1,7,1)) h1_7,
ascii(substr(asci.h1,8,1)) h1_8,
h1,ext_contract, prodtype
from (
select
case when hash_dataset.h1 = '00000000'
then
bintohex(bitunset(hextobin(SUBSTR(hash_dataset.h1, 9, 8)),1,2))
else
hash_dataset.h1
end h1,
hash_dataset.ext_contract,
hash_dataset.prodtype
from (
select
dataset.ext_contract,
dataset.prodtype,
case when dataset.prodtype = '01'
then
left(bitunset(bitset(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,1),2,1),3,4), 16 )
else
left(bitunset(hash_sha256( strtobin( dataset.ext_contract, 'UTF-8' ) ),1,2), 16 )
end as h1
from (select ext_contract, '01' prodtype from partition_key_test) dataset
) hash_dataset
) asci
) sub_char
) sub_value
) resultset;

Comparison cursor vs expression


The table below represents the run times for cursor and expression, when given different data sizes.


































rows expression cursor
1 1 ms 20 ms
100 5 ms 30 ms
10000 10 ms 770 ms
100000 20 ms 7600 ms
1000000 140 ms 75 seconds






Conclusion


HANA - databases in general - can process set-based data faster then row-by-row. Keep in mind to check whether the cursor can be replaced with a simple select, update, delete statement or nested cursors with a join or subselect operation. There is almost always another way to rewrite the cursor in a more efficient way. Exceptions prove the rule.

Resources


Avoid Using Cursors


Transactional Problems


Analyzing SQL Execution with the Plan Visualizer




SERIES_GENERATE Function (Series Data)