Skip to Content

If you’ve watched “Mad Max Beyond Thunderdome”  then “… You know the law: Two men enter, one man leaves …“.

MV5BNDM4N2NkZmEtMjQ2Yi00ZTBkLWExODYtYWJiYzkwNWQ4NDQ4XkEyXkFqcGdeQXVyMTI3MDk3MzQ@._V1_Watch out, he’s coming for you! — pic from www.imdb.com.

HANA, of course, reverses, sorry, disrupts this law by making two (or more) values get out of functions, even when only one input value was provided.

Wait, what?

Alright, this Thunderdome reference is really a massive plug and only used to get your attention here. Now, that I seem to have it, let us get to the topic.

If you’ve been using SQLScript for some time you are aware of user defined functions (UDF). These UDFs can nowadays handle both scalar and table typed parameters, both for the input and for the output aka result of the function.

The main purpose of functions, just like in other programming languages, is code reuse and making the code that uses them easier to understand.
With some thought about the naming, one can even make up parts of a domain specific language for the problem at hand.
Think of

applyTaxTRate(sales_price) as "priceWithGST"

in your SELECT statements instead of

 CASE sales_price < 10 
 THEN sales_price 
 ELSE sales_price * 0.20 
 END as "priceWithGST"

as an example.

All good so far. Now, with HANA performance considerations are nearly always the main concern, so a question often asked is:
Isn’t using functions bad for performance?

Which I can only answer with
Isn’t using high-level languages bad for performance?
And indeed, it is easy to lose some microseconds and several MegaBytes of memory by using functions compared to “pure” SQL. But then again, the main goal for a developer is to implement a functionality (correctly) that eventually provides value for the users.

So, my recommendation is to use functions where it helps with making the programmers more productive and worry about the performance impact, when one can actually measure it.

Back to the topic of this blog.

I had looked at a question on stackoverflow a while ago which basically asked, why using UDFs with more than one result seems to lead to slower execution times, the more results are actually used.

So something like

select 
      getUserDetails (143776).userName
    , getUserDetails (143776).expiryDate
    , getUserDetails (143776).currentAge
    , getUserDetails (143776).customerSegment
from dummy;

would take around four times as long as

select 
    getUserDetails (143776).userName
from dummy;

even though we’re running the same function with the same input parameter.
The programmers’ intuition, well at least mine and that of the OP from stackoverflow, was that HANA would evaluate the function once for the provided input parameter and use the different results from this one execution.

Some testing of the query execution times showed that this could not be the case.
A more in-depth analysis was in order here.

The try setup

For my “learn about the system by poking around“-try I created a very simple scenario: a user account table and a query to find the date for when any user account would expire.
Not wasting time with inventing play-doh data I simply ran:

create column table myusers as (select * from users);

With that in place a UDF was easy to build:

drop function userdetails;
create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
         expiryDate DATE
as
begin
   select user_name
        , to_date(add_days(valid_from, 180)) as expiryDate
    into userName
      , expiryDate
   from 
         myusers
   where 
         user_id = :user_id;
end;
select userDetails (143776).userName
     , userDetails (143776).expiryDate
from dummy;

USERDETAILS(143776).USERNAME USERDETAILS(143776).EXPIRYDATE
_SYS_REPO                    21.01.2018

This should be equivalent to this pure SQL statement:

select user_name
     , to_date(add_days(valid_from, 180)) as expiryDate
from 
     myusers
where 
     user_id = 143776;
 
USER_NAME EXPIRYDATE
_SYS_REPO 2018-01-21

Looks good, I’d say.

Checking on the execution times shows:

UDF (multi-result)

successfully executed in 39 ms 67 µs (server processing time: 37 ms 819 µs)
successfully executed in 42 ms 929 µs (server processing time: 41 ms 60 µs)
successfully executed in 44 ms 13 µs (server processing time: 42 ms 492 µs)

pure SQL multi-result)

successfully executed in 1 ms 191 µs (server processing time: 0 ms 385 µs)
successfully executed in 1 ms 933 µs (server processing time: 0 ms 825 µs)
successfully executed in 2 ms 507 µs (server processing time: 0 ms 827 µs)

While this is a big difference (30 times) it is actually the wrong one for the current question. What we need to look at is the difference between executions with just one or two results.

Running the statements with just the expiryDate output:

UDF (single parameter)

successfully executed in 29 ms 443 µs (server processing time: 28 ms 200 µs)
successfully executed in 30 ms 61 µs (server processing time: 28 ms 610 µs)
successfully executed in 29 ms 586 µs (server processing time: 28 ms 131 µs)

pure SQL (single parameter)

successfully executed in 1 ms 917 µs (server processing time: 0 ms 735 µs)
successfully executed in 1 ms 63 µs (server processing time: 0 ms 308 µs)
successfully executed in 2 ms 864 µs (server processing time: 0 ms 742 µs)

And here we do see an important difference: the pure SQL runtime kept stable while the UDF fetching just a single result did so quite a bit faster.

Phase One “Reproducing the issue” successfully done.

Moving on with the analysis, I played around with the statements and wanted to check what happens when you select a non-existing user:

Trouble with no data

UDF
select 
     userDetails (-4711).expiryDate
from dummy;

Could not execute 'select userDetails (-4711).expiryDate from dummy' in 40 ms 99 µs . 
SAP DBTech JDBC: [478]: 
user defined function runtime error: "DEVDUDE"."USERDETAILS": 
line 9 col 5 (at pos 131): 
[1299] (range 3) no data found exception: 
no data found at user-defined function (at pos 8)
pure SQL
select
     to_date(add_days(valid_from, 180)) as expiryDate
from 
     myusers
where 
     user_id = -4711;

Shoot!

When there is no matching record we get an error with our UDF. That is likely not what we want, so we need to cater for that in the code.
Here we can use the EXIT HANDLER to catch the ‘NO DATA FOUND’ SQL error.

create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
         expiryDate DATE
as
begin
declare exit handler for SQL_ERROR_CODE 1299 
 begin 
     userName := NULL;
     expiryDate := NULL;
 end;

select user_name
     , to_date(add_days(valid_from, 180)) as expiryDate
 into userName
    , expiryDate
 from 
      myusers
 where 
      user_id = :user_id;
end;

With this change, the error is caught and the UDF returns NULL values. That’s not the same as an empty set, but good enough for this case and a lot better than the error.

select 
      userDetails (-4711).expiryDate
from dummy;

USERDETAILS(-4711).EXPIRYDATE
?

Trouble with tracing

The next step of the analysis should be to actually show that the UDF is executed once for each result.

I looked into the different usual suspects for statement analysis and was rather disappointed:
SQL Plan Cache, Expensive Statement Trace and PlanViz all only show the top-level statement + statistics.

PlanViz shows a “Row Search”-Pop as the final processing node and this contains the projection list but that’s about it.

Name: Row Search
ID: ID_EA341305E062BA47B440FD71F07CA318_0
Execution Time (Inclusive): 19.9 ms
Execution Time (Exclusive): 19.893 ms
Execution Start Time: 6,934.218 ms
Execution End Time: 6,954.174 ms
CPU Time (User): 19.883 ms
Projected Cols: 
 "DEVDUDE"."USERDETAILS"(143776).USERNAME, 
 "DEVDUDE"."USERDETAILS"(143776).EXPIRYDATE

So what would be a good way to find out whether the function had been called more than once during the statement execution?
I didn’t want to go some low-level tracing, so I came up with a modification of the function:

Pudding, proofs… is it Christmas yet?

create sequence pcnt;

drop function userdetails;
create or replace 
 function userDetails(user_id BIGINT) 
 returns userName NVARCHAR (256),
       expiryDate NVARCHAR (30)
as
begin
 
declare exit handler for SQL_ERROR_CODE 1299 
   begin 
     userName := NULL;
     expiryDate := NULL;
   end;

    select user_name|| '_> '|| pcnt.nextval
         , to_nvarchar(add_days(valid_from, 180), 'DD.MM.YYYY') ||'_> '|| pcnt.nextval as expiryDate
     into userName
        , expiryDate
    from 
         myusers
    where 
       user_id = :user_id;
end;

I am going to use a sequence to add the current counter for execution to the output values.

In pure SQL the sequence is bumped up once for each output row:

select user_name || '_> '|| pcnt.nextval
 , to_nvarchar(add_days(valid_from, 180), 'DD.MM.YYYY') ||'_> '|| pcnt.nextval as expiryDate
 from 
 myusers
 where 
 user_id = 143776;
Execution …
USER_NAME||'_> '||PCNT.NEXTVAL EXPIRYDATE 
_SYS_REPO_> 30                 21.01.2018_> 30
Next execution…
USER_NAME||'_> '||PCNT.NEXTVAL EXPIRYDATE 
_SYS_REPO_> 31                 21.01.2018_> 31

That means, if the UDF is really called once per output argument, then we should see increasing numbers for the same result row.
Let’s see what we get:

select userDetails (143776).userName
     , userDetails (143776).expiryDate
from dummy;

USERDETAILS(143776).USERNAME USERDETAILS(143776).EXPIRYDATE
_SYS_REPO_> 32               21.01.2018_> 33

Here we got our proof, the function actually is executed twice.

Ok… so what?

Alright, up to here I spend your time and attention on explaining how to find out that HANA does not do the thing we liked it to do.
If that was the end of it, you rightfully would ask, what this was for.

Worry not! There is a good end in sight!

The answer lies in using the UDFs in SQLScript blocks.
But not like this:

do begin
declare _userName   NVARCHAR(256);
declare _expiryDate NVARCHAR(30);
 
    _userName  := userDetails (143776).userName;
    _expiryDate:= userDetails (143776).expiryDate;
 
    select _userName, _expiryDate from dummy;
end;
:1              :2 
_SYS_REPO_> 34  21.01.2018_> 35

We have to step up our SQLScript game here.
While we’re already avoiding the stupid SELECT … INTO FROM dummy workaround for assigning function results to variables, it’s obviously calling the function twice now.

Instead, we need to use the function in a single assignment.
How can you assign values to multiple variables in a single assignment?“, you ask?

I did the same until one of our SQLScript developers pointed me to something that is, well, “mentioned” in the documentation. I put “mentioned” in quotation marks, as this non-obvious programming construct is not really explained, but shows up in just one example code (this one)

The solution and answer

do begin
declare _userName   NVARCHAR(256);
declare _expiryDate NVARCHAR(30);
 
    (_userName, _expiryDate) := userDetails (143776);
 
    select _userName, _expiryDate from dummy;
end;
:1               :2 
_SYS_REPO_> 36   21.01.2018_> 36

With this multi-result assignment, we not only have just one function call but also much nicer to read code.

There you go, now you know.

Cheers from Mad Max-country,

Lars


This post was first published on LBREDDEMANN.ORG.

To report this post you need to login first.

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

Leave a Reply