Skip to Content

     We have a Chinese version(http://scn.sap.com/community/chinese/hana/blog/2014/03/03/sap-hana-ce-function%E5%BC%95%E7%94%B3-%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94) of this document.

     In the previous blog, I have introduced the usage of common CE functions simply. And this blog will compare the performance between the CE functions and the corresponding SQL query sentences, using some test scenarios. According to the result of these test scenarios, we can get some tips, which can be chosen to use for SQL tuning in SQL procedures.

CE_UNION_ALL

     CE_UNION_ALL function is equivalent to the UNION ALL operation of two table variables, which is the combination of two table variables. And then the outcome is bound to another table variable. In the version before SPS6, the operation of UNION ALL on column tables will be transferred to the row engine. But after the version of SPS6, HANA has optimized the UNION ALL operation, and then it can be implemented on the column engine. In addition to pay attention, independent codes can be executed parallelly in SQLScript SQL procedures. So the two parts of UNION ALL operation are generally considered to be executed parallelly.

     For example, the two SQL procedures use CE_UNION_ALL() function and UNION ALL SQL sentence respectively to execute the union operation of two tables. And the data volume of the two tables is 189W and 15W.


dropprocedure union_all_test_1;
create procedure union_all_test_1(out trans transport)
language sqlscript as
begin
 d5300_transport = CE_COLUMN_TABLE("D5300", ["ACT_GI_DATE", "TRANSPORT_ID", "MATERIAL", "BATCH", "SOURCE_PLANT", "SOURCE_STORE", "TARGET_PLANT", "TARGET_STORE", "TOTAL_COST", "WEIGHT"]);
                           
 d5200_transport = CE_COLUMN_TABLE("D5200", ["ACT_GI_DATE", "TRANSPORT_ID", "MATERIAL", "BATCH", "SOURCE_PLANT", "SOURCE_STORE", "DISTR_CHAN", "SALES_OFF", "TOTAL_COST", "WEIGHT"]);
                           
 d5200_transport2 = CE_PROJECTION(:d5200_transport, ["ACT_GI_DATE", "TRANSPORT_ID", "MATERIAL", "BATCH", "SOURCE_PLANT", "SOURCE_STORE","DISTR_CHAN" AS "TARGET_PLANT", "SALES_OFF" AS "TARGET_STORE", "TOTAL_COST", "WEIGHT"]);
                           
 trans = CE_UNION_ALL(:d5300_transport,:d5200_transport2);
end;
drop procedure union_all_test_2;
create procedure union_all_test_2(out trans transport)
language sqlscript as
begin
      
 trans =
select "ACT_GI_DATE", "TRANSPORT_ID", "MATERIAL", "BATCH", "SOURCE_PLANT", "SOURCE_STORE",  "TARGET_PLANT", "TARGET_STORE", "TOTAL_COST", "WEIGHT" from "D5300"
union all
select "ACT_GI_DATE", "TRANSPORT_ID", "MATERIAL", "BATCH", "SOURCE_PLANT", "SOURCE_STORE", "DISTR_CHAN"  AS "TARGET_PLANT", "SALES_OFF" AS "TARGET_STORE", "TOTAL_COST", "WEIGHT" from "D5200";  
                    
end;

     The processing time of these two SQL procedures shows as follows:

/wp-content/uploads/2014/06/1_482391.jpg

/wp-content/uploads/2014/06/2_482392.jpg

     In this test scenario, the processing time of the two methods is similar.

CE_AGGREGATION

     The CE_AGGREGATION function groups the input columns and computes aggregation for each group. And it is equivalent to the combination of GROUP BY sentence and aggregation functions.

     For example, the two SQL procedures use CE_ AGGREGATION () function and GROUP BY sentence respectively to execute the group and count operation on a table of which data volume is 1184W.     


dropprocedure groupby_test_1;
create procedure groupby_test_1(out tmp2 groupby)
language sqlscript as
begin
 tmp1 = CE_COLUMN_TABLE("RECORD",["USERID","CAMPID","MEDIAID","TYPE"]);
 tmp2 = CE_AGGREGATION (:TMP1,[COUNT(*) AS CNT], ["USERID","CAMPID","MEDIAID","TYPE"]);
end;
drop procedure groupby_test_2;
create procedure groupby_test_2(out tmp1 groupby)
language sqlscript as
begin
 tmp1 = SELECT COUNT(*) AS CNT,USERID,CAMPID,MEDIAID,TYPE FROM "RECORD" GROUP BY USERID,CAMPID,MEDIAID,TYPE;   
end;

     The processing time of these two SQL procedures shows as follows:

/wp-content/uploads/2014/06/3_482396.jpg

/wp-content/uploads/2014/06/4_482397.jpg

     As the result shows, in this test scenario, the processing time of the two methods is similar.   

CE_JOIN    

     CE_JOIN function implements the INNER JOIN operations of two table variables, using the join condition.

     For example, the two SQL procedures use CE_ JOIN () function and INNER JOIN sentence respectively to execute the inner join operation.  And the data volume of the two parts is 1184W separately.


dropprocedure join_test_1;
create procedure join_test_1(out join_result innerjoin)
language sqlscript as
begin
 tmp1 = CE_COLUMN_TABLE("RECORD",["USERID","CAMPID","MEDIAID","TYPE"]);
 tmp2 = CE_AGGREGATION (:TMP1,[COUNT(*) AS CNT], ["USERID","CAMPID","MEDIAID","TYPE"]);
 tmp3 = CE_PROJECTION(:tmp2,["USERID","MEDIAID" as "MEDIAID1"]);
 tmp4 = CE_PROJECTION(:tmp2,["USERID","MEDIAID" as "MEDIAID2"]);
 join_result = CE_JOIN(:tmp3,:tmp4,["USERID"],["USERID","MEDIAID1","MEDIAID2"]);
               
end;            
drop procedure join_test_2;
create procedure join_test_2(out join_result innerjoin)
language sqlscript as
begin
 tmp1 = SELECT USERID,CAMPID,MEDIAID,TYPE FROM "RECORD" GROUP BY USERID,CAMPID,MEDIAID,TYPE;   
 join_result = select f.userid,f.mediaid as mediaid1,l.mediaid as mediaid2 from :tmp1 as f join :tmp1 as l on f.userid=l.userid;
end;

     The processing time of these two SQL procedures shows as follows:

/wp-content/uploads/2014/06/5_482398.jpg

/wp-content/uploads/2014/06/6_482399.jpg

     We can conclude that, in this scenario, the processing time of CE_JOIN() function is shorter than the processing time of INNER JOIN. Here CE_JOIN() function has the higher performance.

NOTICE

      According to the result of these test scenarios, the CE functions and corresponding SQL sentences have similar performance in most cases. But in particular cases, the CE functions have even higher performance.

     But when using CE functions, something needs to be paid attention. Each CE function need to bind the result to a table variable, that is to say the result need to be materialized. Then for complex operations using CE functions, it may cause performance problem. So we need to balance the performance to choose to user whether CE functions or SQL sentences.

To report this post you need to login first.

1 Comment

You must be Logged on to comment or reply to a post.

  1. Lars Breddemann

    Sorry, but this needsre-work!

    The examples miss crucial steps (e.g. generating the data or defining the output table types).

    The performance observation for SQL vs CE-join is not correct as the work done in both state is different.

    If you simply code the SQL example like this:

    create procedure join_test_3(out join_result innerjoin) 

    language sqlscript as 

    begin 

    join_result = select f.userid,f.mediaid as mediaid1,l.mediaid as mediaid2

                    from RECORD as f inner join RECORD as l on f.userid=l.userid; 

    end; 

    You’ll find that the performance is nearly equal to the CE_JOIN.

    In fact, the additional time required by your example was due to the materialization you forced by assigning the table variable for SQL first.

    Also: for CE-functions you typically don’t have materialization of table variables. That’s the whole point of using CE-functions and the calculation engine.

    – Lars

    (0) 

Leave a Reply