
Using loops in stored procedures for HANA ETL processes

Purpose: Demonstrate how to use a looping technique to execute stored procedures in smaller chunks. This assumes no external ETL tools are available and stored procedures must be used.

Motivation: Avoid hitting Out of Memory (OOM) errors during processing, or make ETL-type operations faster in some cases. I have seen a few SCN posts where people are hitting these OOM errors, and of course I have hit them myself, so I have figured out how to mitigate this risk. Additionally, I have observed that processing many small data sets vs. one very large one can achieve faster execution times.

Real World Scenario: A client needed to build a denormalized dimension that ended up being quite large – say 30 million records. To build this dimension, 5 separate, fairly large tables needed to be joined together from a central "driver" table. These were mostly many-to-many joins. Attempting to perform the logic in a stored procedure to build the dimension in one shot was consistently failing due to Out of Memory (OOM) errors.


Real World Solution: Process the required data in smaller chunks by calling the same procedure multiple times, looping through some logical partitioning criteria. Essentially, you perform the same logic for every <region, plant, or company code> in your data or driving table. In the example shown below, I have chosen the field LOOPING_DRIVER_TABLE.DATE, but of course this can be whatever makes sense for your requirement.

Solution details: I can't take full credit for this, as Jody Hesch helped me with some of the basics. Thanks to him for being generous with his knowledge – this is why the SCN community is awesome!

To illustrate a generic approach, the following components are required. Again, this is just an example; your mileage may vary according to your requirements. For simplicity, this assumes a full drop and reload of the target table – you might find that a delta loading approach makes more sense.

1. Table with list of values, or the actual transaction/driver table

2. Table that contains the data to be joined (lookup)

3. Target table with transaction/driver data plus the lookup attribute

4. Procedure for performing the join/lookup and insert, with an attribute for looping

5. Wrapper procedure for looping

6. Read/write procedures must be enabled (set configuration/indexserver/repository/sqlscript_mode = 'unsecure'); see the sketch after this list

7. User _SYS_REPO must have CREATE/DELETE/INSERT/SELECT object privileges on <YOURSCHEMA>, grantable; see the sketch after this list

8. Replace the <YOURSCHEMA> and <YOUR_PACKAGE> placeholders with your own schema and package names
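
For items 6 and 7 above, a minimal sketch of the setup statements (a sketch only – check the configuration path and the exact privilege set against your revision's documentation):

-- Enable read/write (unsecure) SQLScript procedures
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('repository', 'sqlscript_mode') = 'UNSECURE' WITH RECONFIGURE;

-- Grant _SYS_REPO the required privileges on the target schema, grantable
GRANT SELECT, INSERT, DELETE, CREATE ANY ON SCHEMA "<YOURSCHEMA>"
TO _SYS_REPO WITH GRANT OPTION;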

1. Table with list of values, or the actual transaction/driver table



DROP TABLE "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE";
create column table "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE"
    ( "DATE" NVARCHAR (8) not null,
     "CUSTOMER" NVARCHAR (10) DEFAULT '' not null,
     "MATERIAL" NVARCHAR (18) DEFAULT '' not null,
     "SALES_QTY" DECIMAL (15,2) DEFAULT 0 not null,
     "SALES_VALUE" DECIMAL (15,2) DEFAULT 0 not null);
INSERT INTO "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" VALUES
('20140101', '0000112345', '12345678', 10, 100);
INSERT INTO "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" VALUES
('20140102', '0000112346', '12345678', 20, 190);
INSERT INTO "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" VALUES
('20140103', '0000112347', '12345678', 11, 180);
INSERT INTO "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" VALUES
('20140104', '0000112348', '12345678', 15, 175);
INSERT INTO "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" VALUES
('20140105', '0000112349', '12345678', 1, 100);
INSERT INTO "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" VALUES
('20140106', '0000112351', '12345678', 4, 89);





2. Table that contains the data to be joined (lookup)


DROP TABLE "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE";
create column table "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE"
    ("CUSTOMER" NVARCHAR (10) DEFAULT '' not null,
     "CUSTOMER_SPECIALTY" NVARCHAR (2) DEFAULT '' not null);
INSERT INTO "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" VALUES
('0000112345', 'LB');
INSERT INTO "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" VALUES
('0000112346', 'AB');
INSERT INTO "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" VALUES
('0000112347', 'HS');
INSERT INTO "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" VALUES
('0000112348', 'DM');
INSERT INTO "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" VALUES
('0000112349', 'AX');
INSERT INTO "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" VALUES
('0000112351', 'ZT');





3. Target table with transaction/driver data plus the lookup attribute


DROP TABLE "<YOURSCHEMA>"."LOOPING_TARGET_TABLE";
create column table "<YOURSCHEMA>"."LOOPING_TARGET_TABLE"
    ( "DATE" NVARCHAR (8) not null,
     "CUSTOMER" NVARCHAR (10) DEFAULT '' not null,
     "MATERIAL" NVARCHAR (18) DEFAULT '' not null,
     "SALES_QTY" DECIMAL (15,2) DEFAULT 0 not null,
     "SALES_VALUE" DECIMAL (15,2) DEFAULT 0 not null,
     "CUSTOMER_SPECIALTY" NVARCHAR (2) DEFAULT '' not null);





4. Procedure for performing the join/lookup and insert



--<YOUR_PACKAGE>.SP_WRITE_TARGET_TABLE
/********* Begin Procedure Script ************/
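-- assumes an input parameter declared in the repository procedure header (not shown), e.g. IN LOOP_DATE NVARCHAR(8)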
BEGIN
INSERT INTO "<YOURSCHEMA>"."LOOPING_TARGET_TABLE"
SELECT A."DATE", A."CUSTOMER", A."MATERIAL", A."SALES_QTY", A."SALES_VALUE", B."CUSTOMER_SPECIALTY"
FROM "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" A
LEFT OUTER JOIN
"<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" B
ON (A."CUSTOMER" = B."CUSTOMER")
WHERE A."DATE" = :LOOP_DATE;
END;
/********* End Procedure Script ************/





-- Test procedure in isolation

CALL "_SYS_BIC"."<YOUR_PACKAGE>/SP_WRITE_TARGET_TABLE" ('20140101');

SELECT * FROM "<YOURSCHEMA>"."LOOPING_TARGET_TABLE";

DATE;CUSTOMER;MATERIAL;SALES_QTY;SALES_VALUE;CUSTOMER_SPECIALTY

20140101;0000112345;12345678;10;100;LB


5. Wrapper procedure for looping


--<YOUR_PACKAGE>.SP_LOOP_TEST
/********* Begin Procedure Script ************/
-- scalar variables
i INTEGER;
lv_row_count INTEGER;
lv_current_date NVARCHAR(10);
BEGIN
DELETE FROM "<YOURSCHEMA>"."LOOPING_TARGET_TABLE";
-- Find logical partition looping values
date_list = SELECT DISTINCT "DATE" FROM "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE";
-- store dataset size in row_count variable
SELECT COUNT(*)
INTO lv_row_count
FROM :date_list;
FOR i IN 0 .. :lv_row_count-1 DO
  
    SELECT "DATE"
    INTO lv_current_date
    FROM :date_list
    LIMIT 1 OFFSET :i; -- notice OFFSET indexes at 0, not 1
    CALL "_SYS_BIC"."<YOUR_PACKAGE>/SP_WRITE_TARGET_TABLE" (:lv_current_date);
END FOR;
--Manually initiate delta merge process
MERGE DELTA OF "<YOURSCHEMA>"."LOOPING_TARGET_TABLE";
END;
/********* End Procedure Script ************/





-- Test full procedure and check results

CALL "_SYS_BIC"."<YOUR_PACKAGE>/SP_LOOP_TEST";

SELECT * FROM "<YOURSCHEMA>"."LOOPING_TARGET_TABLE";

DATE;CUSTOMER;MATERIAL;SALES_QTY;SALES_VALUE;CUSTOMER_SPECIALTY

20140101;0000112345;12345678;10;100;LB

20140102;0000112346;12345678;20;190;AB

20140103;0000112347;12345678;11;180;HS

20140104;0000112348;12345678;15;175;DM

20140105;0000112349;12345678;1;100;AX

20140106;0000112351;12345678;4;89;ZT

Happy HANA!

Justin

Comments
Former Member

      Thanks for the reference!

Raj Kumar S

      Good one Justin.

I could not understand the importance of LIMIT and OFFSET here. Could you explain?

Krishna Tangudu

      Hi Raj,

LIMIT I understand, because he is trying to assign the value to a variable; if the query returns more than one value, it will result in an error.

So he is getting only one value and assigning it to the variable.

But OFFSET is 0 by default (guessing). I don't know if it is otherwise, and hence why Justin is defining it as well. If the default is not 0, I wonder what the default value is.

Anyway, expecting Justin to reply to our questions.

      Regards,

      Krishna Tangudu

Former Member

Yes, as Justin mentioned in his code comments, OFFSET indexes at 0.

      It's used to iterate through the list of dates.

Justin Molenaur (Blog Post Author)

      Yes, Jody already picked this one up (see how quick things move around here?).

      Because I am passing a scalar parameter to the second procedure that is actually doing the work as I iterate through the loop, I want to LIMIT the value in lv_current_date to 1 and also OFFSET by the position that I am at within the loop.

      This keeps moving through the list according to what dates are in variable date_list.
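
A minimal sketch of how the window moves (assuming date_list holds the six dates from the example; note that without an ORDER BY, the order of a DISTINCT result is not guaranteed):

SELECT "DATE" FROM :date_list LIMIT 1 OFFSET 0; -- 1st date (i = 0)
SELECT "DATE" FROM :date_list LIMIT 1 OFFSET 1; -- 2nd date (i = 1)
SELECT "DATE" FROM :date_list LIMIT 1 OFFSET 5; -- 6th and last date (i = lv_row_count - 1)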

      Regards,

      Justin

Fernando Da Ros

      Hi Justin,

      Nice doc.

I've noticed that you finish the execution by starting the MERGE DELTA, but the information is missing whether this table has SMART MERGE turned off. If not, a merge can happen automatically during the load.

      My 2 cents. Regards, Fernando Da Rós

Justin Molenaur (Blog Post Author)

Hi Fernando, thanks for the feedback. Don't you mean that these tables aren't configured for AUTO MERGE? If they were, the scenario you mention could occur.

The DDL for the tables doesn't include an AUTO MERGE statement, so the mergedog process will not consider them for merging automatically. The system should only merge them when I issue a DB hint via the SQL statement shown. Here, I don't use the SMART MERGE parameters; instead I am forcing a hard merge.

As far as I know, there is only DDL for enabling auto merge; smart merge is a runtime concept.

      Regards,

      Justin

Fernando Da Ros

Hmmm, as far as I know the default is smart merge ON.

I hadn't thought about this, but the other day I created a column table and inserted some millions of rows, and the merge happened automatically... I'll double-check this to confirm (rev 70).

      Regards, Fernando Da Rós

Justin Molenaur (Blog Post Author)

- Auto Merge is set at a system level, either on or off.

- By default, when a table is created through the GUI, Auto Merge is enabled. This can be observed by taking a look at the DDL (via the Export button). If it is enabled, you will see a suffix of "AUTO MERGE" after the table definition.

- Smart Merge is set at a system level, either on or off.

- For Smart Merge, there is no table-level definition, only the configuration rules that determine if a merge is required. This only comes into play when an application "asks" for a merge with the MERGE DELTA syntax.

- You can override the "Smart" part of the Smart Merge by including the parameter 'FORCED_MERGE' = 'ON' (see the example below).
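
For reference, a sketch of the two merge variants being discussed, using the example table from the post (syntax per the HANA delta merge documentation):

-- Hard (forced) merge, bypassing the evaluation - as used in the wrapper procedure
MERGE DELTA OF "<YOURSCHEMA>"."LOOPING_TARGET_TABLE";

-- Application-triggered smart merge - HANA applies the smart merge decision rules
MERGE DELTA OF "<YOURSCHEMA>"."LOOPING_TARGET_TABLE" WITH PARAMETERS ('SMART_MERGE' = 'ON');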

      By far the best detailed doc I have seen is called "How To Delta Merge for SAP HANA", which covers all of this.

      Happy HANA,

      Justin

Fernando Da Ros

      Hi Justin,

I made a short test on revision 70, creating a column table and inserting without specifying merge options, and it got merged.

create column table doc_test
  (doc    nvarchar(10)
  ,data   nvarchar(8)
  ,estab  nvarchar(4)
  ,filial nvarchar(4)
  ,vlnota decimal(12,2)
  ,primary key (doc));
-- successfully executed in 10 ms 887 µs (server processing time: 9 ms 777 µs) - Rows Affected: 0

insert into doc_test
select distinct * from shaddoc;
-- successfully executed in 17.184 seconds (server processing time: 14.061 seconds) - Rows Affected: 4000006

select top 10 * from M_DELTA_MERGE_STATISTICS order by start_time desc;
-- successfully executed in 32 ms 664 µs (server processing time: 31 ms 315 µs)
-- Fetched 10 row(s) in 0 ms 264 µs (server processing time: 0 ms 0 µs)

[Screenshot: M_DELTA_MERGE_STATISTICS output showing the automatic merge]

Taking a look at the table, it has auto_merge on, so that's the default. At least on my system.

[Screenshot: table definition showing AUTO MERGE enabled by default]

      Regards, Fernando Da Rós

Fernando Da Ros

Ahh, sorry - I forgot to say that my intention was to say automatic merge but I used the wrong keyword.

      Thanks anyway

Justin Molenaur (Blog Post Author)

Well, it just goes to show that you can learn something new every day 😉 . Although off topic from the blog post, this was insightful.

I see you have already done some analysis around this topic in the thread "turn delta merge off".

I performed a similar test on rev 69.01 and found that the behavior is the same as you have shown: by default, all tables, whether created via the GUI or DDL, have automerge enabled. You need to add explicit "NO AUTO MERGE" syntax to the DDL.

CREATE COLUMN TABLE "MOLJUS02"."TEST_TABLE_NO_AUTO"
  ("DATE" NVARCHAR(8) NOT NULL,
   "CUSTOMER" NVARCHAR(10) DEFAULT '' NOT NULL);

CREATE COLUMN TABLE "MOLJUS02"."TEST_TABLE_NO_AUTO_2"
  ("DATE" NVARCHAR(8) NOT NULL,
   "CUSTOMER" NVARCHAR(10) DEFAULT '' NOT NULL)
  UNLOAD PRIORITY 5 NO AUTO MERGE;

SELECT TABLE_NAME, AUTO_MERGE_ON
FROM TABLES
WHERE TABLE_NAME LIKE 'TEST_TABLE_NO_AUTO%';

[Screenshot: AUTO_MERGE_ON values for the two test tables]


      Additional helpful links around these topics

      http://scn.sap.com/thread/3326556

      http://help.sap.com/hana/html/sql_create_table.html#create_table_auto_merge_option

      Regards,

      Justin

Fernando Da Ros

... and this info and these links enrich the initial blog.

      Keep posting.

      Best regards, Fernando Da Rós

Lars Breddemann

      Hi Justin,

      that's right - AUTO MERGE enabled is the default option for all tables.

SMART MERGE is not a specific table property, but rather an option to control the point in time when HANA evaluates whether to merge a table or not.

      This is done by disabling the AUTO MERGE for the specific tables.

Once this is done, an application like yours could issue a SMART MERGE command to indicate the point in time when merging could make sense from a data loading point of view.

      HANA takes this smart merge command and uses the evaluation rules for the merge to figure out if a merge actually should be performed. This is the same approach that is taken with AUTO MERGE with the only difference that the application can trigger when the evaluation should take place.

      Also, it's possible to provide a different evaluation formula for the SMART MERGE to accommodate the likely different data loading patterns for the tables.

      Anyhow, in general SMART MERGE is to be used by SAP BW on HANA mostly.

What I would recommend to speed things up in your case is to disable the persistent merge for this table during your data load and enable it after all loading has taken place.

      This will prevent large savepoints during the loading process.

      - Lars

Justin Molenaur (Blog Post Author)

      Thanks for this Lars. In your last statement, did you mean disable "automerge" instead of "persistent merge"?

      I am in the process of starting to convert tables that are populated via SP's to disable automerge and control through the application. Would you say that would be a significant performance increase due to the absence of savepoints?

      Regards,

      Justin

Lars Breddemann

      And another comment to this nice piece of implementation documentation:

Using LIMIT... OFFSET to move a 'window' over the records is a practical way to work on the data piecewise.

But since you don't lock the table, data might be changed in between - leading to wrong results for your join operation.

If you are sure that nobody will ever change the table while you are running the join, the table lock won't even lead to any locking situations. So it cannot do any harm in this situation.

Also, the current implementation basically limits itself to sequential processing of the chunks - irrespective of the actual memory situation in the system.

By first determining the set of record ranges to work on and then writing a procedure that simply grabs one of the "free" ranges, you could easily run multiple part-joins in parallel and speed up the process in case more system resources are available.

      Thanks for sharing this design and the documentation!

      Cheers, Lars

Justin Molenaur (Blog Post Author)

      Lars, first off - thanks for the kind feedback. Second - I would welcome some hints on actual implementation on both of your suggestions. Thanks for your time!

Locking - I understand that if we lock some tables before processing, no other read/write operations can be performed. If I lock within an SP, I assume the lock is released once processing is completed and the other applications that need to write can then continue?

      Parallelized processing - I understand what you are saying and I had thought about how to do so, but never got into detail. Can you shed some light, maybe some syntax on how I can achieve what you are suggesting? How can I execute multiple procedures at once?

      Regards,

      Justin

Lars Breddemann

      Hi there 🙂

      The locks will be released after a COMMIT or a ROLLBACK. Remember that a SQLScript procedure doesn't open its own transaction, but is part of the calling transaction.

      So in your scenario this would mean:

      1. Lock the table
      2. call your procedure
      3. Commit/Rollback (in case of error)
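
A minimal sketch of that sequence, using the tables and procedure from the blog post (LOCK TABLE syntax per the HANA SQL reference):

LOCK TABLE "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" IN EXCLUSIVE MODE;
LOCK TABLE "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" IN EXCLUSIVE MODE;
CALL "_SYS_BIC"."<YOUR_PACKAGE>/SP_LOOP_TEST";
COMMIT; -- releases the locks (ROLLBACK would as well, in case of error)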

      Concerning the parallel option:

      Once the table is locked you know that you deal with a fixed amount of records.

That means you could first fill a helper table with these chunks and set a flag on them that indicates whether these chunks (or dates, in your case) are currently in work or not.

If you then run a second instance of your procedure in parallel, it could simply take the next unprocessed date (or chunk) and work on it.

While I write this I realize that this re-introduces the locking problem - so it doesn't guarantee a consistent join 🙁

      Sorry - back to square one in this case....

      Cheers, Lars

Justin Molenaur (Blog Post Author)

      Ok fair enough, thanks for the reply.

One last bit that I was most interested in - how would I physically call two instances of the same procedure in parallel? From what I know, within SQLScript the statements run sequentially, i.e. if I make a call to another procedure, it will wait for that call to complete before moving on to the next piece of work (either a new statement or another iteration of the loop).

      So how can you parallelize any type of statement within a given procedure in general?

      Regards,

      Justin

Lars Breddemann

      hehehe - that's exactly the point: you cannot have out of order or parallel procedure calls in a HANA transaction.

Exactly this is the missing link for the chunk-wise parallelisation of such a procedure.

However, this does not mean that only one thread works on your commands here.

Even though you can only call one command/procedure at a time, HANA performs a lot of parallelisation at lower levels.

      Multiple columns are read in parallel, reading/writing data can be done in parallel. Depending on the statement in question, aggregation and joining can be done at the same time.

      All this happens automatically and doesn't require any application code.

      Within SQLScript (at least for read only procs) independent statements are automatically run in parallel - again, without any coding requirements.
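
A minimal sketch of that last point (hypothetical read-only procedure body; the two table-variable assignments are independent, so HANA can execute them in parallel):

lt_sales = SELECT "CUSTOMER", SUM("SALES_QTY") AS "QTY"
           FROM "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE"
           GROUP BY "CUSTOMER";
lt_spec  = SELECT "CUSTOMER", "CUSTOMER_SPECIALTY"
           FROM "<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE";
-- neither assignment depends on the other, so no application coding is needed for the parallelism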

      Cheers, Lars

Justin Molenaur (Blog Post Author)

So what you are saying is that my best hope for the fastest write SP is to:

1) Disable auto merge and use application-level merging

2) Process data in manageable logical chunks rather than one huge transaction

3) Make the code as efficient as possible

      Regards,

      Justin

Krishna Tangudu

Just adding to your points, Justin,

I had the experience of writing one script-based calculation view where the generic query, i.e. SELECT <column-list> FROM Table, was giving performance of around 2-3 seconds, and when we changed the query to

select <column-list> from Table Where Region = 'AMR'
UNION
select <column-list> from Table Where Region = 'APAC'
UNION
select <column-list> from Table Where Region = 'EMEA'

(we know we have only 3 regions of data), this query performed better.

I agree that if any new region is added, we have to change this view again. But as we had an SLA to get the performance of all the views under 1 second, and we had tried every other possible option we knew of (partitioning, pushing down the filter), we went with this version of the view. 🙂

      Regards,

      Krishna Tangudu

Lars Breddemann

      Not quite 🙂

      The thing is: for each call of the actual join some internal processing will be repeated (e.g. building another section of the translation table).

      So, best (performance wise) would be to run the join at once...

      Anyhow, it turns out that it actually pays off to sometimes get up from the desk and get another coffee... I just figured out that my previous statement was not correct and that you indeed could run multiple part-joins in different transactions in parallel.

      See:

      First we lock all source tables in the starting transaction and keep these locks.

      However, we don't change any of the source data, so for now the source transaction can go to sleep and just wait.

      Next, the processing blocks are determined and written into the block tracking helper table.

      Next, the actual join procedure is called from a different session/transaction. It selects the first free entry from the helper table (select for update) and marks it as "in process". COMMIT now.

      Now, the actual join starts with the INSERT command into the target table.

      While this is running you can start any number of additional procedures. Each one will fetch one of the free blocks and work on it.

      Finally after the INSERT statement has finished, the helper table needs to be updated again (status "finished" or "finished after ... secs").

      Once all chunks have been finished, the table locks on the source tables can be released via ROLLBACK/COMMIT.

If you want/need to manually deal with the DELTA MERGE for the target table, that can be considered separately.
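
A rough sketch of the helper-table mechanics described above (table and column names are hypothetical, not from the original post):

-- hypothetical chunk-tracking helper table
CREATE COLUMN TABLE "<YOURSCHEMA>"."CHUNK_TRACKER"
    ("DATE" NVARCHAR(8) NOT NULL,
     "STATUS" NVARCHAR(12) DEFAULT 'FREE' NOT NULL);

-- each worker session claims the next free chunk (select for update), marks it, commits
SELECT "DATE" INTO lv_current_date
FROM "<YOURSCHEMA>"."CHUNK_TRACKER"
WHERE "STATUS" = 'FREE' LIMIT 1 FOR UPDATE;
UPDATE "<YOURSCHEMA>"."CHUNK_TRACKER"
SET "STATUS" = 'IN_PROCESS' WHERE "DATE" = :lv_current_date;
COMMIT;

-- ... the INSERT for this chunk runs here ...

-- afterwards, mark the chunk as finished
UPDATE "<YOURSCHEMA>"."CHUNK_TRACKER"
SET "STATUS" = 'FINISHED' WHERE "DATE" = :lv_current_date;
COMMIT;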

      Hope that makes sense,

      Cheers, Lars

Justin Molenaur (Blog Post Author)

Lars, I have some feedback from some testing I performed today. Essentially my goal was to see how performance changed across a few scenarios with regard to when the merges were triggered (as compared to auto merge). This is a follow-up to your comment that manually controlling merges may avoid savepoints, logs etc. and be faster to execute.

      I didn't change anything with regard to how I was looping over the data per above or any of the logic. In the real client scenario, the output was about 18m records in the target table.

      Since the COMMIT; statement is not allowed in a stored procedure, to get the manual merge operations to work as intended, I need to use the workaround as explained in this post, only I just used it to commit data after the large insert.

      SAP HANA: Workaround for using Commit & Rollback in exception handling block of a Stored Procedure

      For example -

var_commit VARCHAR(100) := 'COMMIT';
BEGIN
  <LOGIC/LOOP HERE>
  EXEC(:var_commit);
  MERGE DELTA OF "<YOUR_TABLE>";
END;

I used the same procedure copied three times and modified per below. All scenarios execute a DELETE FROM and INSERT operation by looping over multiple sets of data; here are the differentiating factors.

      1) Target table is set to automerge, no manual merging.

      2) Target table has auto merge disabled, one commit and delta merge executed at the end of all processing.

      3) Target table has auto merge disabled, one commit and merge after the DELETE FROM statement, then a commit and merge after each loop execution that contains the INSERT statements.

      My theory was that the third scenario may be faster, since the merge was being performed on smaller sets of data. At minimum, I expected the automerge to be slowest. Here are the runtimes as averages of multiple executions.

      1) 4m50s

      2) 5m25s

      3) 7m16s

So I was a little surprised to see that the scenario with auto merge enabled was actually the fastest! This tells me there is little to gain in processing times by precisely controlling the merge process during table updates - a very interesting and slightly counter-intuitive conclusion.

      Regards,

      Justin

Justin Molenaur (Blog Post Author)

      The only thing I forgot to consider around scenario 1 is that the merge time is not included with that execution. To get true apples-to-apples run time, I need to factor in the "EXECUTION_TIME" as seen in table "SYS"."M_DELTA_MERGE_STATISTICS", for which I observed a total of 28s.

      This essentially puts scenarios 1 and 2 on equal footing, same conclusion however.

      Regards,

      Justin

Lars Breddemann

      Hey Justin,

the point here is: as long as the merge is triggered by the mergedog (that is what happens when you rely on AUTOMERGE), merging happens asynchronously to your SQL code.

It runs in the background.

When you trigger it from your code, it waits for the merge to finish. It runs synchronously.

      It's doing the same job though, so the merge operation itself is not going to be faster either way.

The only way to actually make the merge faster is a trick: disable writing of the merged result data structure to disk. So, if you know or expect in advance that the merge will occur several times during your data load, you could disable the persisted merge at the beginning and reactivate it after you're done loading data.
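
One way to do this, per the delta merge documentation, is a memory-only merge that skips persisting the merged result to disk (a sketch, using the example table from the post):

MERGE DELTA OF "<YOURSCHEMA>"."LOOPING_TARGET_TABLE" WITH PARAMETERS ('MEMORY_MERGE' = 'ON');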

      Also: there is no need to commit before running a merge.

      The merge operation is not subject to the record locks (would be bad otherwise) and retains all still required record versions even over a merge operation.

      - Lars

Justin Molenaur (Blog Post Author)

      Lars, thanks for entertaining this chase down a rabbit hole.. 🙂

The statements around async and sync merge execution make sense. I will take a look at the persisted merge stuff for performance, but as for your last comment

      Also: there is no need to commit before running a merge.

      I have found the commit to be required, and I actually have an open OSS message (93666/2014) around the same. I have observed that if a commit is not issued before the merge starts, then no merge actually occurs on the newly inserted data, only on the previously committed data.

      If I have time, I'll give you an example to prove out, but just wanted to let you know I found this untrue in the context of a SQLScript stored procedure.

      Regards,

      Justin

Lars Breddemann

      Hi again,

I did test it again and you're right 🙂

Any update/insert/delete statement will acquire an IX (intentional exclusive) lock on the table - which is perfectly correct.

      This kind of lock allows other IX locks to be active, but no X (exclusive) lock is possible.

      And for the delta merge an exclusive table lock is required for a very brief period of time - when the current delta store is closed and the new delta2 store is established to take care of new data changes.

      So this is not a bug as such.

      It's not even really a design flaw since the internal structure maintenance is not something that you should try to control in your application code.

      Actually you gave the best proof for that: without including the code for the delta merge, your code was smaller (and thereby easier to understand and maintain) and ran faster.

      HANA handled the required merging automatically in the background - with nobody waiting for it.

      Thanks for pointing out this one again.

      - Lars

Krishna Tangudu

      Hi Lars & Justin,

I was running through the comments and was getting confused, so I would like your help to summarize the results of the tests.

-- Is automatic merge better than manually triggering the delta merge command?

      Regards,

      Krishna Tangudu

Lars Breddemann

      Hey Krishna

      I was hoping that by now it would be clear that the answer is:

      it depends.

If one option were always correct and better and the other option always crap, well, there wouldn't be another option then, would it?

      As a general rule of thumb: don't try to outsmart the automerge process. Don't do anything about it. Just let it do its job and design your application as if you wouldn't know about it.

      Most of the time, this is the optimal strategy.

      In case you actually realize that

      a) your data loading performance can be massively improved by triggering merges manually

      and

b) that the additional development, maintenance and monitoring effort is outweighed by the increase in performance

and

c) you actually fully understand the implications of your own delta merge process management

      then you might yield better results by manually controlling the merge process (just as SAP BW does).

      - Lars

Krishna Tangudu

      Thanks for the reply Lars.

Now I understand the complete discussion.

I was rather trying to find out whether the results showed that one of the processes is better. Now I understand that "it depends".

I think the above discussions will help a lot when preparing the generic SP to load data into SAP HANA and designing it based on the requirement.

      Regards,

      Krishna Tangudu

Justin Molenaur (Blog Post Author)

We tend to be a little long-winded, apologies 😉

I think the guiding principle, in short, is: leave automerge on unless you see some obvious need to control it manually. Even with massive full drop-and-reload data processing of 200m records, I saw no gains in manually controlling it - so it was of no value to do so.

      Regards,

      Justin

Krishna Tangudu

      Thanks Justin for your reply.

I was actually preparing the document below on a "generic" SP which we can use to load data between 2 tables.

SAP HANA: Generic Procedure using Arrays to Load delta data into Tables

Initially I started writing it to understand how to escape cursors and use arrays to form dynamic queries.

While writing that, I got some doubts in the area of "merging", which made me land on this page.

I intended that document to describe the best "feasible" loading procedure for the data.

      I understand now as @Lars mentioned .. "it depends" 🙂

      Regards,

      Krishna Tangudu

Former Member

      Hi Justin,

The call to the procedure "SP_WRITE_TARGET_TABLE" (shown below) within the FOR loop makes the wrapper procedure perform badly. The better approach would be to optimize the procedure call by changing the procedure "SP_WRITE_TARGET_TABLE" to accept a table type or an array as the input parameter.

FOR i IN 0 .. :lv_row_count-1 DO

    SELECT "DATE"
    INTO lv_current_date
    FROM :date_list
    LIMIT 1 OFFSET :i; -- notice OFFSET indexes at 0, not 1
    CALL "_SYS_BIC"."<YOUR_PACKAGE>/SP_WRITE_TARGET_TABLE" (:lv_current_date);
END FOR;

      Thanks,

      Goutham

Justin Molenaur (Blog Post Author)

      Hi Goutham, thanks for the feedback.

      Are you saying that a procedure that accepts a scalar variable as a parameter is faster than one that accepts a table type as a parameter?

      If so, maybe you can help illustrate by working up an example and showing the runtimes.

      Regards,

      Justin

Former Member

      Hi Justin,

A stored procedure call within a loop is a time-intensive operation.

In the procedure sp_loop_test, the stored procedure sp_write_target_table is invoked in a loop for each DATE selected from date_list. This is a very time-intensive operation, with complexity proportional to the number of dates selected.

Instead, an optimal approach would be to call the procedure sp_write_target_table only once from sp_loop_test by passing date_list as an IN parameter of a table type.

To do this, sp_write_target_table has to be modified to take a table type as input parameter, and the procedure logic has to be modified as shown below:

--<YOUR_PACKAGE>.SP_WRITE_TARGET_TABLE
/********* Begin Procedure Script ************/
BEGIN
INSERT INTO "<YOURSCHEMA>"."LOOPING_TARGET_TABLE"
SELECT A."DATE", A."CUSTOMER", A."MATERIAL", A."SALES_QTY", A."SALES_VALUE", B."CUSTOMER_SPECIALTY"
FROM "<YOURSCHEMA>"."LOOPING_DRIVER_TABLE" A
LEFT OUTER JOIN
"<YOURSCHEMA>"."LOOPING_LOOKUP_TABLE" B
ON (A."CUSTOMER" = B."CUSTOMER")
WHERE A."DATE" IN (SELECT "DATE" FROM :DATE_LIST); -- DATE_LIST would be the table-type IN parameter
END;
/********* End Procedure Script ************/
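
For completeness, a hedged sketch of the table type such a parameter could use (the type name is illustrative, not from the original post):

CREATE TYPE "<YOURSCHEMA>"."DATE_LIST_T" AS TABLE ("DATE" NVARCHAR(8));
-- the procedure header would then declare: IN DATE_LIST "<YOURSCHEMA>"."DATE_LIST_T"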

      I will work on the optimization and will let you know the results.

      Thanks,

      Goutham

Justin Molenaur (Blog Post Author)

      >>In the procedure, sp_loop_test, the stored procedure sp_write_target_table is been invoked in a loop for each DATE selected from date_list.<<

      That's actually the intended behavior 🙂

      >>Instead, an optimal approach would be to call the procedure sp_write_target_table only once from sp_loop_test by passing  date_list as IN parameter of type table type.<<

      Ok, now I see what you are saying. It probably depends on your situation honestly. For an ETL process, the speed of completion sometimes is not so important.

      The approach originally mentioned was primarily meant to reduce the memory required at any given point in time by processing the data in smaller pieces. This addresses some of the OOM issues that can happen with large resultsets.

What you show here is nothing but performing the full worklist of dates in one shot. There is actually no reason to encapsulate this call in a loop at all; we could just derive the worklist and execute the insert within the same procedure.

      At the end of the day, this is just meant to illustrate what is possible. In some other cases of irregular transposing, ragged hierarchies, etc this may become even more useful from a functional perspective.

      Definitely would like to see the results of your performance test however.

      Regards,

      Justin

Former Member

      Hi Justin,

      Can you please let me know why the tables are created without a primary key in this example?

      Thanks,

      Goutham

Justin Molenaur (Blog Post Author)

      Well, after some intense analysis of the source data (13 records via insert statements), it was determined that there is no possibility for duplication of data. Therefore, for demonstration purposes this seems fine, see any issue?

      Regards,

      Justin

Former Member

      Good post.

OOM is really the main HANA showstopper.

HANA developers should build solutions where OOM is excluded, even theoretically.

Biswamit Dhal

The INTEGER datatype is not working in the SQL console.

It is showing a syntax issue.

Please help me with this.

Nagaraj D

      Hi,
When we loop the procedure below (SP_WRITE_TARGET_TABLE), the memory used during the processing of each loop keeps building up in the wrapper procedure (SP_LOOP_TEST). Is there a way we can overcome this?

I did create local temporary tables in the looping procedure and used DELETE FROM with a condition (knowing the fact that DROP will be ineffective in SQLScript).

Thanks in advance!