Based on SAP HANA SPS 12 (HCP trial)

When Jody Hesch recently send me an email with a question about HANA, he received what everyone receives when sending email questions about HANA. The reply is a version of

“Usually I don’t do email Q&A as this simply doesn’t help with knowledge sharing.

Instead, I advise everyone to post the question in one of the HANA related forums
(SAP Community https://answers.sap.com/questions/metadata/23925/sap-hana.html ,
JAM https://jam4.sapjam.com/groups/about_page/6UHzR2Fxra4quFAbACtxFD
or even stackoverflow http://stackoverflow.com/questions/tagged/hana ) so that the question and its answers are search- and findable.

That way everyone can benefit from this and you even might get faster and/or better answers than from just writing to me.”

I have not given up on my personal fight against GEIGOKAI and this it what it looks like in daily real-life.

RUDE!

You might be inclined to call this reply rude but it turns out that most people accept this very easily and manage to copy&paste their question into one of the forums – and changing culture is always a bit painful … *drip* *drip* *drip*

And just as many others, Jody managed to post his question for all to see, read and discuss here: Manually managing Delta Merge

While I was doing the due diligence of checking my own ideas against a HCP trial HANA instance (SPS 12.04), the whole explanation and the example got larger than what would be nice for a simple answer, which is why you now read this is a blog post format.

On loading data and merging

Ok, let’s get to it, then.
Jody’s question was “how can manually handling delta merge make a data load faster?”
To answer this I set up a small test case:

drop table merge_test;
CREATE COLUMN TABLE  "MERGE_TEST" ("ID" INTEGER  ,
     "AAA" NVARCHAR(20),
     "BBB" INTEGER ,
     PRIMARY KEY ("ID"));

do
begin
declare ts_start timestamp = current_timestamp;
declare ts_end timestamp;
declare loops integer = 100;
declare j integer;
declare i integer;
declare with_commit integer = 1;

    truncate table merge_test;
    
    for j in 0 .. :loops do
        for i in 0 .. :loops*50 do
            
            upsert merge_test 
                values (:i, :i || ' - ' || :j, :i*:j)
                with primary key;

        end for;
        
        if with_commit = 1 then
            commit;
        end if;
        
    end for;

    ts_end = current_timestamp;    
 
    -- now show some infos
    select seconds_between(:ts_start, :ts_end) as duration from dummy;
    
    select count(*) as num_rows, min(id), max(id) from merge_test;
    
    select column_name, memory_size_in_main, memory_size_in_delta, count, distinct_count
    from m_cs_all_columns 
    where table_name ='MERGE_TEST'
    and schema_name =current_user;
    
    select merge_count, read_count, write_count, last_merge_time
    from m_cs_tables
    where table_name ='MERGE_TEST'
    and schema_name =current_user;
    
    select start_time, motivation, execution_time, memory_merge, merged_delta_records, success
    from m_delta_merge_statistics
    where table_name='MERGE_TEST'
    and start_time >= :ts_start
    and schema_name = current_user;
    
end;
 

What we have here is a simple table and a data loading simulation. The data gets inserted or updated via the UPSERT command and, if the parameter with_commit = 1 a commit is done after each iteration of the outer loop.
After both loops have been left, the code prints the runtime in seconds and selects some statistics from system views.

Next, I ran three experiments.

The first two used the SAP HANA AUTOMERGE feature, where HANA decides when to perform a delta merge.
The third one followed the approach SAP BW uses for data loading: the AUTOMERGE is disabled and the target tables get merged after all data has been loaded.

No commit, AUTOMERGE ON

 

/*
DURATION
386     

NUM_ROWS    MIN(ID) MAX(ID)
5001        0       5000   

COLUMN_NAME MEMORY_SIZE_IN_MAIN MEMORY_SIZE_IN_DELTA    COUNT   DISTINCT_COUNT
ID          1064                3733044                 505101  5001          
AAA         0                   17607386                505101  -1            
BBB         0                   5571068                 505101  -1            
$trex_udiv$ 232                 8889016                 0       0             
$rowid$     768                 15676428                505101  505101        

MERGE_COUNT READ_COUNT  WRITE_COUNT LAST_MERGE_TIME        
0           7           505101      2016-12-20 03:15:41.986

START_TIME  MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS
*/

Easy to see, the whole thing ran for 386 seconds and managed to create a total of 505.101 record versions in the delta store.
Note that during this load NO delta merge had happened. This is because we never committed the transaction at all and the delta merge won’t “relocate” open update transactions to a new delta store.

Commit, AUTOMERGE ON

/*
DURATION
349     

NUM_ROWS    MIN(ID) MAX(ID)
5001        0       5000   

COLUMN_NAME MEMORY_SIZE_IN_MAIN MEMORY_SIZE_IN_DELTA    COUNT   DISTINCT_COUNT
ID          723716              9132                    5001    5001          
AAA         741076              10052                   5001    5001          
BBB         711828              9132                    5001    5001          
$trex_udiv$ 273432              1912                    0       0             
$rowid$     1504                9132                    5001    5001          

MERGE_COUNT READ_COUNT  WRITE_COUNT LAST_MERGE_TIME       
5           30          505111      2016-12-20 03:40:22.85

START_TIME              MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS
2016-12-20 03:36:22.166 AUTO        54              FALSE           80016                   TRUE   
2016-12-20 03:37:22.24  AUTO        88              FALSE           90018                   TRUE   
2016-12-20 03:38:22.349 AUTO        119             FALSE           85017                   TRUE   
2016-12-20 03:39:22.49  AUTO        157             FALSE           85017                   TRUE   
2016-12-20 03:40:22.67  AUTO        186             FALSE           85017                   TRUE   
*/   

Here we see that with commits in between the whole thing only took 349 seconds.
As we’ve committed the transaction in between a couple of times, the automerge had a chance to be active – and did so roughly every minute (5 times).

Now, why is this variant a bit faster overall? The effort to find the currently active record versions grows with the number of versions in the delta store.
With the automerge in between, this number stayed consistently below 100.000 versions, while the first version had to face an increasing number of versions to check between round 100.000 and 505.101.

Commit, AUTOMERGE OFF

Finally the version of data loading that is similar to SAP BW’s approach.
I disabled the automerge via

alter table merge_test disable automerge;

and ran the test again:

/*
DURATION
325     

NUM_ROWS    MIN(ID) MAX(ID)
5001        0       5000   

COLUMN_NAME MEMORY_SIZE_IN_MAIN MEMORY_SIZE_IN_DELTA    COUNT   DISTINCT_COUNT
ID          1064                3733044                 505101  5001          
AAA         0                   17607386                505101  -1            
BBB         0                   5571068                 505101  -1            
$trex_udiv$ 232                 8810536                 0       0             
$rowid$     768                 15676428                505101  505101        

MERGE_COUNT READ_COUNT  WRITE_COUNT LAST_MERGE_TIME        
0           15          505101      2016-12-20 03:49:36.914

START_TIME  MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS

*/

merge delta of merge_test;

/*
START_TIME              MOTIVATION  EXECUTION_TIME  MEMORY_MERGE    MERGED_DELTA_RECORDS    SUCCESS
2016-12-20 03:56:09.435 HARD        46              FALSE           5001                    TRUE   
*/

With 325 seconds this is the fastest run.
Looking at the time the manual delta merge took, we find that it’s still faster than the last delta merge of the automerge example.

This is – again – due to the fact that now the are a lot less concurrent record versions in the table.
Note, MERGED_DELTA_RECORDS shows the number of valid (committed) records that have been moved from delta to main storage.
If I update the same record a hundred times before I commit, then I would only get one record merged into the main store.

Now, why do we see a better performance in this case?

To illustrate the resource usage during our loads, see the following diagrams:

In our first case, we see that the memory usage for the delta store as well as the CPU usage for the UPSERT gets larger over time.
Sometime after the last commit is done, an automerge gets triggered, leading to additional CPU and a lot more memory usage.
The result is a relatively well-compressed table (last little blue square).

Note that the number of uncomitted record versions (little orange squares) increases during the whole time.


With the second approach (commits in between and automerge on), we see that CPU and memory usage during the load stay moderate. Only when the merge is performed, memory and CPU usage increase and the whole process has to wait for the merge to switch over to the delta2 in order to continue.

Noteworthy here is that the resulting table can easily be larger than the table sizes produced by the other methods. This is because the column compression algorithms are determined during the first delta merge and won’t change that easily.
This can lead to a situation where the compression algorithms are determined on a subset of data that is not representative of the total (think of loading data with timestamps or dates in sequential order – the first merge may only see one specific date).

With the last option, we see a very similar pattern than in the first case.
The only difference here is that the final merge was manually triggered.

So with AUTOMERGE disabled and regular commits we get the best of everything:

  • no waits due to delta merges
  • no increased runtimes due to a high number of outdated uncommitted record versions
  • well-compressed table (since the compression optimisation can work on the whole data set).

This works well, as long as the resulting delta merge can be done within the available memory.
And even if other memory structures would need to be unloaded to allow for the delta merge to happen, then this would have to happen only once and not several times during the loading process.

Smarter merge

I mentioned that this third approach is similar to what SAP BW does but this is only half the truth.
Instead of manually sending a

merge delta of merge_test;

which gives the hard instruction to perform a delta merge now, SAP BW uses a “smart merge”.
The idea here is that instead of forcing a delta merge to let HANA evaluate whether a merge would be granted for, given the current memory statistics and given the merge decision functions.

With this flexible approach, the application does not need to care about the specifics of how to decide when a merge should take place, but can yet trigger HANA to take care about that.

TL;DR

Manually controlling the delta merge can improve performance for the load process as well as reduce the overall system load during the data loading, since multiple merges are avoided. Delta merges still are extremely resource intensive operations that can have a massive impact to the whole HANA system. Therefore it’s a good idea to have them done as seldom as possible.
This, in combination with a controlled approached to COMMITTING loaded data can, as shown above, have quite an effect to the runtime of the data loading.

There you go, now you know.
Lars

To report this post you need to login first.

9 Comments

You must be Logged on to comment or reply to a post.

  1. Jody Hesch

    Many thanks Lars. Rude, not at all. As per your email, “Well, how about you post that question in one of the forums and I’ll answer it there, OK? I think the question is quite valid and probably interesting to others, too.” Makes perfect sense to me! Anyone who finds that rude… is something of a Grinch around this holiday season. 🙂

    Appreciate the worked-through example above. Definitely helps my understanding.

    (0) 
  2. Rajarshi Muhuri

    Lars:

    couple of years back , I had to write a SP that had some loops and write a million+  records . I had played with turning auto commit off and manual delta merge off . But one of your suggestions had been to write to a temp table and then push to the persistent table in one go for max performance .

    (0) 
    1. Lars Breddemann Post author

      Well, I hope I had good reasons for that back then :).
      Jody was asking for a general explanation on how automerge could affect data loading which is what this blog is about.

      (1) 
  3. Christoph Gilde

    Very well explained, Lars!

    Maybe worth noticing that the BW approach with sending a smart merge hint can lead to really bad performance in some cases.

    I had a customer a while ago that migrated BW to HANA, 7.4 something at that time. After the migration some interfaces writing to DSOs for direct update using the APIs performed really bad and caused a high CPU load on the HANA server whenever the interface reports were scheduled. Dont want to go into the details of the use case…

    I analyzed the issue and it turned out, that it had to do with the smart merge trigger:

    Those APIs have an optional parameter i_smartmerge which defaults to true. Because this Parameter did not exist when the interface reports were written, a smart merge hint was sent to HANA everytime a record was inserted. In combination with the cost function for smart merge (mergedog param smart_merge_decision_func) and because the table was consuming less than 1GB in main, it was merging basically all the time! We rewrote the report to send a smart merge hint only after a batch of records had been inserted. Automerge would have been the better option, but  I did not want to mess with the BW generated tables.

    Some details can be found at http://www.sap.com/docs/download/2012/05/9ec46a16-547c-0010-82c7-eda71af511fa.pdf (slightly outdated but most should be still valid).

     

    (1) 
  4. Todor Botev

    Hi Lars,

    many thanks for this great analysis and explanation!

     

    It looks like the main factor causing the bad performance of the first approach is the high number of outdated uncommitted record versions, because “the effort to find the currently active record versions grows with the number of versions in the delta store.”

     

    And here is where I have a question: why leave those outdated version in place – and why look again and again through all of them in order to find the active one?

     

    I can think of two strategies which look better to me:

     

    1. Just put the new record in the delta store; do not look for previous versions. Let the merge do the work for finding out the last version to be merged.

     

    2. Look for previous versions but delete them instead of just outdating them. This way the effort for finding the previous versions will be kept constant. I cannot think of any reason why an outdated version can be useful.

     

    I suppose there is a reason for the current behaviour – but just cannot find any. Do you have an idea?

    Regards
    Todor

     

    (0) 
    1. Lars Breddemann Post author

      “why leave those outdated version in place – and why look again and again through all of them in order to find the active one?”

       

      These are reasonable questions, I think. The answer to them is that HANA’s column store doesn’t do inline updates, but only appends data. This technique provides many benefits, as you don’t have to “squeeze” in new records in an existing storage place, the old record is still available for other transactions to read (that’s MVCC without a separate UNDO copy of the data for you) and it also reduces locking contention for insert/update/deletes.

      Maybe important to note for this blog is that the question is around mass data loading. That’s one scenario that the column store needs to support. But is that the only or even the main scenario? Probably not, as the system is used for transaction oriented scenarios too. In such a scenario, many small transactions hit the column store and you really don’t want those the be burdened with any online reorganisation – which is why the delta merge happens asynchronously and typically without impacting the OLTP.

      Now, why don’t we just ignore the deleted records? Because we have to know which records to ignore first. That’s pretty much the same in most DBMS that support MVCC. Instead of simply keeping only committed records at all, at any time (this would require read-locking!) HANA just marks records as deleted as of a certain transaction number. This allows a flexible view onto the data for all transactions currently reading the same table.

      Looking at your suggestions:

      1. The delta store of course only ever receives new record versions. But these record versions might have been committed, rolled back or superseded at the point in time when another transaction comes and reads the table. In order to present the actual current committed version, HANA has to do the due diligence and read the main store for all matching records and the delta store for all matching records and only allow the current records to be seen.
        The actual cleanup actually is done by the delta merge. Means: as soon as there cannot be any transaction anymore that could see an outdated record version for any reason, HANA gets rid of this version.
      2. How do you want to provide MVCC then? Or are you a huge fan of exclusive reading-locks?

      Hope that helps.

      Cheers,

      Lars

      (0) 
  5. Todor Botev

    Hi Lars,

    thanks a lot for your explanations! Let me add some more detail to my suggestion 2: remove the outdated versions. I now understand that I didn’t describe the idea correctly:

     

    Look for previous versions and delete them *if they are still not commited*.

     

    Your first test (No commit, AUTOMERGE ON) is mainly about uncommited outdated records slowing down the process.

     

    The MVCC (Multiversion Concurency Control) does not play any role here: no other session would ever see those versions because they will never be commited and thus exposed to the world.

     

    So it comes down to recognizing early on that the are uncommited and throwing them out of the delta store. The merge does it – so it should be possible.

     

    Regards
    Todor

     

    (0) 
    1. Lars Breddemann Post author

      Of course. it is possible to find the currently valid (and thereby the invalid records) also for in-flight transactions. That’s what any SELECT implicitly does when run within a transaction.

      Now, why wouldn’t you want to permanently go and reorganise the delta store for this in-flight transaction? Because that’s very expensive and involves locking. Big time.

      ALL transactions working on this data structure need to stop. Not nice in a system that should enable concurrent access.

      Also, you seem to think that “deleting the uncomitted but superseeded records” would in any way make the delta store smaller. That’s not the case – remember that the delta store is an add-only data structure. There are no in-place deletions/changes possible. The delta merge doesn’t change this – it merely creates a new main store, a new delta store and then drops the old delta store.

       

      (0) 

Leave a Reply