Quite often, I have discussions with people about how HANA handles memory management. Robert Klopp and I recently got into this in some detail, so I thought I'd write up the facts.

It is often thought that HANA is an in-memory database only, but this is not accurate: SAP HANA persists all data to disk. Once data is persisted, HANA by default (this is configurable) loads it on request after startup, at the granularity of the partition-column that a query requires. Let's take an example to understand this, because it is subtle.

Take the following table, TRANSACTION. It has 7 columns, including TXAMOUNT, TXTIMESTAMP and CUSTOMER_ID. It is partitioned first by HASH(CUSTOMER_ID) into 4 partitions and then by RANGE(TXTIMESTAMP) into 16 ranges, by year. That makes a total of 4×16=64 partitions, or 7×64=448 column-partitions.

[Screenshot: the TRANSACTION table definition, showing its columns and HASH/RANGE partitioning]
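In SQL, the definition looks roughly like this (a sketch: only TXAMOUNT, TXTIMESTAMP and CUSTOMER_ID are named in this post, so the other column names, the types and the exact range boundaries are illustrative):

    -- Sketch only: 7 columns, HASH then RANGE partitioning as described above
    CREATE COLUMN TABLE TRANSACTION (
        TXID         BIGINT,
        CUSTOMER_ID  INTEGER,
        TXAMOUNT     DECIMAL(15,2),
        TXTIMESTAMP  TIMESTAMP,
        TXTYPE       VARCHAR(10),
        MERCHANT_ID  INTEGER,
        TXSTATUS     VARCHAR(10)
    )
    PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 4,
                 RANGE (TXTIMESTAMP)
                 (PARTITION '1998-01-01' <= VALUES < '1999-01-01',
                  -- ...one range per year, 15 in total...
                  PARTITION OTHERS);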

At startup, HANA won't load table TRANSACTION, and you can see its memory consumption is effectively zero (well, 114 KB). Just to be sure, I issue:

UNLOAD TRANSACTION;

Now, let's go ahead and check the monitoring view SYS.M_CS_COLUMNS, which holds information about every column-store column, whether in memory or not.
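A query along these lines shows the load state of each column-partition (M_CS_COLUMNS is a standard monitoring view; LOADED and MEMORY_SIZE_IN_TOTAL are among its columns):

    -- Which column-partitions of TRANSACTION are in memory, and how big are they?
    SELECT PART_ID, COLUMN_NAME, LOADED, MEMORY_SIZE_IN_TOTAL
      FROM SYS.M_CS_COLUMNS
     WHERE TABLE_NAME = 'TRANSACTION';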

[Screenshot: SYS.M_CS_COLUMNS showing that no TRANSACTION column-partitions are loaded]

That's what we expect. So I'm going to run a query which will definitely hit only a single partition-column: it should load only the 2012 data for column TXAMOUNT.
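The query was along these lines (a sketch: the exact statement is in the screenshot, but any filter that restricts TXTIMESTAMP to 2012 will prune down to that range partition):

    -- Aggregate a single column, restricted to a single year, so that
    -- partition pruning applies
    SELECT SUM(TXAMOUNT)
      FROM TRANSACTION
     WHERE TXTIMESTAMP >= '2012-01-01'
       AND TXTIMESTAMP <  '2013-01-01';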

[Screenshot: the query and its result]

Let's see which columns have now been loaded.
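We can reuse the M_CS_COLUMNS query from above, this time filtered to loaded columns only:

    -- Show only the column-partitions that are now in memory
    SELECT PART_ID, COLUMN_NAME, LOADED, MEMORY_SIZE_IN_TOTAL
      FROM SYS.M_CS_COLUMNS
     WHERE TABLE_NAME = 'TRANSACTION'
       AND LOADED <> 'FALSE';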

[Screenshot: SYS.M_CS_COLUMNS showing the loaded column-partitions]

As expected, 8 column-partitions are loaded – from partitions 13 and 16, which hold the year 2012 and the "Other" rows respectively. Now let's check how much memory is used:
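Summing the in-memory footprint is a one-liner over the same view:

    -- Total memory consumed by the loaded columns of TRANSACTION, in MB
    SELECT ROUND(SUM(MEMORY_SIZE_IN_TOTAL) / 1024 / 1024, 2) AS MEM_MB
      FROM SYS.M_CS_COLUMNS
     WHERE TABLE_NAME = 'TRANSACTION';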

[Screenshot: memory consumption of the loaded column-partitions]

Let's run LOAD TRANSACTION ALL to load the whole table, so we can see how big it is:
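In SQL (standard HANA syntax):

    -- Force every column of every partition into memory
    LOAD TRANSACTION ALL;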

[Screenshot: memory consumption with the whole table loaded]

Yes – answering this question required only 5% of the data to be loaded into memory. Note that TXTIMESTAMP is a very wide column – in many cases you would need much less than 5%.

What happens when HANA runs out of memory?

When HANA hits 95% memory usage, it starts to unload the least recently used partition-columns. You can find these unloads in the unloads trace file.
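Unloads are also exposed in a monitoring view, so you don't have to dig through the trace file (a sketch: on the revisions I have used, SYS.M_CS_UNLOADS records each unload with a REASON such as LOW MEMORY or EXPLICIT):

    -- Recent column unloads, most recent first
    SELECT UNLOAD_TIME, TABLE_NAME, COLUMN_NAME, REASON
      FROM SYS.M_CS_UNLOADS
     ORDER BY UNLOAD_TIME DESC;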

[Screenshot: entries from the unloads trace file]

You can see that it unloads specific partition-columns (60 and 61 are the partition numbers; the column name is at the end), starting with those least recently used.

Final Words

I hope this helps clarify exactly how HANA behaves around loading data into memory. In many cases, it is possible to have more data than you have RAM in your HANA environment.

But remember: the working set of partition-columns must take no more than around 50% of your overall RAM, because HANA needs memory in which to do calculations. HANA is not designed to load and unload data continually – it is relatively inefficient at this compared to a row-based RDBMS – and is instead optimized for very fast in-memory performance. If your working set – the data on which you want to operate – is larger than your available memory, performance will tank.

But if you store schemas, tables, partitions and columns in HANA that you don't use, they never need to be loaded into memory.


17 Comments


  1. Viren P Devi

    Thanks John for the details. I have a quick question though:

    If my understanding is right, data is loaded into RAM/memory when requested, perhaps by a reporting tool. Also, insert/delete performance is not as good as in a row-based system like ECC.

    Considering these facts, should we expect high latency for the first data request (selecting a big data set) compared to subsequent requests?

    Correct me if I am wrong please.

    Regards,

    Viren

    1. John Appleby Post author

      If the data is not in memory then latency is high. This is why I say that you always need your working set in memory.

      Data insert and delete performance is at least as good as a row-based database. HANA is in-memory, and so it is efficient even for inserts and deletes, despite being columnar.

      When you take into account not needing to have additional indexes, aggregates or separate reporting systems then HANA is even more efficient.

      1. Viren P Devi

        Right. One more thing I have wanted to ask for a long time:

        If it is a columnar database, how much performance degradation might the database experience while inserting records (for a given number of columns and rows) compared to a row-based RDBMS? I think compression doesn't come into the picture while inserting data, as it is an INSERT-only job?

        Thanks again,

        Viren

        1. John Appleby Post author

          I’ve compared HANA to other databases and it depends. HANA performs around the same as a row-based traditional RDBMS for bulk inserts, when the row-based RDBMS has no indexes or constraints or automatic roll-ups.

          However HANA performs far better than other databases when they need indexes, primary key constraints or roll-ups: often 100x better. That’s the beauty of HANA.

          For very high-frequency individual inserts, you may find a row-based database can outperform HANA SP07 in some circumstances. SAP is looking at mechanisms in later support packs to improve this specific case.

          On insert, HANA puts the data into a delta store which isn’t compressed. Periodically, HANA will combine the main and delta stores in a process called a delta merge, to recompress and re-sort the data with new dictionary keys.
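          For reference, a merge can also be triggered manually (standard HANA syntax):

              -- Force a delta merge for the table
              MERGE DELTA OF TRANSACTION;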

  2. Srinivas Kakarla

    John,

    I recently installed SAP ERP 6.0 EHP7 on the SAP HANA DB and observed that all the tables were unloaded before the start of client creation, and then re-loaded as the client copy progressed.

    [Screenshot: alert notifications showing column store tables being unloaded]

    I observed this behavior via the alert notifications received from this instance. Can you confirm this unloading of column store tables?

    -Srinivas K.

  3. Shreepad Patil

    Nice and informative blog, John.

    I have one question though: how did you arrive at the total number of partitions, 4×16? Is it '# of HANA nodes' × '# of cores per node'?

    Thanks and Regards,
    Shreepad Patil

    1. John Appleby Post author

      Well, you need a HASH partition for each node, so you evenly distribute data and use all the nodes when you run a query. I had 4 nodes in this example.

      And then, I chose to partition by year, so that when a year loads and is merged, it isn’t touched again. This reduces the cost of the delta merge process.

      Partitions of 20-50m rows are a good size. Smaller than that, and the cost of joining the partitions rises; larger than 100m, and the merge process can get expensive.

  4. Miller Hu

    One question,

      As only one range partition was hit, only 4 column-partitions should have been loaded. But in fact 8 columns are loaded – I noticed that HANA also loaded the rest ("Other") partition. Is it loaded by default?

  5. Christian Harrington

    Hi John and others,

    We are on rev 85, and we also tested with the latest rev 93, and we see different behavior from yours:

    1) Simple scenario: we partitioned a table by range with 4 partitions (one per year).

    2) We completely unload the table.

    3) We run some SQL with a WHERE clause restricted to one year. The first time we run that query, HANA creates a plan cache entry in M_SQL_PLAN_CACHE. To do that, it reads ALL partitions and loads ALL key fields into memory.

    4) We completely unload the table again.

    5) We run the same SQL as in 3). This time the plan cache entry already exists, so only the requested partition is loaded into memory, along with only the requested fields.

    We had some discussion with SAP and they said it's normal HANA behavior. But to me this greatly limits the advantages of partitioning for query pruning and memory management, since key fields can sometimes take a lot of space given their high granularity.

    Or maybe we are missing some settings/parameters somewhere?

    Any recent news on this, since this blog is from 2014?

    Thanks

    Christian

  6. Payal Sachdev

    Great Insight 🙂

    John, could you also describe the reasons for column unloads, like MERGE and LOW MEMORY, and give an example like the one above for the different types of unload reasons?

