Quite often, I have discussions with people on how HANA handles memory management. Robert Klopp and I got into this in some detail, so I thought I’d write a document giving the facts.
It is often thought that HANA is an in-memory-only database. This is not accurate: SAP HANA persists all data to disk. Once data is persisted, upon startup HANA (by default; this is configurable) loads it on request, at the granularity of the partition-columns required. Let's take an example to understand this, because it is subtle.
Take the following table, TRANSACTION. It has 7 columns, including TXAMOUNT, TXTIMESTAMP and CUSTOMER_ID. It is partitioned first by HASH(CUSTOMER_ID) into 4 partitions, then by RANGE(TXTIMESTAMP) into 16 yearly partitions. That is a total of 4×16 = 64 partitions, or 64×7 = 448 partition-columns.
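A definition along these lines would produce that layout. The column names other than TXAMOUNT, TXTIMESTAMP and CUSTOMER_ID are invented for illustration, and the partitioning clause follows HANA's two-level partitioning syntax (assuming range partitioning on the timestamp column is available in your revision):

```sql
-- Illustrative sketch: 7 columns, 4 HASH partitions, then RANGE by year.
-- Only TXAMOUNT, TXTIMESTAMP and CUSTOMER_ID come from the article;
-- the other column names and types are assumptions.
CREATE COLUMN TABLE "TRANSACTION" (
  TXID         BIGINT,
  CUSTOMER_ID  INTEGER,
  TXTIMESTAMP  TIMESTAMP,
  TXAMOUNT     DECIMAL(15,2),
  CURRENCY     NVARCHAR(3),
  MERCHANT_ID  INTEGER,
  CHANNEL      NVARCHAR(10)
)
PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 4,
  RANGE (TXTIMESTAMP)
  (PARTITION '2011-01-01' <= VALUES < '2012-01-01',
   PARTITION '2012-01-01' <= VALUES < '2013-01-01',
   -- ... further yearly ranges, 16 range partitions in total ...
   PARTITION OTHERS);
```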
At startup, HANA won't load table TRANSACTION, and you can see its memory consumption is zero (well, 114 KB). Just to be sure, I issue:
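To force the table out of memory and confirm its footprint, a pair of statements like this does the job (standard HANA SQL; M_CS_TABLES is the monitoring view for column-store table sizes):

```sql
-- Force TRANSACTION out of memory entirely.
UNLOAD "TRANSACTION";

-- Confirm: the in-memory footprint should now be (near) zero.
SELECT TABLE_NAME, LOADED, MEMORY_SIZE_IN_TOTAL
FROM SYS.M_CS_TABLES
WHERE TABLE_NAME = 'TRANSACTION';
```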
Now, let’s go ahead and check that in table SYS.M_CS_COLUMNS, which holds all the information about columns in-memory and out:
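A check of this shape, counting loaded partition-columns via M_CS_COLUMNS, should return zero at this point (column names per the monitoring view):

```sql
-- Count partition-columns of TRANSACTION currently loaded into memory.
SELECT COUNT(*)
FROM SYS.M_CS_COLUMNS
WHERE TABLE_NAME = 'TRANSACTION'
  AND LOADED = 'TRUE';
```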
That's what we expect. So I'm going to go ahead and run a query that will hit only the 2012 partitions, and should load only 2012 data for column TXAMOUNT.
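A query of this shape – restricting on the timestamp range but reading only TXAMOUNT – prunes down to the 2012 range partitions (the predicate values are illustrative, not the article's actual query):

```sql
-- Only the 2012 range partitions need to be touched to answer this.
SELECT SUM(TXAMOUNT)
FROM "TRANSACTION"
WHERE TXTIMESTAMP >= '2012-01-01'
  AND TXTIMESTAMP <  '2013-01-01';
```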
Let's see which columns have now been loaded:
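Listing the loaded partition-columns together with their partition IDs, again via M_CS_COLUMNS (assuming the PART_ID and LOADED columns of that view):

```sql
-- Which partition-columns are now in memory, and in which partitions?
SELECT COLUMN_NAME, PART_ID, LOADED, MEMORY_SIZE_IN_TOTAL
FROM SYS.M_CS_COLUMNS
WHERE TABLE_NAME = 'TRANSACTION'
  AND LOADED <> 'FALSE'
ORDER BY PART_ID, COLUMN_NAME;
```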
As expected, eight partition-columns are loaded – in partitions 13 and 16, which hold the 2012 and "Other" data, respectively. Now let's check how much memory they use:
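Summing the in-memory footprint of what has been loaded so far (MEMORY_SIZE_IN_TOTAL is reported in bytes):

```sql
-- Total bytes currently in memory for TRANSACTION.
SELECT SUM(MEMORY_SIZE_IN_TOTAL) AS LOADED_BYTES
FROM SYS.M_CS_COLUMNS
WHERE TABLE_NAME = 'TRANSACTION'
  AND LOADED <> 'FALSE';
```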
Let's run LOAD TRANSACTION ALL to load the whole table, so we can see how big it is:
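In HANA SQL, that is the LOAD statement followed by the same size check as before, giving us the full footprint to compare against:

```sql
-- Load every partition-column of TRANSACTION into memory.
LOAD "TRANSACTION" ALL;

-- Then compare the full footprint against the earlier figure.
SELECT SUM(MEMORY_SIZE_IN_TOTAL) AS TOTAL_BYTES
FROM SYS.M_CS_COLUMNS
WHERE TABLE_NAME = 'TRANSACTION';
```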
Yes – answering that question required only 5% of the data to be loaded into memory. And note that TXTIMESTAMP is a very big column; many queries would need to load much less than 5%.
What happens when HANA runs out of memory?
When HANA reaches 95% memory usage, it starts to unload the partition-columns that were least recently used. You can find them in the unloads trace file.
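Besides the trace file, unloads can also be inspected from SQL via the M_CS_UNLOADS monitoring view (assuming a reasonably recent HANA revision that provides it):

```sql
-- Recent column unloads, most recent first, with the reason for each
-- (e.g. LOW MEMORY for pressure-driven unloads vs. EXPLICIT for UNLOAD).
SELECT UNLOAD_TIME, TABLE_NAME, COLUMN_NAME, REASON
FROM SYS.M_CS_UNLOADS
ORDER BY UNLOAD_TIME DESC;
```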
You can see that it unloads specific partition-columns (in each trace line, 60 and 61 are partition numbers and the column name is at the end), starting with those least recently used.
I hope this helps clarify exactly how HANA behaves around loading data into memory. In many cases, it is possible to have more data than you have RAM in your HANA environment.
But remember: the working set of partition-columns must take no more than around 50% of your overall RAM, because HANA needs memory in which to do calculations. HANA is not designed to load and unload data continually – it is relatively inefficient at this compared to a row-based RDBMS, and is instead optimized for very fast performance in-memory. If your working set – the data on which you actually want to operate – is larger than your available memory, performance will tank.
But if you keep data – schemas, tables, partitions and columns – in HANA that you never query, it never needs to be loaded into memory.