HANA column store backgrounds (INSERT ONLY)
After finishing this , we can continue our series.
We’ll start with some background information of the column store that will help us to better understand the guidelines
introduced by Eric Westenberger.
It is often mentioned that SAP HANA uses an „insert only“ approach. What does this mean for you as ABAP
developer? Does this mean that UPDATE or DELETE statements cannot or should not be used anymore?
In this post we will take a look under the hood and explain the consequences.
Before we start looking into the technical details, let us give the short answer to the question above:
No, as an ABAP developer you still should use INSERT, UPDATE and DELETE as intended in the application.
The HANA insert-only approach works behind the scenes with immediate benefits. However, understanding the
way how HANA works can be exploited in new designs in particular when it comes to new applications dealing
with large data sets.
Now, let’s pull up the sleeves and dig into the details of the column store within SAP HANA:
HANA is insert only. What does this mean to you as an ABAP developer?
In the column store HANA will only insert data. That is whenever you update or delete data HANA will
just insert. For performance reasons, the column store is devided in a main and a delta part. New data is going
to the delta and will be merged frequently to the main. Changing statements work as follows:
INSERT:
An INSERT statement will just insert a new record in the delta.The merge process will move the record from
delta to main.
DELETE:
A DELETE statement will select the record and mark it as invalid by setting a flag (for main or delta). The merge
process will delete the record from memory once there is no open transaction active for it anymore.
UPDATE:
An UPDATE statement will insert a new version of the record. The merge process will move the latest version from
delta to main. Old versions will be deleted once there is no open transaction active for them anymore. The update
therefore consists internally of an insert + delete.
The following picture illustrates this principle schematically. Please note that this picture does not reflect the real
storage in column store. You can see a deleted record in main (id 7) and 2 old versions (marked in grey) of
records (id 8 and 14) where the name has been updated, which lead to an insert of a new version of the records.
The attached program demonstrates insert only by inserting, updating, deleteing data and reporting record
counts. As you can see while the logical record count does not increase when you do an update the
raw_record_count does. This is insert only in action.
What does this mean for you, the ABAP developer? Should you change the way you work with
INSERT, UPDATE and DELETE?
As we said in the beginning: No, not at all. Let HANA do the insert-only and use INSERT, UPDATE and
DELETE just the way you are used to.
And what are the benefits of the insert only approach?
Well, since the main store is compressed (which is good for reading performance, detailes in the next blog)
changes would be expensive. Just imagine you want to make a change in a compressed text file. You need
to uncompress, make the change and compress again. For this reason data is only appended to the
uncompressed delta store where change operations are much cheaper. The merge moves the data from
delta to main and thus compresses only from time to time and not with every change. Setting the deletion
flag is based on rowids and works for compressed and uncompressed data.
In the next blog we will dive a little bit deeper in the column store and discuss the main and the delta in more
detail.
Hi Hermann,
You have mentioned, in DELETE operation, that " A DELETE statement will select the record and mark it as invalid by setting a flag (for main or delta). The merge
process will delete the record from memory once there is no open transaction active for it anymore.". But,what i have understood the INSERT ONLY concept is that, it does not delete the history of the data i.e. insert-only enables storing the complete history of value changes and the latest value for a certain attribute. (hence making time travel possible! ). So, can you please clear my doubt, is the value from the main column store delete or not? if not, then which memory where you referring to in the blog?
Thanks,
Jeet.
...so the INSERT ONLY in SAP parlance means different way how the db will handle the update. That might be smart & efficient from the in-memory (&compressed) db solution point of view, but that is not how the rest of the word is using the term INSERT ONLY; this is just confusing & annoying.
Thank you for the clarification. This is just another example of how HANA is a disruptive technology without being disruptive 🙂 I for one, think it is really cool. I've been writing code since 1978 and it's not often that something truly new comes along. It's usually just a renaming of old techniques requiring developers to learn new syntax.