I was curious about the inner workings of HANA: when you change data in column store tables, what is the effect? How much is written to the data files, how much does the backup dump change, how much is written to the log files? With a real SAP system you can measure the activity of the HANA database, but it is hard to measure the actual amount of changed data in a controlled way. As usual, SAP provides some documentation, but (as far as I can see) not in the area I am interested in. The real fun with SAP always starts with reverse engineering, so I decided to create my own lab experiment.

My lab consists of:

– HANA 1.0 revision 70

– 7.4 GB or 49.108.572 rows of real-life ASCII data

– resulting in a 960 MB column store table

I wanted to measure, in a controlled way, how the database reacts to changes in this table. First I deleted some data, then I inserted the same amount of (different) data; together these two steps simulate an update. After each step I looked at the in-memory table size, created a backup dump, measured how much of the data file and of the backup dump file had changed, and tried to identify the log files that were written. (A sketch of one possible way to measure the changed bytes follows after the table.) This table summarizes my measurements:

| Description | Delete #1 | Insert #1 | Delete #2 | Insert #2 | Delete #3 | Insert #3 | Delete #4 | Insert #4 | Delete #5 | Insert #5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MEMORY_SIZE_IN_MAIN (Bytes) | 998.832.085 | 1.049.845.069 | 1.041.197.293 | 1.071.021.253 | 1.075.204.349 | 1.086.465.005 | 1.089.642.421 | 1.117.598.501 | 1.124.042.589 | 1.146.048.837 |
| RECORD_COUNT | 47.359.893 | 49.187.726 | 47.620.728 | 49.179.120 | 47.542.719 | 49.117.838 | 45.957.846 | 49.701.911 | 43.134.825 | 49.712.183 |
| delta rows | 1.748.679 | 1.827.833 | 1.566.998 | 1.558.392 | 1.636.401 | 1.575.119 | 3.159.992 | 3.744.065 | 6.567.086 | 6.577.358 |
| delta size (raw ASCII Bytes) | 276.500.919 | 288.948.984 | 247.715.063 | 246.271.066 | 258.70.592 | 249.046.220 | 499.568.790 | 591.892.341 | 1.038.293.459 | 1.039.845.787 |
| delta size (column store Bytes) | 9.232.860 | 51.012.984 | 8.647.776 | 29.823.960 | 4.183.096 | 11.260.656 | 3.177.416 | 27.956.080 | 6.444.088 | 22.006.248 |
| changed datafile (Bytes) | 135.090.176 | 1.318.801.408 | 1.112.358.912 | 1.369.174.016 | 1.119.375.360 | 1.366.949.888 | 149.934.080 | 3.299.258.368 | 91.615.232 | 3.882.795.008 |
| ratio of changed datafile | 0,036 | 0,351 | 0,296 | 0,364 | 0,298 | 0,364 | 0,040 | 0,819 | 0,023 | 0,851 |
| delta backupfile (Bytes) | 48.971.776 | 1.066.991.616 | 1.060.003.840 | 1.117.069.312 | 1.073.582.080 | 1.098.489.856 | 62.418.944 | 1.202.388.992 | 90.517.504 | 1.284.259.840 |
| ratio of changed backupfile | 0,037 | 0,780 | 0,786 | 0,790 | 0,793 | 0,767 | 0,043 | 0,835 | 0,061 | 0,844 |
| logfiles written (Bytes) | 16.314.368 | 239.538.176 | 14.798.848 | 204.611.584 | 14.266.368 | 206.983.168 | 25.640.960 | 486.739.968 | 55.025.664 | 856.743.936 |
| Annotations | 3,2% changes | | 3,2% changes, delta merge | | 3,2% changes, delta merge | | 6,5% changes | | 13% changes | |

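For readers who want to reproduce the "changed datafile (Bytes)" and "delta backupfile (Bytes)" rows, here is a minimal sketch of one possible approach: take a snapshot of the file before and after an operation and compare the two snapshots in fixed-size blocks, summing the bytes of all blocks that differ. The 4 KB block size and the snapshot paths are purely illustrative assumptions, not a description of the exact tooling behind the numbers above.

```python
import os

BLOCK_SIZE = 4096  # illustrative block size; pick whatever granularity you want to measure at


def changed_bytes(snapshot_a: str, snapshot_b: str, block_size: int = BLOCK_SIZE) -> int:
    """Compare two snapshots of the same file block by block and return
    the number of bytes that fall into blocks which differ."""
    changed = 0
    with open(snapshot_a, "rb") as fa, open(snapshot_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if not block_a and not block_b:
                break  # both files fully read
            if block_a != block_b:
                # count the larger block so that file growth is included
                changed += max(len(block_a), len(block_b))
    return changed


if __name__ == "__main__":
    # hypothetical snapshot paths taken before and after a delete/insert step
    before = "/backup/snapshots/datavolume_before.img"
    after = "/backup/snapshots/datavolume_after.img"
    total = os.path.getsize(after)
    diff = changed_bytes(before, after)
    print(f"changed bytes: {diff:,} of {total:,} ({diff / total:.3f})")
```

Because the comparison works at block granularity, the result is an upper bound on the logically changed data, which is also roughly how a block-based deduplication appliance would see the file.
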
The table contains a lot of information, so let me summarize my most interesting findings:

  1. After delete #2 and delete #3 I manually performed a delta merge on the table, which is of course unnecessary, if not outright useless, after a delete. Surprisingly, this delta merge had a huge effect on the data file and the backup dump file. Delete #1 was comparable in size, but without a manual delta merge it shows only very small changes to the data file and the backup dump file.
  2. Delta merges after inserts always have a significant effect on the data file and the backup dump file. The changes to the data file are typically several times larger than the actual amount of changed data. This is not an issue for HANA, since these writes happen asynchronously, but it is important to know that the amount of change to the data files is not representative of how much the data actually changed.
  3. Even though the delta merge causes significant changes to the table representation in both the data file and the backup dump file, most real-life SAP systems can still use backup deduplication, because only a tiny percentage of the overall data is changed per day. (I verified that on an actual BW on HANA system.) However, I predict that there is some critical threshold: if too many tables (or partitions) are changed and therefore reorganized via a delta merge, it will no longer be possible to benefit from backup deduplication.
  4. Only delete #1 and delete #2 reduced the memory consumption of the table; delete #3, delete #4 and delete #5 actually increased the memory consumption of my sample table! The data file and the backup dump file also grew by ~20% over time, even though the amount of data was the same at the beginning and at the end of my experiment. (A query sketch for reading these per-table numbers follows after this list.)
  5. The volume of log data written by HANA correlates very well with the amount of (raw ASCII) data inserted into the database. So as a rule of thumb, if you want to know how much data has been added to or updated in your HANA instance, have a look at the log files.
  6. Deletes are very efficient in terms of log volume; only a small amount of data needs to be written to the logs. Don't expect any significant reduction of the in-memory space requirements, however.
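
The MEMORY_SIZE_IN_MAIN and RECORD_COUNT rows match the column names of the monitoring view M_CS_TABLES, so one way to collect them is to query that view before and after each delete/insert step (and again after a delta merge). The sketch below uses the hdbcli Python client; host, port, credentials and the schema/table names are placeholders for your own system, and MEMORY_SIZE_IN_DELTA is included because it makes the effect of a delta merge directly visible.

```python
from hdbcli import dbapi  # SAP HANA Python client

# Connection details are placeholders; adjust them for your own instance.
conn = dbapi.connect(address="hanahost", port=30015, user="MONITOR_USER", password="secret")

SQL = """
SELECT TABLE_NAME, RECORD_COUNT, MEMORY_SIZE_IN_MAIN, MEMORY_SIZE_IN_DELTA
FROM M_CS_TABLES
WHERE SCHEMA_NAME = ? AND TABLE_NAME = ?
"""


def table_stats(schema: str, table: str):
    """Return record count and main/delta memory size for one column store table
    (one row per partition)."""
    cursor = conn.cursor()
    try:
        cursor.execute(SQL, (schema, table))
        return cursor.fetchall()
    finally:
        cursor.close()


# Run before and after each step and compare the numbers.
for row in table_stats("MYSCHEMA", "MYTABLE"):
    print(row)
```

Comparing two such readings gives the first two rows of the table above; the changed-bytes and log-volume rows still have to be measured on the file system.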

In case of questions or if something was unclear, please ask.


3 Comments


  1. Krishna Tangudu

    Nice effort Mark. Thanks for sharing your thoughts, which started as a question and then turned into a blog.

    Can you also share the stats on “Update” and “Update” + “Insert” against “Upsert”? It would be interesting to see…

    Regards,

    Krishna Tangudu

    1. Mark Förster Post author

      Hi Krishna,

      I haven’t done any research on updates because that would have been more difficult for me to perform and would have required more time. I am not sure whether it is actually the case, but Hasso Plattner stated that in-memory databases process an update as a delete followed by an insert. If that is true, an update test should yield no additional insight, but of course we cannot be sure before having tested it in reality. That might be a topic for a further blog.

      Regards,
      Mark

      1. Krishna Tangudu

        Yes, and there was one case where we went for (Delete + Insert) instead of Update, which gave better performance. Since I saw you testing with considerable volumes of data, I suggested you could try that… (Oh yes, one more topic… one more blog) 🙂

        Regards,

        Krishna Tangudu

