Technical details about data aging Part II
To understand the data aging in detail we will go a little bit deeper in this part of the blog series.
As example we will use table CDPOS. Data aging is in use and aging runs already were executed. As result we get this data distribution.
So if we select the record count via SE16 we will get the following result:
Here we can see the partitioning attributes and the record count of each partition
=> all partitions are loaded partially like mostly in business time
=> we also see the partitiong as configured in the ABAP backend
SQL Test
Now we will check different statements (also with count clause):
1) select * from CDHDR;
2) select * from CDHDR where MANDANT='100';
3) select * from CDHDR with RANGE_RESTRICTION ('CURRENT')
4) select * from CDHDR where MANDANT='100' with RANGE_RESTRICTION ('2016-11-01');
5) select * from CDHDR where MANDANT='100' with RANGE_RESTRICTION ('0001-01-01');
6) select * from CDHDR where MANDANT='100' with RANGE_RESTRICTION ('CURRENT');
7) select * from CDHDR with RANGE_RESTRICTION ('0001-01-01');
Test results
Row Count | |
SQL1 | 88.138 |
SQL2 | 52.226 |
SQL3 | 36.884 |
SQL4 | 51.281 |
SQL5 | 52.226 |
SQL6 | 972 |
SQL7 | 88.138 |
=> interesting is, that you can achive the same results with and without the range partitioning – which should not work if we can believe the SAP notes
DBACockpit => Diagnostics => SQL Editor
=> here you can also save the execution plan and import it into HANA Studio for a detailed analyses
Details – Plan viz
SQL1
=> we see a search without filters on all partitions
=> the exec plan is identically to SQL7 => no dynamic search is used here
=> time spend for all parts are pretty low, because some of them are loaded into memory and the row count is also pretty low
SQL4
=> now we see a dynamic search in cause of using the range restriction
=> but this time not on partition 2 (00010101 – 20160101)
=> closer look into the dynamic search
=> we see the filter on MANDANT on main and delta store of the partition 4
Same query with unloaded partitions besides current (part id 1)
=> in the overview we see that the expensive part is on a new operator ‘Delta Log Replay’
=> the select on the current partition is still fast
=> but on the unloaded partitions there must be executed a delta log replay on first access/load
=> the expensive ones are partition 3 and 4
=> so if you have big partitions which are not accessed frequently you can run into performance issues on first access
=> 3,3ms (loaded partitions) vs. 292,5ms (first access on historical partitions) = factor 97 slower
SQL6
=> on the current partition we see also the dynamic search with the select on both parts (main/delta) – this time with some results on the delta store
Summary
- currently no restriction if you use range_restriction or not (should not be handled different in using a ABAP report) => SAP notes say that if you do not use the Range_restirction clause you only will receive the current/hot partition (Note 2416490: “You can enable data access from historical area by using the classes CL_ABAP_SESSION_TEMPERATURE and CL_ABAP_STACK_TEMPERATURE.”)
- first access on historical partitions can be expensive => the cause is not the disk time but the delta log replay
- data aging is working and can reduce your memory footprint if you consider correct partitioning and define the best threshold for the current partition
Hello Jens,
Very informative & helpfull blog on data aging.Thankyou for sharing this.we are looking to implement the hana data aging in our landscape. Regarding this I have below two queries which i could not fine the answear anywhere. It will be great help if you can help me with the answear to these two queries.
1. As data aging involves automatic table level change like addition of the column “_DATAAGING” to the concern table & other non-manual changes done by the data aging in the database. Is there any transport request generated for this whole data aging change process? If no such change request is generated during the process then will not there be a table structure inconsistency between different systems in the landscape considering if i had done the data aging for that table in one system & in other system i didnot do it(like between DEV & QA)?
2. So when i will schedule the periodic background job for taking care of the future growth of the table, how the future growth will be hanle.
For example i have done a data aging where i have three partition 1. for 2017 data(cold) 2. for 2018 data(cold) & 3rd for 2019 data(hot) & my data restriction says anything older then 1 year should be moved to cold storage in that case wil there be a 4th paritition created automaticly in 2020 & 2019 data will be moved to the 4th partition & then the 4th parititon be moved to cold storage?
Thanks,
Rajdeep