This blog is under construction
Excellent Article !!! Thanks for writing this up.
This is indeed a well-written article, with a consistent idea and working code that implements this idea for partition handling.
There are a few things, however, that should be pointed out.
The approach to handling partitions described focusses on keeping the number of partitions the same by using a “sliding window”. All “old” data is lumped together into a single ever-growing “old data” partition, that gets unloaded once the re-partitioning is done.
This also means that this “old data” partition needs to be loaded back into memory, the next time the re-partitioning script. Since this partition only ever grows, the memory requirement for it will also grow over time and eventually will become a problem (other objects in memory will be unloaded to allow for the big partition to be loaded, delta merge for this partition will take longer, other objects need to be reloaded…).
As the article specifically mentioned HANA 2.0 as the target release, a better option is to consider dynamic partitioning, multistore tables, and maybe heterogenous partitioning.
Since the purpose of this exercise is to only keep a certain amount of data in memory, simply adding new partitions as new data is added will automatically leave older data, that is not accessed anymore, in partitions that will be unloaded eventually.
If there is an urgency to make this automatic unloading more likely, then the UNLOAD PRIORITY can be set accordingly. One may also just unload the respective old partitions manually (as shown in the script), but this presumes knowing for sure that the data in those partitions does not get accessed anymore.
While it is very intuitive to think about partitioning in terms of “months of data” and map this notion directly to the partitioning scheme, it is a limited understanding of partitioning. For the most part, table partitioning is there to keep the management of data storage structures small enough to be easy to manage; improved query performance is an effect of this.
If the idea of “data aging” should be implemented stronger than the informal partitioning scheme, HANA 2.0 also supports “Time Selection Partitioning (Aging)” which is what the S/4 HANA applications leverage. This approach is more extreme in terms of memory optimization but requires more conditions to hold true from the data usage patterns to be effective.
Considering all this, I can’t help to notice that the technique discussed in the article seems to rather fit a HANA 1 system where the mentioned options have not been available.
Unrelated to this, there is also the part where it’s suggested to run the re-partitioning procedure automatically via calendar-based scheduling. This is a rather risky approach re-partitioning locks the table and does not have a guaranteed maximum duration and can potentially block all sessions that need to access the table.
While I believe the points I mentioned are important, I want to circle back to my opening sentence and point out again, that this article is really well produced and an enjoyable read.
This is a rare thing nowadays on SCN and it’s important to acknowledge and appreciate this.
Thanks for sharing this!
Thank you for very encouraging and valuable advice. This solution is the first step of our initiative moving toward data life cycle management. We are having SAP HANA 2.0 SPS03 and later this year, We will migrate to SPS 05 as soon as it is available. I will revisit the solution and add warm store like dynamic tiering or NSE, also spark or Hadoop for cold store. So the blog for next steps will be provided in future.