How to reduce your HANA database size by 30%
I didn’t write enough blogs last year, and felt like I abandoned SCN a bit. Lately a few people have kindly commented that they enjoyed reading my content, which is being nicer to me than I deserve. So here’s a little gift to the year off.
This script is only useful if you have a HANA system which was installed with an older revision (SPS01-07) and you have upgraded it a bunch of times and it’s now on a newer release (SPS08-10).
In that scenario, it’s possibly the most useful thing you will see all year for a HANA devop. In a productive HANA system we saw disk footprint reduction from 2.9TB to 1.89TB and in-memory footprint reduction by over 100GB. It will also substantially decrease startup time, decrease backup time, and increase performance.
What happens is that HANA chooses the compression type of a column store object when it creates it, and only occasionally re-evaluates the compression type. In older databases that have a lot of data loaded since the initial installation, it can mean that the compression is suboptimal. In addition, objects can be fragmented and use more disk space than is really required.
This script takes care of all that and cleans up the system. It takes some time to run (18h in our case).
A few caveats (these are general best practices, but I have to point them out)!
- Run this script in a QA system before production, for test purposes and so you know how long it will take
- Run it at a quiet time when data loads are not running
- Ensure you have a full backup
- Use this script at your own risk, like any DDL statement it could cause issues
- Do not restart HANA during this operation
- Complete a full backup after the script, and restart HANA to reclaim memory
Thanks to Lloyd Palfrey, who actually wrote it.
— HANA Reorg Script – 2015
— HOW TO RUN:
— After creating the procedure you can run it with this statement:
— call “_SYS_BIC”.”dba_reorg”(‘INSERT_SCHEMA_NAME_HERE’);
CREATE PROCEDURE “_SYS_BIC”.”dba_reorg”(IN pi_schema VARCHAR(60)) LANGUAGE SQLSCRIPT AS
CURSOR c_cursor1 FOR
WHERE SCHEMA_NAME = :pi_schema;
— Recompress Tables
FOR cur_tablename AS c_cursor1() DO
EXEC ‘UPDATE “‘ || :pi_schema || ‘”.”‘ || cur_tablename.TABLE_NAME || ‘” WITH PARAMETERS (‘OPTIMIZE_COMPRESSION’=’FORCE’)’;
— End of stored procedure
— Reorg Rowstore
EXEC ‘ALTER SYSTEM RECLAIM DATA SPACE’;
— Trigger Rowstore GC
EXEC ‘ALTER SYSTEM RECLAIM VERSION SPACE’;
— Create Savepoint
EXEC ‘ALTER SYSTEM SAVEPOINT’;
— Reclaim LOG space
EXEC ‘ALTER SYSTEM RECLAIM LOG SPACE’;
— Reclaim DATA space
EXEC ‘ALTER SYSTEM RECLAIM DATAVOLUME 110 DEFRAGMENT’;
Thanks for sharing John!
If I have the possibility to test it I will provide the compression factor as further input.
i Will test this 🙂
Thanks for sharing John.
Thanks John for sharing. very good document.
Alright, this time around I bite.
First of all: thanks for sharing your experiences with SAP HANA. They definitively add to this community, so I am happy to have you blogging here again.
Now to the blog itself. I consider this type of blog "selling snake-oil" or the "magic bullet" (or whatever you want to call the miraculous thing to improve your situation).
While you put the caveats in red letters to provide warning, I'd say that's not really providing enough information on the operations executed by the script.
Running it on a test/QA system will only allow a time estimation if the system nearly has the same data and similar load during the run.
Since some of the activities will impose locks on various levels (table locks and system wide savepoint locks) running it during production hours, might end up in a halted system.
Technically you could actually restart SAP HANA during any of the operations, but that will of course require extended startup times due to the required recovery.
For the script itself, in order to run, the user running it must have the UPDATE privilege on the tables of the underlying schema. This should actually not be the case for normal administrators.
Of course the script will lead to a reduction of space usage. If it actually reaches 30% on a regular basis is questionable though.
And since the script just applies a set of operations without actually analysing the current state of the system it really is a sledgehammer approach and reminds me a lot of the recommendations for index-rebuilds, defrags and CBO stats-recollections that where/are so common for e.g. Oracle DBs.
Knowing the effects of such recommendations, I can see another wave of support tickets running at the colleagues in SAP HANA support...
Anyway - now that this little loop script has been published users will use it forever and likely not heed any further warnings (like this current one).
So, just in case anyone uses the script and gets in trouble: I told you so...
luckily there are some syntax errors in the script, so in this state it won't work. I'm scared that no one tested it and noticed this. So currently no wave of support tickets will be knock on the door 😉 May be it worked in older releases but in 90+ no of my system accepted this syntax.
A simple example:
There ist no
"ALTER SYSTEM RECLAIM LOG SPACE"
=> just a
"ALTER SYSTEM RECLAIM LOG"
Another one is inside the stored procedure, but I won't correct it, because I have the same opinion like Martin and Lars. You should check this individually with the already mentioned script collection and do this selectively.
May be it makes sense to run it after the initial migration to HANA to aim the best possible compression from beginning. I have corrected the syntax and tested it on my testsystem and it takes a long time and you never know when it will finish. Bad in a planned maintenance with a time table.
I also tested this and yes the syntax was incorrect but choose not to comment as to leave this thread buried but now it has resurfaced
Hi to all,
what about the 1813245 - SAP HANA DB: Row store reorganization ?
For Revisions later than or equal to Rev.81
Set up configuration parameter in indexserver.ini via HANA Studio.
page_compaction_enable = true
page_compaction_max_pages = 1048576
Are you forcing the REORG even if the result is "FALSE"? in this note, the info is that, only start a REORG if:
Row store reorganization is recommended, when allocated row store size is over 10GB and free page ratio is over 30%.
If the result of "Reorganization Recommended" is "TRUE", then row store memory can be reclaimed after row store reorganization.
Thx in advance
the above should not be news to anyone who is familiar with SAP FAQ notes on compression and garbage collection (these two notes should be required reading).
It is a well-known fact that compression evaluation and therefore optimization fall behind. It'd be great if someone from SAP can weigh in and explain why.
It is important to note that the gains from forcing compression optimization are short-lived in the case of frequently changing tables.
In one of my tests, a frequently changing table of 210 GB table shrunk in size to 160 GB after forcing compression optimization. However it bounced back to 200+ GB within 5 days.
The obvious reason for not permanently trying to use the best possible compression is of course: that's computationally expensive.
It takes time to do that and uses resources that could be used for business transaction instead.
Also, it's not possible upfront to determine if and how large a benefit would actually turn out to be.
Besides, it's not so much the pure change of data volume that leads to compression methods not being optimal anymore. It's the change of data distribution in each column of a table and relative to each other that makes the difference here.
The automatic compression optimisation therefore is calibrated to be rather not too proactive, in order to not disturb system operations all too much. But it will be done if the table contents massively fluctuates. Why your example table changed all that much would need to be looked at in detail. My gut feeling however says, that it's not the compression optimisation. If it were, have you checked how the selected compression methods change over time for each column?
SAP in general tries to eliminate weaknesses in the current algorithms to make sure that memory is used as efficient as possible. For example, the decision formula for compression is improved with current Revisions so that tables with a bad compression or with a lot of MVCC garbage are compressed earlier.
If you want to use a targeted approach to reduce your database footprint (in terms of memory and disk) rather than reorganizing everything in an unconditioned manner, you can use mini checks like the following as a starting point (SAP Note 1999993):
- Check ID 370: Unused space in data files (%)
- Check ID 420: Heap areas currently larger than 50 GB
- Check ID 453: Size of non-unique concat attributes (GB)
- Check ID 535: Row store (> 10 GB) fragmentation (%)
- Check ID 560: Tables > 10 Mio. rows not compressed
- Check ID 561: Columns > 10 Mio. rows not compressed
- Check ID 565: Tables > 10 Mio. rows and > 200 % UDIV rows
- Check ID 640: Number of trace files (total)
- Check ID 642: Size of trace files (GB, total)
- Check ID 644: Size of largest trace file
- Check ID 745: Total size of statistics server tables (GB)
- Check ID 854: Undo and cleanup file size (GB)
- Check ID 950: Log segments not free for reuse
- Check ID 952: Log segments free for reuse
SAP Note 1999997 provides more advice about reducing the SAP HANA memory consumption.
Hi to all,
I agree with Martin. The mini-check and the note 1813245 are a good way to decrease and maintain the DB. The last time that I used the memory Reorg, I was able to decrease more than 50%. I will share, next week, my latest tests.
we recently upgraded our ECC / SRM and BI to SP9. though we got very good compression in ECC (3.9 TB to 2.6TB). For SRM and BI the compression didnt happen at all. We have not run any script to achieve this in ECC.
BI and SRM have the DB size of 650GB and 500GB respectively where we didnt see any compression. Any reasons to it.
The mini-check can help to understand where the problem is or you can run the HANA_Tables_ColumnStore_TablesWithoutCompressionOptimization statment (Note 1969700 - SQL statement collection for SAP HANA).
ℹ In a typical replication environment, we need to adjust the replication parameters to make this script work.
but this is not related to the script, but to the fact that every time you execute a "ALTER SYSTEM RECLAIM DATAVOLUME DEFRAGMENT" you have to take care of your secondary side.
Everyone who is interested in details: 1999880 - FAQ: SAP HANA System Replication
=> Q:19. How can RECLAIM DATAVOLUME be executed when system replication is active?
Thanks for sharing this blog.
We do run this SQL command in our SAP HANA Database as required.