HANA Housekeeping using HANACleaner
ANY USAGE OF HANACLEANER ASSUMES THAT YOU HAVE UNDERSTOOD AND AGREED THAT: 1. HANACleaner is NOT SAP official software, so normal SAP support of HANACleaner cannot be assumed 2. HANACleaner is open source 3. HANACleaner is provided "as is" 4. HANACleaner is to be used on "your own risk" 5. HANACleaner is a one-man's hobby; developed, maintained and supported only during non-working hours 6. All HANACleaner documentations must be read and understood before any usage: - SAP Note 2399996 - The .pdf file hanacleaner_intro.pdf - All output from executing python hanacleaner.py --help 7. HANACleaner can help you execute certain SAP HANA tasks automatically but is NOT an attempt to teach you SAP HANA Therefore it is assumed that you understand all SQL statements that HANACleaner does to make changes in your system To find out what crucial SQL statements HANACleaner will do without executing them, run with the additional flags -es false -os true To then learn what those statements do before you executing HANACleaner without "-es false", see SAP HANA Admin Guide or SAP HANA System Administration Workshops 8. HANACleaner is not providing any recommendations, all flags shown in the documentation (see point 6.) are only examples For recommendations see SAP HANA Administration Workshops or other documentation, like e.g. SAP Note 2400024
The blog provide guidelines on the necessary steps that needs to be followed to automate housekeeping task using SAP HANACleaner script.
It’s really a good practice to set up housekeeping task from the early stage of your project as there are some task which you know in advance whereas some housekeeping task comes after your production system go-live like Table Growth.
This blog demonstrate the steps involved to perform housekeeping for cleanup of backup catalog entries. The idea behind performing this activity is that, with time entries in backup catalog table grows which result in higher data footprint at persistent level results in more memory utilization. So, this is one of many tables on which we can perform housekeeping task and keep it under threshold.
This section will provide an overview on what is HANA Catalog and where it used and how to check its size etc. and all the relevant information with respect to backup.
- Backup Files – Delete according to backup retention times
- Backup Catalog – Delete old entries according to backup retention times
- Trace files backup.log and backint.log – Delete old entries according to backup retention times
For database operation, backup catalog is loaded into main memory so the size of backup catalog matters.
In persistent storage, your actual data/log, parameter and trace files are stored, and the backup of data and log volume are placed in data and log backup. The backup catalog for file-based backups is written as a separate backup to the location where the log backups are stored.
NOTE: Cleanup of backup catalog entries is also necessary if you have third party backup solution.
Here /backup is backup storage where all data and log backup files are places. The backup of “backup catalog” (log_backup_0_0_0_0.1539032862322) is placed in log backup directory along with log backup.
Backup files and catalog can manually be deleted using HANA Studio. The method to do so is as below –
In HANA Studio. Login to your SYSTEMDB and open “Backup”
Select the database for which you want to delete backup files and catalog. This option is only visible if you are in SYSTEMDB database. If you have logged in tenant database and navigated to backup folder, the option for database selection will not be available.
Right click on the data backup from which you want to delete older backups.
Here you can just delete the catalog or you can delete the catalog and backup files from the file system location.
It will provide you the list of backup files that will be deleted on choosing “Finish”. It will delete all complete, delta and log backups older than the backup you have selected.
In HANA, it is not possible to delete just the log or incremental backup from backup catalog. The reason is obvious as deletion of any random log and incremental backup will make data backup useless for point in time recovery.
Even if you try to delete an entry of log backup from backup catalog, it will result you with error
BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID <backup_id>
SAP DBTech JDBC: : invalid backup id: Invalid backup id '<backup_id>'
Automatic Housekeeping – Using HANACleaner Script
Now to get rid of this manual task of deleting catalog, you can schedule a job to delete this catalog on regular basis. You have to perform one-time activity of scheduling a job for the same. The process of doing that is explained below.
Certain SAP HANA cleanup tasks like purging the backup catalog or deleting old trace files (SAP Note 2119087) need to be implemented individually. SAP HANACleaner is now available to perform this task automatically.
The SAP HANA cleaner is a house keeping script for SAP HANA. It can be used to clean the backup catalog, diagnostic files, alerts, to compress the backup logs, and much more. It should be executed by adm or, in case you use a CRON job, with the same environment as the adm. This script is expert tool designed by SAP support. You can use it, but SAP doesn’t take over any responsibility for problems originating from the use of this tool.
SAP Note 2399996 – How-To: Configuring automatic SAP HANA Cleanup with SAP HANACleaner
Always refer the latest version of SAP Note to get most updated HANACleaner script.
SAP HANACleaner can be used for the following cleanup tasks:
- Cleanup of backup catalog entries
- Cleanup of backups
- Cleanup of trace files
- Cleanup of backup.log and backint.log
- Cleanup of audit logs
- Cleanup of SAP HANA alerts
- Cleanup of free log segments
- Cleanup of internal events
- Cleanup of multiple row store containers
- Cleanup of data file fragmentation
- Cleanup of SAP HANACleaner logs
- Cleanup of arbitrary directories (e.g. SAP HANASitter logs)
- Creation of optimizer statistics for SDA tables
- Optimize compression of tables not compressed
- Optimize compression of tables with columns not compressed
- Optimize compression of tables with large UDIV overhead
- Cleanup of object lock history for ‘(unknown)’ objects
This document talks about cleanup of backup catalog entries. Similarly, cleanup for rest of the task can be schedule as per your business need.
Install Phython Script
You can install SAP HANACleaner in the following way:
Download the script mentioned in SAP Note 2399996 or you can use direct link -> https://github.com/chriselswede/hanacleaner.git
Copy it to a directory on your SAP HANA database server
Once it is copied, you can explore various options available to execute script. The following command provides you with an overview of the way how SAP HANACleaner works and the available configuration options:
sidadm> python hanacleaner.py --help
- It is intended to be executed as <sid>adm on your HANA server (as environment variable for your python script is already set with <sid>adm on your HANA server).
- It connects via host, port and DB user provided in hdbuserstore.
- The DB user needs proper privileges.
Create user in all HANA databases (System DB + Tenant DBs)
NOTE: Depending on what housekeeping tasks, the specific hanacleaner user will need specific sets of privileges. As I just want to delete backup catalog entries, I have used below set of roles.
Disable the expiry of user by executing SQL command –
ALTER USER HKPSCH DISABLE PASSWORD LIFETIME;
Create similar users in all Tenant Databases (I have 3 tenants, so have created users in all 3 tenants) and disable the expiry of users
As mentioned earlier, python script connects via host, port and DB user provided in hdbuserstore
Configure “hdbuserstore” for all Databases
hdbuserstore set <KEY> <hostname>:<sql port> <username> <password>
---SYSTEMDB hdbuserstore set CLNRSCH <hostname>:30013 HKPSCH *********** ---Tenant 1 hdbuserstore set CLNRXXX <hostname>:30041 HKPXXX *********** ---Tenant 2 hdbuserstore set CLNRYYY <hostname>:30044 HKPYYY *********** ---Tenant 3 hdbuserstore set CLNRZZZ <hostname>:30047 HKPZZZ ***********
Here XXX, YYY, ZZZ are SID of tenant databases.
Once hdbuserstore is configured, you can cross check with below command
# hdbuserstore list
Before executing script, kindly perform checks to validate your backup catalog. Below example, I’m showing you everything on SystemDB but same needs to be followed for all tenant databases as well.
To check oldest backup entries in catalog and the size of catalog, execute below query in System DB and all tenant Database.
---Oldest Backup and Size of Backup Catalog SELECT TO_CHAR(CURRENT_TIMESTAMP, 'YYYY/MM/DD HH24:MI:SS') EVALUATION_TIME, LPAD(TO_DECIMAL(C.CATALOG_SIZE_MB, 10, 2), 15) CATALOG_SIZE_MB, TO_CHAR(B.OLDEST_BACKUP_DATE, 'YYYY/MM/DD HH24:MI:SS') OLDEST_BACKUP_DATE, LPAD(B.OLDEST_BACKUP_DAYS, 18) OLDEST_BACKUP_DAYS FROM ( SELECT TOP 1 BF.BACKUP_SIZE / 1024 / 1024 CATALOG_SIZE_MB FROM M_BACKUP_CATALOG B, M_BACKUP_CATALOG_FILES BF WHERE B.BACKUP_ID = BF.BACKUP_ID AND BF.SOURCE_TYPE_NAME = 'catalog' AND B.STATE_NAME = 'successful' ORDER BY B.SYS_START_TIME DESC ) C, ( SELECT MIN(SYS_START_TIME) OLDEST_BACKUP_DATE, DAYS_BETWEEN(MIN(SYS_START_TIME), CURRENT_TIMESTAMP) OLDEST_BACKUP_DAYS FROM M_BACKUP_CATALOG ) B
SELECT COUNT(*) FROM M_BACKUP_CATALOG;
Run HANA Mini Check from SAP Note 1969700.
Based on the age of HANA database, the size of backup catalog will increase. Above mini check shows you the size of backup catalog of SystemDB on which I have ran mini check. I have 3 tenant and the value of backup catalog on each comes are to be similar i.e 20 MB. So 80 MB has been utilized in total at persistent level. So housekeeping this table will help us to keep data footprint low which eventually utilize less memory.
Execute below command with <sid>adm
# python hanacleaner.py -bd 90 -br true -k CLNRSCH
Here it will delete entries in backup catalog older than 90 days. If you set -bb true, then along with the backup catalog entries corresponding backup files also gets deleted. But it is of no use, if you are using third party backup tool.
But if you use both -be and -bd, the most conservative, i.e. the flag that removes the least number entries, decides
For cleaning up the backup catalog (and possibly also backups) hanacleaner has the following input flags –
Now check the oldest backup days and size of catalog and you will see that entries are been deleted and catalog size has been reduced.
So for all tenant databases same python script needs to be executed along with its user key CLNR<SID>
# python hanacleaner.py -bd 90 -br true -k CLNRXXX # python hanacleaner.py -bd 90 -br true -k CLNRYYY # python hanacleaner.py -bd 90 -br true -k CLNRZZZ
Configure Cron Job
We will be using cron job to run this script weekly. Some customer don’t allow to use cron job, so schedule this script using third party tool.
Note: hanacleaner expects the environment of <sid>adm –> if we use CRON the same environment as <sid>adm has to be provided.
Check with shell <sid>adm uses
This shell script, hanacleaner.sh, provides the <sid>adm environment, with source $HOME/.bashrc and then executes the hanacleaner command:
Then a new crontab can be created, calling this shell script, e.g. every Saturday at 6 o’clock in morning. You can use https://crontab.guru to get time for cron job
If you want to log the output to std_out set up the crontab like this:
NOTE: You can also insert email syntax in cron job, so on the completion the task it triggers an email along with output.
Very nice information and well explained. Thank you Dennis for sharing.
Hi Dennis Padia,
Nice blog, can you please confirm on "clear data file fragmentations" point.
How to do that.
Kindly refer to the section "HANACleaner - Disk Fragmentation" from below pdf. This pdf explain all the options supported by this python script.
Main SAP Note: 2399996 - How-To: Configuring automatic SAP HANA Cleanup with SAP HANACleaner
Great supportive information.
Thanks for this nice blog. I have one query regarding hanacleaner script in HA env. We have 2 nodes (primary and secondary). We have schedule hanacleaner script in node 1 i.e primary. If suppose failover happens and DB is running on node 2 (former secondary, currently primary) & hanacleaner script is still running on node 1 (former primary, currently secondary) ,older backups are getting deleted from backup catalog as mentioned in the script but the physical backup files which resides on node 1 are not getting deleted. Even we fail back to node 1, script is not able to delete the physical files of data backups as the same have been already deleted from backup catalog.
Could you please let me know, how we can configure hanacleaner script for HA environment.
note : The backup mountpoint is not shared between the nodes and hancleaner script is only running in node 1. Even the DB running on node 2, we are still able to connect to DBs via hdbsql -U <keyname> command from node 1. We are using following command for hanacleaner:
python /usr/sap/<SID>/HDBnn/exe/python_support/hanacleaner.py -bd 25 -bb true -br true -k <key name> -vlh <DB virtual hostname> -os true -op <location of output files>