XS application for table distribution in scale out HANA system
Author: Zhewei Wang, Svetlin Zaynelov
HANA demands optimal distribution of data across all the HANA blades for best performance. A proper table distribution helps for more optimal load balancing and better parallelization. In this blog we will cover only the table distribution part. Table partitioning optimization will be described in another SCN article.
Here are several basic rules for general DB table distribution which this app follows:
- Large tables should not be replicated.
- Small tables can exist on different nodes to leverage more optimal joins and prevent network transfer between the DB nodes.
- Tables are distributed as evenly as possible.
To help in more convenient and fast data access HANA offers HANA views. We would like to see all the tables required for one view on one node in order to follow the above rules.
I designed an application which helps to automatically check the current table distribution and provide a report for more optimal distribution.
I hope this application will help to save time designing optimal table distribution in scale out environment and having this to improve the HANA views requests performance. The app offers a user-friendly easy to use interface. The application can be further improved involving suggestions for partitioning but this is not part of the current version.
Below is example use case to demonstrate how my application works:
These are the sample tables I have.
These are the sample views I have.
1. Type in the path of target top level views for which you want to optimize the table distribution and click the ‘Get Data’ button. The tables’ metadata is now loaded and are ready to run the algorithm for the optimization.
2. Click the ‘Current’ button. Based on the table metadata, the grid below shows the current table distribution. The first column contains the schema name. The second column contains the table name. Other columns imply different hosts. In the chart, “-” means the table is not on the correspond host and “x” mean the table is on the host.
3. Click the ‘Table Category’ button. The tables are categorized based on their sizes. The first column contains schema names. The second column contains table names. In the table, “x” indicates which group the table belongs to. We have 3 default groups of tables – small (<10 million records), medium (>10M 700M <) and large (> 700M). The values can be changed and the table will be adjusted after the button is clicked again. In this case, small size tables are regarded as “small tables” and medium and large size tables are regarded as “large tables”.
4. Click the ‘Suggestion’ button. A better distribution is displayed in the below grid. This grid is similar to the “current” chart. The rules for table distribution are listed as follow:
- Large tables should not be replicated,
- Small tables can exist on different nodes,
- All the tables in a view are on one node,
- Tables are distributed as evenly as possible,
- The total memory size of tables cannot exceed the max memory size of a host.
The following SQL statement can be used to move the tables among hosts or to replicate tables.
Alter table name Move To
ALTER TABLE MY_TABLE5 ADD REPLICA AT LOCATION ‘<first_slave_node>’;
And the following link could be used as a good resource.
This application is free of charge you can use it on your own risk. This is not standard SAP product. The application uses read-only access to HANA config tables and does not do any update operations.
The source code is in Github.
Import all files under any folder and open the following link in the browser.
<path to your folder>/index.html