It took me nearly two years to write the second part of this partitioning series. The first part, about BW partitioning, can be found here.
The main question of this blog: how do you determine the best partitioning combination?
In an ERP or S/4HANA system it is possible that a table approaches the 2 billion row mark. This is a HANA design limit for a column store table (per partition). In this case you have to partition the table. But unlike on a BW system there are no standard rules for partitioning, which means you have to choose a partitioning scheme on your own. But how do you determine the best partitioning combination?
- Find tables with high amount of records
- Term clarification
- HASH Partitioning
- Range Partitioning
- Multi-Level Partitioning
- Designing Partitions
- Example BSEG
- Example BSIS
- Example for a Z-Table <tbd>
- Parameter for parallel splitting
First, check whether partitioning is needed at all, which means checking the tables with the most row entries. By design, HANA has a limit of 2 billion rows per table/partition.
If your alerting is working correctly you will receive an alert (ID 27). It looks like this:
“Record count of column-store table partitions ( ID 27 )
Determines the number of records in the partitions of column-store tables. A table partition cannot contain more than 2,147,483,648 (2 billion) rows.”
- Interval 1 hour
- High: 1,900,000,000
- Medium: 1,800,000,000
- Low: 1,500,000,000
The limitations are described in SAP note 2154870.
2. Find tables with high amount of records
Check all tables for a high number of records. For this you can use the SQL statement from SAP note 1969700 (HANA_Tables_TopGrowingTables_Records_History).
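For a quick ad-hoc overview without the statement collection, the monitoring view M_CS_TABLES can serve the same purpose (the ordering and LIMIT are just an example):

```sql
-- Top column-store tables/partitions by record count
SELECT SCHEMA_NAME, TABLE_NAME, PART_ID, RECORD_COUNT
  FROM M_CS_TABLES
 ORDER BY RECORD_COUNT DESC
 LIMIT 20;
```

Note that RECORD_COUNT is reported per partition (PART_ID), which is exactly the granularity the 2 billion limit applies to.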
Rule of thumb for the initial partitioning:
- between 100 and 500 million entries per partition
- if you choose too many partitions you can end up with bad performance, because one thread per partition has to be triggered (e.g. you have 20 threads as statement concurrency limit and 40 partitions have to be scanned, which results in waiting for resources)
- if you choose too few partitions, you may have to repartition again fairly soon, which means another maintenance window / downtime
- recommendation: HASH partitioning on a selective column, typically part of the primary key
In our example we determined BSEG and BSIS as tables with a high number of rows. To plan maintenance in time, you should act when a table exceeds 1.5 billion entries.
In our scenario BSEG has 4.5 billion and BSIS has 4.1 billion rows. So they are already partitioned. But is this partitioning, unchanged since its initial creation, still optimal?
3. Term clarification
To determine the optimal partitioning combination, we have to clarify some terms:
- HASH Partitioning
- Range Partitioning
- Multi-level partitioning
In the context of databases, cardinality refers to the uniqueness of data values contained in a column. High cardinality means that the column contains a large percentage of totally unique values. Low cardinality means that the column contains a lot of “repeats” in its data range. In a customer table, a low-cardinality column would be the “Gender” column. This column will likely only have “M” and “F” as the range of values to choose from, and all the thousands or millions of records in the table can only pick one of these two values for this column. Ok, to be accurate, nowadays we have to add a third value “D” for diverse.
|Note: Partitioning works better when the cardinality of the partitioning field is not too high|
Selectivity is basically a measure of how much variety there is in the values of a given table column in relation to the total number of rows in that table. The cardinality is just part of the formula that is used to calculate the selectivity.
|Selectivity = cardinality/(number of rows) * 100|
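This formula can be evaluated directly on a candidate column; the schema and column names below are placeholders for your own environment:

```sql
-- Selectivity of BELNR in BSEG: cardinality / number of rows * 100
SELECT COUNT(DISTINCT BELNR)                    AS cardinality,
       COUNT(*)                                 AS num_rows,
       COUNT(DISTINCT BELNR) * 100.0 / COUNT(*) AS selectivity_pct
  FROM "SAPSCHEMA"."BSEG";
```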
Hash partitioning is used to distribute rows to partitions equally for load balancing. The number of the assigned partition is computed by applying a hash function to the value of a specified column. Hash partitioning does not require an in-depth knowledge of the actual content of the table.
Source: openHPI: In-Memory_Data_Management_2017
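As a sketch, hash partitioning is applied with a single DDL statement; the table name and partition count here are only examples:

```sql
-- Distribute the rows evenly across 8 partitions based on a hash of BELNR
ALTER TABLE "SAPSCHEMA"."ZMYTABLE" PARTITION BY HASH (BELNR) PARTITIONS 8;
```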
Range partitioning creates dedicated partitions for certain values or value ranges in a table. For example, a range partitioning scheme can be chosen to create a partition for each calendar month. Partitioning requires an in-depth knowledge of the values that are used or are valid for the chosen partitioning column.
Partitions may be created or dropped as needed, and applications may choose to use range partitioning to manage data at a fine level of detail; for example, an application may create a partition for an upcoming month so that new data is inserted into that new partition.
Source: openHPI: In-Memory_Data_Management_2017
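A minimal range partitioning sketch on a fiscal year column (table name and value ranges are assumptions):

```sql
-- One partition per fiscal year plus a catch-all for unexpected values
ALTER TABLE "SAPSCHEMA"."ZMYTABLE"
  PARTITION BY RANGE (GJAHR)
  (PARTITION '2022' <= VALUES < '2023',
   PARTITION '2023' <= VALUES < '2024',
   PARTITION OTHERS);

-- Later, create a dedicated partition for the upcoming year
ALTER TABLE "SAPSCHEMA"."ZMYTABLE" ADD PARTITION '2024' <= VALUES < '2025';
```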
In some scenarios it makes sense to use columns which are not part of the primary key. But with single-level partitioning you can only select primary key columns. Multi-level partitioning makes it possible to select a key column on the first level and any other column on the second level. It is also possible to mix different partitioning methods.
The following combinations are possible [firstlevel-secondlevel]:
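As an illustration of a hash-range combination (first level on key columns, second level on a non-key column; all names and ranges are placeholders):

```sql
-- First level: HASH on key columns, second level: RANGE on GJAHR
ALTER TABLE "SAPSCHEMA"."ZMYTABLE"
  PARTITION BY HASH (MANDT, BELNR) PARTITIONS 4,
               RANGE (GJAHR)
  (PARTITION '2022' <= VALUES < '2024',
   PARTITION OTHERS);
```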
4. Designing Partitions
For details please read the “Designing Partitions” section in the documentation.
- Use partitioning columns that are often used in WHERE clauses for partition pruning
- If you don’t know which partition scheme to use, start with hash partitioning
- Use as many columns in the hash partitioning as required for good load balancing, but try to use only those columns that are typically used in requests
- Queries do not necessarily become faster when smaller partitions are searched. Often queries make use of indexes and the table or partition size is not significant. If the search criterion is not selective though, partition size does matter.
- Using time-based partitioning often involves the use of hash-range partitioning with range on a date column
- If you split an index (SAP also refers to CS tables as indexes), always use a multiple of the source parts (for example 2 to 4 partitions). This way the split is executed in parallel mode and does not require parts to be moved to a single server first.
- Do not split/merge a table unless necessary.
- Ideally tables have a time criterion in the primary key. This can then be used for time-based partitioning.
- Single level partitioning limitation: the limitation of only being able to use key columns as partitioning columns
5. Example BSEG
We determined that BSEG has 4.5 billion rows. Now we need details on which column the table should be partitioned by.
SAP's recommendation in note 2044468 is to use HASH partitioning on BELNR.
Note 2289491 describes a range partitioning on BUKRS. As you can see, it always depends on how your system is being used. Ok, let's find out in a test environment.
Now we know:
- table has to be partitioned (4.5 billion rows)
- most scanned columns
- recommendation for partitioning by SAP
Questions to success:
To quote the HPI course on the partitioning details (In-Memory_Data_Management_2017):
“There are a number of different optimization goals to be considered while choosing a suitable partitioning strategy. For instance, when optimizing for performance, it makes sense to have tuples of different tables, that are likely to be joined for further processing, on one server. This way the join can be done much faster due to optimal data locality, because there is no delay for transferring the data across the network. In contrast, for statistical queries like counts, tuples from one table should be distributed across as many nodes as possible in order to benefit from parallel processing.”
To sum up, the best partitioning strategy depends very much on the specific use case.
The main challenge for hash-based partitioning is to choose a good hash function that implicitly achieves locality or access improvements.
Primary key BSEG:
MANDT BUKRS BELNR GJAHR BUZEI
The following columns have a particularly high amount of scanned records (SQL: “HANA_Tables_ColumnStore_Columns_Statistics”, MIN_SCANNED_RECORDS_PER_S = 5000000):
- To achieve this just fill in your table inside the modification section
There are recommendations for the most common SAP standard tables in note 2044468. If you are close to the standard and don't use a lot of custom code, try these partitioning recommendations first. If you have heavy load from your own statements which use a totally different execution plan, you may have to determine your own partitioning columns. Be aware of the partitioning limits!
If you want to check the optimal partitioning combinations use this statement:
select statement_hash, statement_string, execution_count, total_execution_time from m_sql_plan_cache where statement_string like '%BSEG%' order by total_execution_time desc;
From the result you have to analyze the “where” clause and find a common pattern.
Take as few fields as needed. Normally fields like MATNR, BELNR or DOCNR are selective fields and well suited for partitioning.
For a deep dive you can additionally run the DataCollector with the selected statement_hash (HANA_SQL_StatementHash_DataCollector) within the statement collection (1969700).
Additionally, you can use the ColumnStore statistics (HANA_Tables_ColumnStore_Columns_Statistics_2.00.030+ ) which is also part of the SQL collection to analyze how many scans are on which column. If you are using multiple clients, it may be wise to add the field MANDT as partitioning criteria.
SELECT TOP 100 'BSEG' TABLE_NAME, S.* FROM ( SELECT BUKRS, LPAD(COUNT(*), 11) OCCURRENCES, LPAD(TO_DECIMAL(COUNT(*) / SUM(COUNT(*)) OVER () * 100, 10, 2), 7) PERCENT FROM ( SELECT BUKRS FROM "SAPSCHEMA"."BSEG" ) GROUP BY BUKRS ORDER BY OCCURRENCES DESC ) S
Only 2 values for BUKRS, “0010” and “0020”. This means the cardinality is too low and partitioning won't work well on this attribute. So both extremes, too high and too low, are bad.
Most scanned columns are PCTR, HKONT and BELNR.
We have about 30 clients in this test system, which means it makes sense to use MANDT in the partitioning clause on the first level as an entry point.
- MANDT first level hash partitioning
- BELNR second level hash partitioning
With 4.5 billion rows we need about 12-15 partitions, which means roughly 300 million rows per partition (rule of thumb: between 100 and 500 million), if we used only single-level partitioning. With multi-level partitioning the fill degree of the partitions depends on the data cardinality of the column at each level.
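A possible DDL for this hash-hash combination could look as follows (3 x 4 = 12 partitions; the exact partition counts are an assumption and should be validated in a test system):

```sql
-- First level: HASH on MANDT, second level: HASH on BELNR
ALTER TABLE "SAPSCHEMA"."BSEG"
  PARTITION BY HASH (MANDT) PARTITIONS 3,
               HASH (BELNR) PARTITIONS 4;
```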
In the end this is just an indication for the analyzed timeframe and no guarantee for the future. This is a process you should repeat if the application and selections change, whether due to user behaviour, new SAP packages or custom code.
6. Example: BSIS
MANDT BUKRS HKONT AUGDT AUGBL ZUONR GJAHR BELNR BUZEI
often executed statements on BSIS
- 4,1 billion rows
- About 30 client values
- Only one value for BUKRS
- The top occurrences in BELNR don't even sum up to 0.001%
The recommendation of SAP is a RANGE partitioning on BUKRS. If RANGE on BUKRS or a similar range partitioning is not possible, use HASH partitioning on BELNR.
We can't use BUKRS as partitioning column due to its low cardinality. BELNR alone has a high cardinality and might not be efficient enough on its own.
There are 2 recommendations in this scenario:
- MANDT first level range partitioning
- HKONT second level range partitioning
- MANDT first level hash partitioning
- BELNR second level hash partitioning
With 4.1 billion rows we need about 12-14 partitions, which means roughly 300 million rows per partition (rule of thumb: between 100 and 500 million), if we used only single-level partitioning. With multi-level partitioning the fill degree of the partitions depends on the data cardinality of the column at each level.
7. Another example for Z-table
8. Parameter for splitting
As already mentioned in the first part of this series, we can optimize the splitting procedure with some parameters which have to be adjusted according to your hardware:
indexserver.ini -> [partitioning] -> split_threads (default: 16)
indexserver.ini -> [partitioning] -> bulk_load_threads (default: 4)
- On a system with 176 CPU cores, the repartitioning failed at first due to this error; with the settings below it completed in 9.18 h.
indexserver.ini -> [partitioning] -> split_threads = 50
indexserver.ini -> [partitioning] -> bulk_load_threads = 8
- With 120 CPU cores, it took 14.40 h without the error after applying the settings below.
indexserver.ini -> [partitioning] -> split_threads = 95
indexserver.ini -> [partitioning] -> bulk_load_threads = 10
HANA 2.0 SPS04 Rev. >= 43
indexserver.ini -> [mergedog] -> auto_merge_decision_func = '((DRC*TMD > 3600*(MRC+0.0001)) or ((DMS>PAL/2000 or DMS > 1000 or DCC>100) and (DRC > MRC/100 or INSTR(BASENAME, \'_SYS_OMR_\') > 0)) or (DMR>0.2*MRC and DMR > 0.001) or (DRC=0 and DMR=0 and MPU=1))'
indexserver.ini -> [table_replication] -> optimize_replica_for_online_ddl_in_log_replay = 'true'
indexserver.ini -> [table_replication] -> optimize_replica_for_online_ddl = 'true'
indexserver.ini -> [persistence] -> logreplay_savepoint_interval_s = '600'
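The current values can be checked in M_INIFILE_CONTENTS and changed with ALTER SYSTEM ALTER CONFIGURATION; the value below is only an example, not a recommendation:

```sql
-- Show the currently configured partitioning parameters
SELECT FILE_NAME, LAYER_NAME, SECTION, KEY, VALUE
  FROM M_INIFILE_CONTENTS
 WHERE SECTION = 'partitioning';

-- Example: set split_threads system-wide
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('partitioning', 'split_threads') = '50' WITH RECONFIGURE;
```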
Duration of a partitioning
Export the table from production in binary format and import it into a separate system/schema to validate the duration of the partitioning process.
Tune the runtime of the partitioning process
If a partitioning run takes a long time, > 5 h (best case 100 GB/h, worst case 20 GB/h), you can do a thread analysis and determine the column / constraint with the highest runtime. This can be an index or a concat attribute. It is possible that dropping and recreating this object before you start the partitioning process is faster than processing it during the partitioning. It is also possible that the PK is the root cause, but if you drop it, you may get duplicate entries. Please do this only if the application is stopped!
drop index <Schema>.<index_name>;
alter table <Schema>.<table_name> drop constraint <constraint_name>;
alter table <Schema>.<table_name> with parameters ('DELETE_CONCAT_ATTRIBUTE' = '<constraint_name>');
alter table <Schema>.<table_name> drop primary key;
create index <index_name> on <Schema>.<table_name> (col1 asc, col2 asc);
alter table <Schema>.<table_name> add constraint <constraint_name> foreign key (col1,col2,col3) <options>;
alter table <Schema>.<table_name> with parameters ('CONCAT_ATTRIBUTE' = ('<constraint_name>', 'col1', 'col2', 'col3'));
alter table <Schema>.<table_name> add constraint <constraint_name> primary key (col1,col2,col3);
Check the most used SQL statements (hashes) for their current runtime and execution plan. Check the performance again after the partitioning in a separate system to verify that partition pruning takes place. This is important, because otherwise you can run into trouble with a bad execution plan.
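To compare runtimes before and after the partitioning, the plan cache can be queried for the relevant statement hashes (the LIKE filter is just an example):

```sql
-- Average and total runtime of cached statements touching BSEG
SELECT STATEMENT_HASH, EXECUTION_COUNT,
       AVG_EXECUTION_TIME, TOTAL_EXECUTION_TIME
  FROM M_SQL_PLAN_CACHE
 WHERE STATEMENT_STRING LIKE '%BSEG%'
 ORDER BY AVG_EXECUTION_TIME DESC;
```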
Bad execution plan
If a bad execution plan is used, recompile or remove the plan ID (view M_SQL_PLAN_CACHE) from the SQL plan cache. If you don't have the possibility to verify each statement selectively, clear the complete SQL plan cache after the partitioning process has finished.
-- When invalidated, the entry is recompiled during the next execution. Use this command when significant changes have occurred to the data cardinalities acted upon by the query plan.
alter system recompile sql plan cache entry <plan_id>;
-- Remove the plan id, which leads to a new plan id
alter system remove sql plan cache entry <plan_id>;
-- Clear the whole cache
ALTER SYSTEM CLEAR SQL PLAN CACHE;
There is no silver bullet for every customer. It always depends on the usage and the change of values in a table, and additionally on changes to the applications and the different selections. Try different partitioning columns in your test system and check the performance. You should also optimize the compression of those big tables from time to time. The compression optimization takes place automatically as part of a delta merge process, but the system itself decides when. On huge tables this can happen pretty late, which results in bad performance. So stay tuned and give your system regular health checks. SAP provides such services called TPO, but some well-known HANA experts provide such services as well 😉
V1.1 Multi-Level partitioning details, added 2874176
V1.2 dyn. range partitioning, 9. Tips