Partitioning Data Volumes for HANA DB performance improvement
Partitioning Data Volumes
Below is a simple question-and-answer guide to data volume partitioning: how it is used, how it improves overall HANA DB read and write performance, and how it differs from data volume striping.
1. What is data volume partitioning? What performance advantage does it offer over the default setup? Since when is it available?
Data volumes on the indexserver can be partitioned so that read and write operations run in parallel, increasing data throughput. HANA startup time also benefits, because data throughput can increase substantially, depending on the number and performance of the additional disks or volumes.
The data files can be segmented and stored in different locations, and then accessed by parallel threads.
This feature has been available since SAP HANA 2.0 SPS 03.
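To make the idea of parallel access concrete, here is a small Python sketch (not HANA code) that reads several segment files concurrently, one thread per file. The helper names, the file name datavolume.dat and the use of temporary directories in place of separate disks are illustrative assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List


def read_partition(path: str) -> bytes:
    """Read one segment file completely (runs in its own worker thread)."""
    with open(path, "rb") as f:
        return f.read()


def read_all_partitions(paths: List[str]) -> int:
    """Read all segment files in parallel and return the total number of bytes read.

    Hypothetical helper: each path stands in for a data volume partition file
    on its own disk or mount, so the reads do not queue behind each other.
    """
    if not paths:
        return 0
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return sum(len(chunk) for chunk in pool.map(read_partition, paths))


def make_demo_files(sizes: List[int]) -> List[str]:
    """Create one temp directory per 'partition' holding a file of the given size."""
    paths = []
    for i, size in enumerate(sizes):
        directory = tempfile.mkdtemp(prefix=f"partition{i}_")
        path = os.path.join(directory, "datavolume.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


# Usage: three partitions of 1, 2 and 3 KiB.
print(read_all_partitions(make_demo_files([1024, 2048, 3072])))  # 6144
```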
2. How does SAP HANA data volume partitioning take advantage of NFS file systems?
With Network File Systems (NFS), data can be written, as well as read, in parallel across multiple connections. Partitioning data volumes in this way therefore speeds up all read/write operations on the volume, including savepoints, merges, restarts, table loading operations, and backups.
This feature gives the file system more parallel channels for processing I/O. To truly benefit from it on NFS, additional mount points must be configured for the additional data volume partition locations so that additional TCP connections are used. No further network configuration is required as long as the network infrastructure can sustain the additional workload.
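Since the benefit comes from each partition location having its own mount point, a quick sanity check is to map each partition path to the mount that serves it and flag mounts shared by several partitions. This Python sketch uses a plain longest-prefix match on path strings; the mount points /hana/data and /hana/data2 and the partition paths are hypothetical examples.

```python
from collections import defaultdict
from typing import Dict, List


def mount_for(path: str, mountpoints: List[str]) -> str:
    """Return the longest mount point that contains `path` (plain prefix match)."""
    best = "/"
    for mount in mountpoints:
        mount = mount.rstrip("/") or "/"
        inside = path == mount or path.startswith(mount.rstrip("/") + "/")
        if inside and len(mount) > len(best):
            best = mount
    return best


def shared_mounts(partition_paths: Dict[int, str], mountpoints: List[str]) -> Dict[str, List[int]]:
    """Return {mount: [partition IDs]} for mounts that serve more than one partition."""
    by_mount: Dict[str, List[int]] = defaultdict(list)
    for partition_id, path in sorted(partition_paths.items()):
        by_mount[mount_for(path, mountpoints)].append(partition_id)
    return {mount: ids for mount, ids in by_mount.items() if len(ids) > 1}


# Hypothetical layout: partitions 1 and 2 were both placed on /hana/data2.
mounts = ["/hana/data", "/hana/data2"]
layout = {0: "/hana/data/HDB", 1: "/hana/data2/HDB", 2: "/hana/data2/HDB_p2"}
print(shared_mounts(layout, mounts))  # {'/hana/data2': [1, 2]}
```

An empty result means every partition sits on its own mount point.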
3. Can non-NFS file systems used with HANA benefit from data volume partitioning?
For non-NFS file systems, the benefit of adopting this feature depends on the setup provided by the hardware vendor / TDI design. Feasibility therefore has to be discussed with the vendor, as they are responsible for designing the storage layout.
4. Can we use partitioning for log volumes in HANA?
No. HANA volume partitioning is available only for data volumes, not for log volumes.
5. When does data get written to a newly added data volume partition in HANA?
When a data volume partition is added to an existing system, the existing data is not redistributed immediately. Fresh I/O writes are distributed to the new data volume partition, and eventually the database reaches an even distribution in terms of size.
However, if an immediate even distribution of data is required, consider using SAP HANA backup and recovery (file-based and Backint-based backup and recovery only).
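The following Python sketch is a toy model, not HANA's actual placement algorithm (which this article does not describe). It assumes each new page goes to the partition with the fewest used pages, to show how steering fresh writes toward a new, empty partition evens out sizes over time without moving any existing data.

```python
from typing import List


def place_new_pages(used_pages: List[int], new_pages: int) -> List[int]:
    """Toy model: assign each new page to the least-filled partition.

    Ties go to the lowest partition index. Existing pages never move, which
    mirrors the point that adding a partition does not redistribute old data.
    """
    used = list(used_pages)  # copy so the caller's list is not modified
    for _ in range(new_pages):
        target = min(range(len(used)), key=lambda i: used[i])
        used[target] += 1
    return used


# Partition 0 already holds 1000 pages; partition 1 was just added.
print(place_new_pages([1000, 0], 400))   # [1000, 400]
print(place_new_pages([1000, 0], 1500))  # [1250, 1250]
```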
6. How do we perform data volume partitioning?
Starting with SAP HANA 2.0 SPS 03, explicit commands exist to adjust the number of volume partitions (that is, data files):
ALTER SYSTEM ALTER DATAVOLUME ADD PARTITION
ALTER SYSTEM ALTER DATAVOLUME DROP PARTITION <id>
Starting with SAP HANA 2.0 SPS 04, you can optionally specify a system replication site ID:
ALTER SYSTEM ALTER DATAVOLUME ADD PARTITION SYSTEM REPLICATION SITE <site_id>
ALTER SYSTEM ALTER DATAVOLUME DROP PARTITION <id> SYSTEM REPLICATION SITE <site_id>
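If these statements are issued from a script, a small builder keeps the optional site clause consistent. This Python sketch only assembles the statement text shown above; the function names are hypothetical, and the guard against dropping partition 0 reflects the rule described in Question 10. Executing the statements (for example through a DB-API cursor) is left to the caller.

```python
from typing import Optional


def add_partition_sql(site_id: Optional[int] = None) -> str:
    """Build ALTER SYSTEM ALTER DATAVOLUME ADD PARTITION [SYSTEM REPLICATION SITE <id>]."""
    sql = "ALTER SYSTEM ALTER DATAVOLUME ADD PARTITION"
    if site_id is not None:  # site clause: SAP HANA 2.0 SPS 04 and later
        sql += f" SYSTEM REPLICATION SITE {int(site_id)}"
    return sql


def drop_partition_sql(partition_id: int, site_id: Optional[int] = None) -> str:
    """Build ALTER SYSTEM ALTER DATAVOLUME DROP PARTITION <id> [SYSTEM REPLICATION SITE <id>]."""
    if int(partition_id) == 0:
        raise ValueError("partition 0 is the default partition and cannot be dropped")
    sql = f"ALTER SYSTEM ALTER DATAVOLUME DROP PARTITION {int(partition_id)}"
    if site_id is not None:
        sql += f" SYSTEM REPLICATION SITE {int(site_id)}"
    return sql


print(add_partition_sql())               # ALTER SYSTEM ALTER DATAVOLUME ADD PARTITION
print(drop_partition_sql(2, site_id=1))  # ALTER SYSTEM ALTER DATAVOLUME DROP PARTITION 2 SYSTEM REPLICATION SITE 1
```

Casting IDs with int() keeps arbitrary text out of the generated SQL.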
Refer to the link below for the end-to-end steps, from the OS-level steps to the HANA DB steps.
7. What happens when we create a new partition on the indexserver?
When we create a new partition, it is added simultaneously to all indexservers in the topology. New partitions become active after the next savepoint on each indexserver; this is shown by the partition STATE value changing from ACTIVATING to ACTIVE. By default, every data volume has a single partition with the numeric ID zero. The HANA persistency layer automatically assigns a numeric partition ID to each new partition. If, for any reason, the partition numbering is inconsistent across the indexservers, any attempt to add new partitions fails.
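Because adding a partition fails when the numbering differs between indexservers, it can help to compare the partition IDs reported per indexserver before running ADD PARTITION. This Python sketch assumes the IDs have already been collected (for example from M_DATA_VOLUME_PARTITION_STATISTICS); the server labels and the function name are illustrative.

```python
from typing import Dict, Iterable, List


def common_partition_ids(ids_by_indexserver: Dict[str, Iterable[int]]) -> List[int]:
    """Return the partition IDs shared by all indexservers, or raise if they differ."""
    distinct = {frozenset(ids) for ids in ids_by_indexserver.values()}
    if len(distinct) != 1:
        raise RuntimeError(f"inconsistent partition numbering: {ids_by_indexserver}")
    ids = sorted(next(iter(distinct)))
    if 0 not in ids:
        raise RuntimeError("default partition 0 is missing")
    return ids


print(common_partition_ids({"host1:30003": [0, 1], "host2:30003": [1, 0]}))  # [0, 1]
```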
8. What are the ways to monitor HANA data volume partitions?
Way 1: Starting with SAP HANA 2.0 SPS 04, you can use the SQL statement “HANA_Disks_Data_Partitions” (SAP Note 1969700) to display an overview of existing data volume partitions.
Way 2: Using monitoring views. We can see the current data volume configuration in the following two views:
- M_DATA_VOLUME_STATISTICS: This provides statistics for the data volume partitions on the indexserver including the number of partitions and size.
- M_DATA_VOLUME_PARTITION_STATISTICS: This view provides statistics for the individual partitions, identified by PARTITION_ID, and includes the partition STATE value.
In a replication scenario we can monitor the M_DATA_VOLUME_PARTITION_STATISTICS view on the secondary via proxy schema SYS_SR_SITE<siteName> (where <siteName> is the name of the secondary site).
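The monitoring view can also be polled from a script. This sketch uses only the standard Python DB-API calls execute and fetchall, so any compliant cursor works (with SAP's hdbcli driver, for instance, a cursor obtained via dbapi.connect(...).cursor()). It selects only the PARTITION_ID and STATE columns named above and reports partitions that are not yet ACTIVE. The optional schema argument is where the secondary site's proxy schema would go; the function names and the stub cursor are hypothetical.

```python
from typing import List, Optional


def partition_state_query(schema: Optional[str] = None) -> str:
    """SELECT against M_DATA_VOLUME_PARTITION_STATISTICS, optionally schema-qualified."""
    prefix = f'"{schema}".' if schema else ""
    return f"SELECT PARTITION_ID, STATE FROM {prefix}M_DATA_VOLUME_PARTITION_STATISTICS"


def partitions_not_active(cursor, schema: Optional[str] = None) -> List[int]:
    """Return the sorted IDs of partitions whose STATE is not ACTIVE (e.g. ACTIVATING)."""
    cursor.execute(partition_state_query(schema))
    return sorted({pid for pid, state in cursor.fetchall() if state != "ACTIVE"})


class StubCursor:
    """Stand-in for a real DB-API cursor, returning fixed rows for the demo."""

    def __init__(self, rows):
        self.rows, self.last_sql = rows, None

    def execute(self, sql):
        self.last_sql = sql

    def fetchall(self):
        return self.rows


# Two indexservers; partition 2 is still ACTIVATING on one of them.
stub = StubCursor([(0, "ACTIVE"), (1, "ACTIVE"), (2, "ACTIVATING"), (0, "ACTIVE"), (2, "ACTIVE")])
print(partitions_not_active(stub))  # [2]
```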
9. What is the performance impact of dropping a data volume partition from HANA DB?
Dropping a SAP HANA data volume partition involves reading the data from the dropped partition and writing it into the remaining partitions. Because this can cause significant I/O activity, depending on the amount of data to be read and written, it should be performed during periods of low business workload.
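For planning the maintenance window, a back-of-the-envelope estimate can help. This sketch assumes every page of the dropped partition is read once and written once at sustained throughput figures you supply, with reads and writes not overlapping and no other workload; real durations depend on the storage and concurrent activity. The throughput numbers in the usage line are made up.

```python
def estimate_drop_hours(partition_gb: float, read_mb_per_s: float, write_mb_per_s: float) -> float:
    """Serial estimate: read the whole partition, then write it into the remaining ones."""
    size_mb = partition_gb * 1024
    seconds = size_mb / read_mb_per_s + size_mb / write_mb_per_s
    return seconds / 3600


# Hypothetical figures: 500 GB partition, 400 MB/s read, 200 MB/s write.
print(round(estimate_drop_hours(500, 400, 200), 2))  # 1.07
```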
10. What happens when we drop a partition using SQL DROP PARTITION?
This command drops the identified partition from all indexservers in the topology. The default partition with ID zero cannot be dropped. If we drop a partition, then all data stored in the partition is automatically moved to the remaining partitions and for this reason dropping a partition may take time. This operation also removes the partition entry from the configuration file.
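The drop behavior can be pictured with a toy Python model: the dropped partition's pages are moved to the remaining partitions, nothing is lost, and partition 0 is refused. Spreading the moved pages evenly is an assumption made for illustration; the article does not say how HANA chooses the target partitions.

```python
from typing import Dict


def drop_partition(pages: Dict[int, int], drop_id: int) -> Dict[int, int]:
    """Toy model: move the dropped partition's pages to the remaining partitions.

    Pages are spread evenly; any remainder goes to the lowest partition IDs.
    """
    if drop_id == 0:
        raise ValueError("the default partition 0 cannot be dropped")
    if drop_id not in pages:
        raise KeyError(drop_id)
    remaining = {pid: n for pid, n in pages.items() if pid != drop_id}
    share, extra = divmod(pages[drop_id], len(remaining))  # partition 0 always remains
    for i, pid in enumerate(sorted(remaining)):
        remaining[pid] += share + (1 if i < extra else 0)
    return remaining


print(drop_partition({0: 100, 1: 100, 2: 51}, 2))  # {0: 126, 1: 125}
```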
11. Can we drop an active data volume partition on a HANA DB with HSR enabled?
Not reliably. In a running system replication setup, we may not be able to drop an active data volume partition, because system replication uses data volume snapshot technology; the attempt fails with the error “Cannot move page inside/out of DataVolume”. In this case, it may be necessary to disable system replication, drop the partition, and then set up system replication again.
12. Can we add HANA data volume partitions to a path of our own choice? When do we use a user-defined path? How do we enable it?
In addition to the default data volume base path /usr/sap/<SID>/SYS/global/hdb/data, we can add partitions in a path of our own choice using the SQL syntax ADD PARTITION PATH (an extension of the ADD PARTITION command in Question 6). The path must be reachable by all nodes or services in the topology. Beneath the specified path, the standard folder structure is created automatically, with a numbered folder for each host. A user-defined path is required, for example, when we use multiple NFS connections so that data can be written in parallel to different paths. This option must be explicitly enabled by setting the PERSISTENCE_DATAVOLUME_PARTITION_MULTIPATH parameter in the customizable_functionalities section of global.ini to TRUE. The partition base path is saved in the indexserver.ini configuration file, in the basepath_datavolume key of the partition ID section.
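Putting the two configuration steps together, a script could emit the statements below. ALTER SYSTEM ALTER CONFIGURATION is the standard HANA SQL form for changing an ini parameter; applying it at the 'SYSTEM' layer, the helper name and the example path /hana/data2 are assumptions for illustration, and the path still has to be reachable by every host.

```python
from typing import List


def multipath_partition_sql(path: str) -> List[str]:
    """Statements to enable user-defined partition paths and add a partition at `path`."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path that every host can reach")
    quoted = path.replace("'", "''")  # escape single quotes for an SQL string literal
    return [
        "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') "
        "SET ('customizable_functionalities', 'PERSISTENCE_DATAVOLUME_PARTITION_MULTIPATH') = 'TRUE' "
        "WITH RECONFIGURE",
        f"ALTER SYSTEM ALTER DATAVOLUME ADD PARTITION PATH '{quoted}'",
    ]


for statement in multipath_partition_sql("/hana/data2"):
    print(statement)
```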
13. How do data volume partitioning and data volume striping differ?
- SAP HANA data volume partitioning distributes fresh incoming pages across data volume partitions to achieve parallelization of I/O operations wherever possible.
- SAP HANA data volume striping provides the possibility to limit the size of the existing data volume files and create a new data volume file and redirect incoming pages to the new file if no space exists in the older file. There is no even distribution of I/O writes as achieved with data volume partitioning.
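The contrast can be illustrated with two toy Python functions: one deals new pages across partitions in turn, the other fills a file up to a size limit before opening the next one. Round-robin is a simplifying assumption standing in for "even distribution"; neither function reproduces HANA internals.

```python
from typing import List


def partitioned(pages: int, n_partitions: int) -> List[int]:
    """Partitioning model: new pages are dealt round-robin across all partitions."""
    counts = [0] * n_partitions
    for page in range(pages):
        counts[page % n_partitions] += 1
    return counts


def striped(pages: int, stripe_size: int) -> List[int]:
    """Striping model: only the newest file receives pages until it reaches stripe_size."""
    files: List[int] = []
    for _ in range(pages):
        if not files or files[-1] == stripe_size:
            files.append(0)  # current file is full: open a new one
        files[-1] += 1
    return files


print(partitioned(10, 3))  # [4, 3, 3]
print(striped(10, 4))      # [4, 4, 2]
```

With striping, writes always land in a single file at a time, so there is no I/O parallelism across files.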
14. What is data volume striping, and how do we configure it?
To prevent HANA from growing data files beyond a certain file size limit, set the following parameters in the global.ini configuration file of HANA. HANA then creates a new data volume file and redirects incoming pages to it when the threshold is reached or no space is left in the older file:
global.ini -> [persistence] -> datavolume_striping = true
global.ini -> [persistence] -> datavolume_striping_size_gb = <max_file_size_gb>
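To check what a given global.ini specifies, Python's standard configparser can read the [persistence] section. This is a sketch: it parses ini text you pass in (for example a copy of the file), treats a missing datavolume_striping entry as false (an assumption of this sketch), and the function name and the sample size of 1000 GB are hypothetical.

```python
import configparser
from typing import Optional, Tuple


def striping_settings(ini_text: str) -> Tuple[bool, Optional[int]]:
    """Return (datavolume_striping, datavolume_striping_size_gb) from global.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    enabled = parser.getboolean("persistence", "datavolume_striping", fallback=False)
    size_gb = parser.getint("persistence", "datavolume_striping_size_gb", fallback=None)
    return enabled, size_gb


sample = """
[persistence]
datavolume_striping = true
datavolume_striping_size_gb = 1000
"""
print(striping_settings(sample))  # (True, 1000)
```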
15. How do Azure HANA Large Instances (HLI) use the data volume striping feature? Why is data volume striping mandatory on Azure HLI?
The storage used in HANA Large Instances has a file size limit of 16 TB per file. Unlike the file size limits of EXT3 file systems, HANA is not implicitly aware of the limit enforced by the HANA Large Instances storage. As a result, HANA does not automatically create a new data file when the 16 TB limit is reached. When HANA attempts to grow the file beyond 16 TB, it reports errors and the index server eventually crashes.
To prevent HANA from trying to grow data files beyond the 16 TB file size limit, set the following parameters in the [persistence] section of the global.ini configuration file of HANA:
- datavolume_striping = true
- datavolume_striping_size_gb = 15000
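The arithmetic behind the 15000 GB value is simple: it stays below 16 TB whether a terabyte is counted as 1000 or 1024 GB, so HANA switches to a new file before the storage limit is hit. This sketch also counts how many data files a volume of a given size would need at that stripe size; the 40 TB volume is a made-up example.

```python
import math

HLI_FILE_LIMIT_GB = 16 * 1000  # 16 TB, using the smaller decimal interpretation to stay conservative


def stripe_files_needed(data_volume_gb: float, stripe_gb: int = 15000) -> int:
    """Number of data files needed if each file is capped at stripe_gb."""
    if stripe_gb >= HLI_FILE_LIMIT_GB:
        raise ValueError("stripe size must stay below the 16 TB per-file limit")
    return math.ceil(data_volume_gb / stripe_gb)


print(stripe_files_needed(40 * 1024))  # 3 files for a 40 TB (40960 GB) data volume
```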
16. How do EXT* and XFS file systems overcome the problem of excessively large files in relation to data volumes?
For EXT* and XFS file systems, the problem of excessively large file sizes for large data volumes is overcome by striping. These file systems have a maximum size of 2 TB for each physical file, and the persistency layer automatically chooses appropriate striping sizes for them. We can define a maximum file size for stripes using the striping configuration parameters in the persistence section of the indexserver.ini file: datavolume_striping (TRUE / FALSE) and datavolume_striping_size_gb (a value between 100 and 2000 GB).
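A configuration script could reject out-of-range values before writing them. This sketch encodes the 100 to 2000 GB range quoted above for EXT* and XFS; it does not apply to the Azure HLI case in Question 15, whose recommended 15000 GB lies outside this range. The function name is hypothetical.

```python
def validate_ext_xfs_stripe_size_gb(size_gb: int) -> int:
    """Accept datavolume_striping_size_gb only within the 100-2000 GB range for EXT*/XFS."""
    if not 100 <= size_gb <= 2000:
        raise ValueError(f"{size_gb} GB is outside the 100-2000 GB range for EXT*/XFS")
    return size_gb


print(validate_ext_xfs_stripe_size_gb(1500))  # 1500
```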
Hope this was helpful!
Rajarajeswari Kaliyaperuumal
Please leave a comment or suggestion!