Recently I ran a scenario on the SAP HANA database for which I needed a large initial data set for a specific table, and another large delta data set consisting of deletions, updates and inserts on the initial data set. I googled something like “data generator for HANA” but didn’t find what I was looking for. Therefore, I wrote a database procedure to fill my table with the data that I needed. Because others might find the script useful too, I extended the script for easy re-use. The result you can find in this blog.
The procedure inserts, updates, or deletes data of a single table, using mostly set-based operations for high-performance. Inserts and updates are based on the field definition of the provided table and contain randomized data. For example, for an integer field, a random integer value is inserted, and for a character-based field, a random string of characters is inserted. The procedure uses the HANA metadata tables to retrieve the table definition. Record updates or deletions are randomized based on the first column of the primary key.
On my little test system, I achieved an insert rate of 9 million records per minute for a table with just a few fields. That was a lot faster than the first version of the script, which was not set-based.
How to run the data generator procedure
Follow below steps to generate data for your table
- Dowload the code in this Git repository and copy it into the SQL console
- Create a sequence with following statement: “CREATE SEQUENCE GENERATEDATASEQ1”. A sequence with that name is used in the procedure.
- Call the procedure using the following statement:
CALL "GenerateData" ( IP_SCHEMA_NAME => '<SCHEMA_NAME>', IP_TABLE_NAME => '<TABLE_NAME>', IP_INSERT_NUM_RECORDS => <Number of records to be inserted>, IP_MAX_STRING_SIZE => <Maximum inserted string size. Strings are automatically limited on field length>, IP_DELETE_NUM_RECORDS => <Number of records to delete>, IP_UPDATE_NUM_RECORDS => <Number of records to update> );
Keep in mind the following limitations, or adjust the code to work around them:
- Only supports tables with a primary key. The primary key should contain at least one column that can hold integer values. Binary data types are not supported at all.
- Created with HANA 2, SPS04, not guaranteed to work with older service packs.
- The random generated values make compression difficult. Therefore, you will likely see minimal compression that is not to be compared with real-life use cases.
As an example, a table is created with the following statement:
CREATE SCHEMA GENERATEDATA; SET SCHEMA GENERATEDATA; CREATE TABLE T1S ( A INTEGER, B NVARCHAR(5), C DATE, D TIMESTAMP, PRIMARY KEY (A) );
To insert two records into that table, the procedure is called as follows:
CALL "GenerateData"( IP_SCHEMA_NAME => 'GENERATEDATA', IP_TABLE_NAME => 'T1S', IP_INSERT_NUM_RECORDS => 2, IP_MAX_STRING_SIZE => 3, IP_DELETE_NUM_RECORDS => 0, IP_UPDATE_NUM_RECORDS => 0 );
A select * on that table yields the following result:
Explanation of the code
In case you want to adjust the code or just understand what happens, an example table definition and procedure call is provided in the header section of the procedure. Throughout the code, output of (parts of) the dynamic SQL statements are shown, so it is easier to understand the, sometimes, long statements.
This blog provided you with a procedure to generate initial and delta data sets. Since the inserts are set-based, it should insert, update and delete at reasonable speeds. There are some limitations, and the code is not perfect, so if you got any improvements, feel free to share them in the comments.