Skip to Content
Technical Articles

Test data generator for SAP HANA for large initial and delta data sets

Introduction

Recently I ran a scenario on the SAP HANA database for which I needed a large initial data set for a specific table, and another large delta data set consisting of deletions, updates and inserts on the initial data set. I googled something like “data generator for HANA” but didn’t find what I was looking for. Therefore, I wrote a database procedure to fill my table with the data that I needed. Because others might find the script useful too, I extended the script for easy re-use. The result you can find in this blog.

The procedure inserts, updates, or deletes data of a single table, using mostly set-based operations for high-performance. Inserts and updates are based on the field definition of the provided table and contain randomized data. For example, for an integer field, a random integer value is inserted, and for a character-based field, a random string of characters is inserted. The procedure uses the HANA metadata tables to retrieve the table definition. Record updates or deletions are randomized based on the first column of the primary key.

On my little test system, I achieved an insert rate of 9 million records per minute for a table with just a few fields. That was a lot faster than the first version of the script, which was not set-based.

How to run the data generator procedure

Follow below steps to generate data for your table

  1. Dowload the code in this Git repository and copy it into the SQL console
  2. Create a sequence with following statement: “CREATE SEQUENCE GENERATEDATASEQ1”. A sequence with that name is used in the procedure.
  3. Call the procedure using the following statement:
CALL "GenerateData" (
  IP_SCHEMA_NAME => '<SCHEMA_NAME>',
  IP_TABLE_NAME => '<TABLE_NAME>',
  IP_INSERT_NUM_RECORDS => <Number of records to be inserted>,
  IP_MAX_STRING_SIZE => <Maximum inserted string size. Strings are automatically limited on field length>,                
  IP_DELETE_NUM_RECORDS => <Number of records to delete>,
  IP_UPDATE_NUM_RECORDS => <Number of records to update>
);

Limitations

Keep in mind the following limitations, or adjust the code to work around them:

  • Only supports tables with a primary key. The primary key should contain at least one column that can hold integer values. Binary data types are not supported at all.
  • Created with HANA 2, SPS04, not guaranteed to work with older service packs.
  • The random generated values make compression difficult. Therefore, you will likely see minimal compression that is not to be compared with real-life use cases.

Example

As an example, a table is created with the following statement:

CREATE SCHEMA GENERATEDATA;
SET SCHEMA GENERATEDATA;
CREATE TABLE T1S (
  A INTEGER, 
  B NVARCHAR(5), 
  C DATE, 
  D TIMESTAMP, 
  PRIMARY KEY (A)
);

To insert two records into that table, the procedure is called as follows:

CALL "GenerateData"( 
  IP_SCHEMA_NAME => 'GENERATEDATA',                                                                
  IP_TABLE_NAME => 'T1S',                                                                
  IP_INSERT_NUM_RECORDS => 2,                                                                
  IP_MAX_STRING_SIZE => 3,                                                                
  IP_DELETE_NUM_RECORDS => 0,                                                                
  IP_UPDATE_NUM_RECORDS => 0
);

A select * on that table yields the following result:

Explanation of the code

In case you want to adjust the code or just understand what happens, an example table definition and procedure call is provided in the header section of the procedure. Throughout the code, output of (parts of) the dynamic SQL statements are shown, so it is easier to understand the, sometimes, long statements.

Conclusion

This blog provided you with a procedure to generate initial and delta data sets. Since the inserts are set-based, it should insert, update and delete at reasonable speeds. There are some limitations, and the code is not perfect, so if you got any improvements, feel free to share them in the comments.

/
5 Comments
You must be Logged on to comment or reply to a post.
  • Hi Stefan,

    Thanks for this really helpful post!

    I built a simple shell script which executes your procedure 20 times to create 20 tables with 50 million records each with a user I called GENDATA. With the below configuration I got 4.7GB per table in memory and on disk and each run took about 15 minutes (so it took roughly 3 minutes to generate 1GB).

    Maybe this is useful for someone else, so I leave it here:

    wrapper.sh

    n=1
    while [ $n -le 20 ]
    do
    echo “=====Run: $n=====”
    hdbsql -U GENDATA “DROP SEQUENCE GenerateDataSeq1;”
    hdbsql -U GENDATA “CREATE SEQUENCE GenerateDataSeq1;”
    hdbsql -U GENDATA “CREATE TABLE TABLE$n (A INTEGER, B NVARCHAR(20), C DATE, D TIMESTAMP, E NVARCHAR(20), F NVARCHAR(20), PRIMARY KEY (A) );”
    hdbsql -U GENDATA “CALL \”GenerateData\” (IP_SCHEMA_NAME => ‘GENDATA’, IP_TABLE_NAME => ‘TABLE$n’, IP_INSERT_NUM_RECORDS => 50000000, IP_MAX_STRING_SIZE => 20, IP_DELETE_NUM_RECORDS => 0, IP_UPDATE_NUM_RECORDS => 0);”
    n=$(( n+1 ))
    done

     

     

    • works, thanks a ton! To create a user GENDATA you must

      create user GENDATA password passW0rd

      Then hdbsql -U GENDATA will not allow for queries, but you do not need to use GENDATA to call the script, just its schema must exist. So after creating the user GENDATA, you can use hdbsql -u SYSTEM -p mypassword instead of hdbsql -U GENDATA.

       

  • Hi,

    nice work, I can see that this all will work, but the first sentence struck me: Paste it into the SQL Console. I do not have HANA Studio nor HANA Cockpit, where is the SQL Console?

    thanks

    Thorsten

    • found the answer, if you do not have an SQL Console, you can download the code from github as text file myfile.txt and run it in a sidadm shell like this:

      hdbsql -u SYSTEM -p mypassword -I myfile.txt

       

      Great, I created my first stored procedure 🙂

  • Works for me, thanks a lot!

    Users who do not have graphical tools can use hdbsql. To query the table, use

    hdbsql -u SYSTEM -p mypassword “select * from GENERATEDATA.T1S”|cat