Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Alex_Hauck
Advisor
Advisor

Today I want to share various variants of generating mock-data on SAP HANA with you. For developing, testing and demonstrating an application it is necessary to fill the connected backend with content. Generating a big amount of data manually can be very time intensive - for saving your time (and nerves 😉 ), this blog post deals with automated data-generation in the SAP HANA environment.

Introduction

Before explaining the first procedure to generate data, this section is supposed to give you an overview of how this blog is structured. First of all you have to decide what kind of application you are going to develop. Is it an application that presents data in a static way, that means data isn´t changing often? Or are you going to develop an application in which you realize a lot of changes concerning your stored information?

The following chapter presents various approaches and classifies those applications depending on their "data-changing-frequency".

Following this, the different methods will be explained. First I am going to show you how to create mass data for a HANA database with an external tool (Java application). After that I will go on with two other variants of creating data in a HANA XS environment (for native application development).  A final chapter presents my own opinion and experiences with those differents ways of generating data.

Static vs. Dynamic Apps

Now, let´s start classifying your application. The static application deals with data that is not changed that often (for instance, archives, monthly generated data e.g. quarterly results etc.). A dynamic one can be characterized as an application that handles information which are changed often. On the one hand the user wants to see the most up-to-date data and on the other hand it could be changed several times by other (external) events the user cannot influence. As an example for such an dynamic, fast moving application I´d like to mention an Internet-of-Things application. Interacting with "things" or sensors and deriving conclusions out of this delivered data can be very valuable. Especially a demonstration of such an application without having real, phyiscal sensors / "things" is not easy.

Under circumstances, sensors could transmit a lot of different values in a very short time period. To simulate this with mocked-data you have to consider some facts that differ from mocking data for a static application. I am going to come back to this later in the corresponding section.

Static: Generating mass data with the Java assistant

Requirements:Java required (JDBC connection), DB User with writing rights
Benefits:Mass data generarion, test of HANA processing performance, several generation modes
Downsides:Not automated, not integrated into SAP XS environment

Realizing applications on SAP HANA have the advantage of processing mass data in a very efficient way. Handling big amount of data gets done with implementing business logic directly into the database stack. To verify that processing mass data gets done in an appropriate manner you need mocked data for simulation purposes. Generating every single data record in manually way won´t be very efficient. Therefore I´d like to present you a dedicated Java application that is going to help you getting this job done nearly automatically.

The following guide leads you through the whole process. The pictures and also some terms presented below are taken from a document created by Erik Lemen. For comparison reasons with other approchaes of data generation I´d like to summarize parts of this guide. More detailed information about this process can be found in Erik´s guide.

1. After downloading the Java application via this link, a guided procedure leads you through the whole process.

2. Connect the Java application to the host your HANA instance is running on.

Remark: Please consider that your user you are using for setting up the connection to a HANA instance has sufficient rights for inserting data into the database.

3. To find the corresponding database table for inserting randomized mock data, the guide offers you a overview of the whole catalog. This gives you the ability to drill trough the various schemas to the table you are lokking for.

4. Selecting the table you are willing to insert data, is the next step you have to proceed. Based on the table you have selected the java application prompts you to select the columns in which data has to be written down. Of course all key columns and moreover additional columns which are not allowed beeing "nulled" are selected in advance. The configuration of the mode that is used for creating randomized data completes the setup.

The figure above illustrates the different data generation modes (1 to 5). For columns with the data type CHARACTER, two different modes are available. You can choose between the Auto-Generated mode #4 and the Custom mode #5. First mentioned one generates data with the well known "Lorem ipsum..." pattern. The second mode allows you to enter a value which will be written for every data record.

Content for date- and numeric-based columns can be created with three various modes. As marked with #1, #2 the Random and the Sequential mode generates values between a lower and an upper value. #3, the Fixed mode, only needs one value which will create nearly similar to #5 a fixed, predefined value for the corresponding column for every data record.

The configuration of how many records the tool should create finalizes the generation process. In this last step one more thing has to be considered. Three checkboxes allow additional control for creating a big amount of data.

  • Truncate table - if checked, the selected table gets cleared before inserting new data records.
  • Commit each batch - the randomized data gets split up into blocks (size of blocks can be sized as well). This leads to a commit execution after inserting every block (prevents data loss in case of failure).
  • Display progressbar - if checked, this function shows the progress for every batch.

Next week I will show you how to generate data in a HANA XS environment using the oData Explorer, so you don´t have to be dependent on an external toolset.

Have you ever faced the challenge of generating random sample data? If so, let me know how you approached!

Stay tuned 😉



1 Comment