Skip to Content

Thanks to the initiatives aimed towards the SAP HANA developer community such as the SAP HANA One Developer Edition and the Hosted Sandbox, more and more people are getting their hands on a SAP HANA environment. One of the first questions that gets raised is, how do I get some sample data to play around with? Whilst there are a number of options including flat file uploads and the sFlight and eFashion schemas, I’d like to offer an additional approach which is through Stored Procedures and SQL. Through this method, you can simple copy and paste the code below and it will generate a number of sample dimension tables and a fact table. You can stick with the default numbers included in the code below, or modify the values in the calls to the Stored Procedures to set your own sizes. The code is fully commented and it should be fairly straight forward to tweak if need be.

A couple of points to note. I posted this code initially as a response to a question in the forums on how to generate big data. Lars Breddemann highlighted that the use of the RAND() function is only available from Revision 46 onwards. So the code below will not work for earlier revisions. Furthermore, this approach provides random values and will not create real data distributions that can be used for meaningful analysis. I’m looking into some approaches to force the data to be skewed towards more realistic distributions. I’ll update this blog once complete. Any suggestions on this would be appreciated in the comments. Also, thanks to Lars for the code to allow the progress to be monitored. This is very handy and something I’ll use elsewhere. 🙂

— ==============================================================================

— Cleanup

— ==============================================================================

—          Execute the following statements only if you are re-running the process

—          to generate sample data. If this is the first time you are running the process

—          you can igore these 7 statements.

DROP TABLE SALES_F;

DROP TABLE BUSINESS_UNIT_D;

DROP TABLE SUPPLIER_D;

DROP TABLE MATERIAL_D;

DROP PROCEDURE BUILD_SUPPLIER_TABLE;

DROP PROCEDURE BUILD_MATERIAL_TABLE;

DROP PROCEDURE BUILD_FACT_TABLE;

— ==============================================================================

— Create our Dimension and Fact Tables

— ==============================================================================

CREATE COLUMN TABLE “SALES_F” (“SALES_ORDER_NBR” BIGINT CS_FIXED NOT NULL ,

       “CALENDAR_DAY” DAYDATE CS_DAYDATE,

       “BUSINESS_UNIT_ID” BIGINT CS_FIXED,

       “MATERIAL_ID” BIGINT CS_FIXED,

       “SUPPLIER_ID” BIGINT CS_FIXED,

       “UNIT_PRICE” DOUBLE CS_DOUBLE,

       “QUANTITY_SOLD” DOUBLE CS_DOUBLE,

       PRIMARY KEY (“SALES_ORDER_NBR”));

CREATE COLUMN TABLE “BUSINESS_UNIT_D” (“BUSINESS_UNIT_ID” BIGINT CS_FIXED NOT NULL ,

       “BUSINESS_UNIT_CODE” NVARCHAR(5),

       “BUSINESS_UNIT_DESC” NVARCHAR(256),

       “PARENT_BUSINESS_UNIT_ID” BIGINT CS_FIXED,

       “PARENT_BUSINESS_UNIT_CODE” NVARCHAR(5),

       PRIMARY KEY (“BUSINESS_UNIT_ID”));

CREATE COLUMN TABLE “SUPPLIER_D” (“SUPPLIER_ID” BIGINT CS_FIXED,

       “SUPPLIER_DESC” VARCHAR(60),

       PRIMARY KEY(“SUPPLIER_ID”));

CREATE COLUMN TABLE “MATERIAL_D” (“MATERIAL_ID” BIGINT CS_FIXED,

       “SKU” VARCHAR(16),

       “MATERIAL_GROUP” VARCHAR(60),

       PRIMARY KEY(“MATERIAL_ID”));

— ==============================================================================

— Create our temporary tables

— ==============================================================================

— Temporary Tables for the Supplier Dimension

—          These tables are used in the Supplier Stored Procedure

CREATE COLUMN TABLE ADJECTIVE (ID INTEGER, WORD VARCHAR(60), PRIMARY KEY (“ID”));

CREATE COLUMN TABLE NOUN (ID INTEGER, WORD VARCHAR(60), PRIMARY KEY (“ID”));

CREATE COLUMN TABLE SUP_TYPE (ID INTEGER, WORD VARCHAR(60), PRIMARY KEY (“ID”));

— Temporary Table for the Material Table

—          This table is used to assign a Material Group to the Material Table

CREATE COLUMN TABLE MAT_GROUP (ID INTEGER, WORD VARCHAR(60), PRIMARY KEY (“ID”));

— ==============================================================================

— Populate Our Business Unit Table

— ==============================================================================

INSERT INTO “BUSINESS_UNIT_D”

VALUES(1,‘BU1’,‘Business Unit 1’,0,);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(2,‘BU2’,‘Business Unit 2’,1,‘BU1’);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(3,‘BU3’,‘Business Unit 3’,1,‘BU1’);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(4,‘BU4’,‘Business Unit 4’,2,‘BU2’);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(5,‘BU5’,‘Business Unit 5’,3,‘BU3’);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(6,‘BU6’,‘Business Unit 6’,3,‘BU4’);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(7,‘BU7’,‘Business Unit 7’,4,‘BU4’);

INSERT INTO “BUSINESS_UNIT_D”

VALUES(8,‘BU8’,‘Business Unit 6’,4,‘BU4’);

— ==============================================================================

— Populate Our Lookup Tables used to generate dummy names

— ==============================================================================

— Populate Adjective Temporary Table

INSERT INTO ADJECTIVE VALUES(1, ‘Great’);

INSERT INTO ADJECTIVE VALUES(2, ‘Modern’);

INSERT INTO ADJECTIVE VALUES(3, ‘Fast’);

INSERT INTO ADJECTIVE VALUES(4, ‘Proud’);

INSERT INTO ADJECTIVE VALUES(5, ‘Solid’);

INSERT INTO ADJECTIVE VALUES(6, ‘Broad’);

INSERT INTO ADJECTIVE VALUES(7, ‘Elegant’);

INSERT INTO ADJECTIVE VALUES(8, ‘Fancy’);

INSERT INTO ADJECTIVE VALUES(9, ‘Mysterious’);

INSERT INTO ADJECTIVE VALUES(10, ‘Fantastic’);

— Populate Noun Temporary Table

INSERT INTO NOUN VALUES(1, ‘Factory’);

INSERT INTO NOUN VALUES(2, ‘Offices’);

INSERT INTO NOUN VALUES(3, ‘Industry’);

INSERT INTO NOUN VALUES(4, ‘Station’);

INSERT INTO NOUN VALUES(5, ‘Restaurant’);

INSERT INTO NOUN VALUES(6, ‘Buildings’);

INSERT INTO NOUN VALUES(7, ‘Mall’);

INSERT INTO NOUN VALUES(8, ‘Studio’);

INSERT INTO NOUN VALUES(9, ‘Stockbrokers’);

INSERT INTO NOUN VALUES(10, ‘Academy’);

— Populate Supplier Type Temporary Table

INSERT INTO SUP_TYPE VALUES(1, ‘Limited’);

INSERT INTO SUP_TYPE VALUES(2, ‘Pty Ltd’);

INSERT INTO SUP_TYPE VALUES(3, ‘Partnership’);

INSERT INTO SUP_TYPE VALUES(4, ‘Group’);

INSERT INTO SUP_TYPE VALUES(5, ‘Trust’);

INSERT INTO SUP_TYPE VALUES(6, ‘Collective’);

INSERT INTO SUP_TYPE VALUES(7, ‘Consortium’);

INSERT INTO SUP_TYPE VALUES(8, ‘Inc.’);

INSERT INTO SUP_TYPE VALUES(9, ‘Traders’);

INSERT INTO SUP_TYPE VALUES(10, ‘Franchise’);

— Populated Material Group Temporary Table

INSERT INTO MAT_GROUP VALUES(1, ‘Engine’);

INSERT INTO MAT_GROUP VALUES(2, ‘Exterior’);

INSERT INTO MAT_GROUP VALUES(3, ‘Interior’);

INSERT INTO MAT_GROUP VALUES(4, ‘Accesories’);

INSERT INTO MAT_GROUP VALUES(5, ‘Electrical’);

INSERT INTO MAT_GROUP VALUES(6, ‘Components’);

INSERT INTO MAT_GROUP VALUES(7, ‘Finishing’);

INSERT INTO MAT_GROUP VALUES(8, ‘Hydraulics’);

INSERT INTO MAT_GROUP VALUES(9, ‘Liquids’);

INSERT INTO MAT_GROUP VALUES(10, ‘Extras’);

— ==============================================================================

— Create the Stored Procedure to build the Supplier Table

— ==============================================================================

—          This procedure will generate a dummy supplier name from the temporary tables and

—          allows an integer argument to be used to set the number of rows to be generated.

CREATE PROCEDURE BUILD_SUPPLIER_TABLE (IN NMBR INT) LANGUAGE SQLSCRIPT AS

CNTR INTEGER;

BEGIN

CNTR := 0;

WHILE CNTR < :NMBR DO

INSERT INTO SUPPLIER_D

SELECT :CNTR,

            (SELECT TOP 1 WORD FROM ADJECTIVE WHERE ID = SUBSTR(ROUND(RAND() * 9, 0 ),1,1) + 1 ORDER BY WORD)  || ‘ ‘ ||

            (SELECT TOP 1 WORD FROM NOUN WHERE ID = SUBSTR(ROUND(RAND() * 9, 0 ),1,1) + 1 ORDER BY WORD) ||  ‘ ‘ ||

            (SELECT TOP 1 WORD FROM SUP_TYPE WHERE ID = SUBSTR(ROUND(RAND() * 9, 0 ),1,1) + 1 ORDER BY WORD)  AS SUPDESC

FROM DUMMY;     

CNTR := CNTR + 1;

END WHILE;

END;

— ==============================================================================

— Call the Supplier Table Stored Procedure

— ==============================================================================

—          To build the supplier table, use the following command. The number in brackets determines

—          the number of rows that will be populated.

CALL BUILD_SUPPLIER_TABLE(1000);

— ==============================================================================

— Create the Stored Procedure to build the Material Table

— ==============================================================================

—          This procedure will generate the material table. A random SKU number

—          is generated. the procedure will also assign a random material group

—          to each generated SKU.

CREATE PROCEDURE BUILD_MATERIAL_TABLE (IN NMBR INT) LANGUAGE SQLSCRIPT AS

CNTR INTEGER;

BEGIN

CNTR := 0;

WHILE CNTR < :NMBR DO

INSERT INTO MATERIAL_D

SELECT :CNTR,

       ‘SKU’ || LPAD(ROUND((RAND() * 1000000),0),7,‘0000000’) as SKU,

            (SELECT TOP 1 WORD FROM MAT_GROUP WHERE ID = SUBSTR(ROUND(RAND() * 9, 0 ),1,1) + 1 ORDER BY WORD)  AS MATERIAL

FROM DUMMY;     

CNTR := CNTR + 1;

END WHILE;

END;

— ==============================================================================

— Call the Material Table Stored Procedure

— ==============================================================================

—          To build the material table, use the following command. The number in brackets determines

—          the number of rows that will be populated. 

CALL BUILD_MATERIAL_TABLE(10000);

— ==============================================================================

— Create the Stored Procedure to build the Fact Table

— ==============================================================================

—          This stored procedure builds the Fact table. It creates a random date based

—          on a start date of 01 Jan 2011 with a random number of days add between 0

—          and 730. These values can be changed.

—          The remaining dimension key values are random integers based on the number

—          of rows in each dimension table.

CREATE PROCEDURE BUILD_FACT_TABLE (IN NMBR INT) LANGUAGE SQLSCRIPT AS

CNTR INTEGER;

BEGIN

CNTR := 0;

WHILE CNTR < :NMBR DO

INSERT INTO SALES_F

SELECT :CNTR,

       ADD_DAYS (TO_DATE (‘2011-01-01’, ‘YYYY-MM-DD’), RAND() * 730),

         ROUND((RAND() * (SELECT COUNT(*) FROM BUSINESS_UNIT_D)), 0 ),

         ROUND((RAND() * (SELECT COUNT(*) FROM MATERIAL_D)), 0 ),

         ROUND((RAND() * (SELECT COUNT(*) FROM SUPPLIER_D)), 0 ),

         ROUND(RAND() * 1000,2),

         ROUND(RAND() * 100,0)

FROM DUMMY;

CNTR := CNTR + 1;

— Monitoring via m_session_context (Credit to Lars Breddemann)

      IF MOD(:CNTR, 1000) = 0 THEN

            EXEC ‘set session ”BUILD_FACT_TABLE_PROGRESS” = ”’||:CNTR||‘ of ‘|| :NMBR || ”’ ‘;

      END IF;

— Monitoring via m_session_context     

END WHILE;

END;

— ==============================================================================

— Call the Fact Table Stored Procedure

— ==============================================================================

—          To build the material table, use the following command. The number in brackets determines

—          the number of rows that will be populated. 

CALL BUILD_FACT_TABLE(100000);

— ==============================================================================

— Monitor Progress

— ==============================================================================

—          The building of the fact table can take a considerable amount of time. The progress

—          can be monitored by issuing the following SQL statement. Credit goes to Lars Breddemann

—          for this functionality.

SELECT *

FROM “PUBLIC”.“M_SESSION_CONTEXT”

WHERE KEY = ‘BUILD_FACT_TABLE_PROGRESS’;

— ==============================================================================

— Cleanup

— ==============================================================================

—          Now that our dimension and fact tables have been created, we can clean up

—          the temporary tables that were used in generating our sample data.

DROP TABLE ADJECTIVE;

DROP TABLE NOUN;

DROP TABLE SUP_TYPE;

DROP TABLE MAT_GROUP;

— ==============================================================================

— Verify the tables have been created and number of rows created.

— ==============================================================================

SELECT ‘BUSINESS_UNIT_D’ AS TABLE_NAME,

COUNT(*)

FROM BUSINESS_UNIT_D

UNION

SELECT ‘MATERIAL_D’ AS TABLE_NAME,

COUNT(*)

FROM MATERIAL_D

UNION

SELECT ‘SUPPLIER_D’ AS TABLE_NAME,

COUNT(*)

FROM SUPPLIER_D

UNION

SELECT ‘SALES_F’ AS TABLE_NAME,

COUNT(*)

FROM SALES_F


To report this post you need to login first.

15 Comments

You must be Logged on to comment or reply to a post.

  1. Rich Heilman

    Hi Gary.  Very nice data generator!  I just wanted to let you and your readers know that we are planning on shipping an example data model called EPM(Enterprise Procurement Model) in an upcoming revision.  This will include a set of tables like “Business Partners”, “Addresses”, “Products”, “Purchase Orders, and “Sales Orders”.  These tables will be delivered with data, but we will also supply a data generator application which runs on XS which can be used to generate as much data as you want in the Purchase Order and Sales Order tables.  Finally, there will be an example XS application which leverages the EPM dataset and shows how you can build a very nice XS app.  We plan to use this new EPM dataset for all future materials, including workshops, code examples, tutorials, etc.  This way all the developers have a common data set to learn from going forward.

    Cheers,

    Rich Heilman

    (1) 
    1. Gary Elliott Post author

      Hi Rich,

      That’s excellent news and something that will really help out the developer community. I look forward to seeing it in a future release. Particulary the XS App. Sounds good.

      Cheers,

      Gary

      (0) 
  2. Mark Walker

    Hi Gary

    I made a couple of adjustments to your code – hope you don’t mind 😉

    They let me build the fact table over several calls, with a delta merge in the middle to save memory.  Here’s the incriminating code:

    CREATE PROCEDURE BUILD_FACT_TABLE (IN NMBR INT) LANGUAGE SQLSCRIPT AS

    CNTR INTEGER;

    nummax integer;

    BEGIN

    CNTR := 0;

    select count(*) into cntr from sales_f;

    Nummax := :NMBR + :CNTR;

    WHILE CNTR < :nummax DO

    and in the call:

    CALL BUILD_FACT_TABLE(10000);

    merge delta of sales_f;

    CALL BUILD_FACT_TABLE(10000);

    merge delta of sales_f;

    Cheers

    Mark

    (0) 
    1. Hannes Mayer

      I have further optimized the proc to be able to run in parallel and execute way faster because only once selecting the dimensions members

      drop PROCEDURE BUILD_FACT_TABLE_FAST;

      CREATE PROCEDURE BUILD_FACT_TABLE_FAST (IN STARTNUM INT, IN ENDNUM INT) LANGUAGE SQLSCRIPT AS

      CNTR INTEGER;

      CNTRB INTEGER;

      CNTRM INTEGER;

      CNTRS INTEGER;

      BEGIN

      CNTR := :STARTNUM;

      SELECT COUNT(*) into CNTRB FROM BUSINESS_UNIT_D;

      SELECT COUNT(*) into CNTRM FROM MATERIAL_D;

      SELECT COUNT(*) into CNTRS FROM SUPPLIER_D;

      WHILE CNTR < :ENDNUM DO

      INSERT INTO SALES_F

      SELECT :CNTR,

             ADD_DAYS (TO_DATE (‘2011-01-01’, ‘YYYY-MM-DD’), RAND() * 730),

               ROUND((RAND() * CNTRB), 0 ),

               ROUND((RAND() * CNTRM), 0 ),

               ROUND((RAND() * CNTRS), 0 ),

               ROUND(RAND() * 1000,2),

               ROUND(RAND() * 100,0)

      FROM DUMMY;

      CNTR := CNTR + 1;

      — Monitoring via m_session_context (Credit to Lars Breddemann)

            IF MOD(:CNTR, 1000) = 0 THEN

                  EXEC ‘set session ”BUILD_FACT_TABLE_PROGRESS” = ”’|| (:CNTR-:STARTNUM) ||’ of ‘|| (:ENDNUM-:STARTNUM) ||”’ ‘;

            END IF;

      — Monitoring via m_session_context    

      END WHILE;

      END;

      (0) 
  3. Orel Stringa

    Hi Gary,

    this is a great help to generate data.

    One odd question though: I see you use the RAND function. I wonder if RAND once in while may generate the same value and if that happened to be part of or the whole key then we would have a duplicate record. Would perhaps using a sequence be a duprec-safe way of generating data?

    Thanks,

    Orel

    (0) 
    1. Hannes Mayer

      Its a DUMMY Table to have something in the FROM Claus – to build a valid SQL – to just select one row of constants or variables.

      SQL Server:

      select 1 ;

      ORACLE

      select 1 from dual;

      HANA

      select 1 from DUMMY;

      (0) 
  4. Dusan Smelcer

    Hi, is it normal that the procedure create lot of null data? E.g. for Material table, I have lot null values for field MATERIAL_GROUP.

    I found that the script in procedure “… SELECT TOP 1 WORD FROM MAT_GROUP WHERE ID = SUBSTR(ROUND(RAND() * 9, 0 ),1,1) + 1 ORDER BY WORD ..” sometimes randomly returns null(). Do you know how to workaround this?

    (0) 
  5. Project Aurora

    This is a great blog if you want to generate a lot of data fast.

    I used below procedure in addition to above script from Gary to generate more than a billion records in just over 1 hour.

    — ==============================================================================

    — Create the Stored Procedure to duplicate records in the Fact Table

    — ==============================================================================

    —          This stored procedure duplicates the existing records. It creates a random date based

    —          on a start date of 01 Jan 2013 with a random number of days add between 0

    —          and 730. These values can be changed.

    —          The remaining dimension key values are copied from the existing records

    CREATE PROCEDURE BUILD_FACT_TABLE_DUP (IN START_NMBR INT)

    LANGUAGE SQLSCRIPT AS

    CNTR INTEGER;

    BEGIN

    CNTR := 0 ;

    INSERT INTO SALES_F

    SELECT SALES_ORDER_NBR + START_NMBR,

           ADD_DAYS (TO_DATE (‘2013-01-01’, ‘YYYY-MM-DD’), RAND() * 730),

           BUSINESS_UNIT_ID,

           MATERIAL_ID,

           SUPPLIER_ID,

           UNIT_PRICE,

           QUANTITY_SOLD

    FROM SALES_F;

    END;

    ==============================================================================

    — Call procedure to generate records from existing records

    — ==============================================================================

    –Set to start number from previous procedure e.g. 1million

    CALL BUILD_FACT_TABLE_DUP(1000000);

    — double of previous value and so on

    CALL BUILD_FACT_TABLE_DUP(2000000);

    CALL BUILD_FACT_TABLE_DUP(4000000);

    CALL BUILD_FACT_TABLE_DUP(8000000);

    CALL BUILD_FACT_TABLE_DUP(16000000);

    CALL BUILD_FACT_TABLE_DUP(32000000);

    CALL BUILD_FACT_TABLE_DUP(64000000);

    CALL BUILD_FACT_TABLE_DUP(128000000);

    CALL BUILD_FACT_TABLE_DUP(256000000);

    CALL BUILD_FACT_TABLE_DUP(512000000);

    I hope this can be of help.

    (0) 

Leave a Reply