Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
Former Member

Hi All,

I’m sorry but this isn’t going to be an exciting blog about some of HANA’s cool features.

It's a bit boring but the Bread & Butter of all databases is having tables with data, that can be queried and joined.

For those of you just starting or those that are more adventurous and want a large volume of data to test out performance tuning options,  then  TPC Benchmark™H (TPC-Hmight be for you.

http://www.tpc.org/tpch/

TPC-H provides 8 tables and a data generation tool that enables you to create as much data as you want.

Generate the test data, create the tables, import and you’re away.

The 8 tables are all designed to be joined. They simulate Sales Order transactions:

The tables include (using SAP style terminology):

  1.        Customers (Customer Master)
  2.        Supplier (Vendor Master)
  3.        Part (Material Master records)
  4.        Partsupp (Inventory Balances)
  5.        Nation (Country)
  6.        Region
  7.        Orders (Sales Order Header)
  8.        Lineitem (Sales Order line items)

NOTE: The tables are NOT in the same format as the similar SAP ECC tables but offer similar information as what you might find in ECC.

The Table Schema is as follows:

NOTE: SF is the Scaling factor used to generate the data. 

            For example a scaling factor of 1 will generate 6 Million records for the lineitem table, SF 2 will generate 12 Million, 10 will generate 60M,

           100 will generate 600M, etc.

As a quick summary you will need to :

  •      Log into Linux with the appropriate authorization
  •      Install a C/C++ compiler if you’ve not already
  •      Download/Extract the TPC-H files
  •      Modify the ‘makefile’  to be relevant for Linux
  •      Compile the data generation tool (DBGEN)
  •      Run DBGEN to create data files based on the scaling factor
  •      Create the tables and import them in HANA studio

If you’re still reading then I’ve not scared you off, so now for the technical bit.

Log into Linux with the appropriate authorization

Use Putty or equivalent tool to log into Linux box running HANA , at the OS level.

For those using a AWS HANA developer box refer to http://scn.sap.com/docs/DOC-28294

Install a C/C++ compiler if you’ve not already

sudo zypper install gcc gcc-c++ gcc-fortran

cd /usr/bin/

sudo ln -s gcc g

Download/Extract the TPC-H files

Create a new directory to store TPC-H files, ensure there is enough space on the mounted folder then download and extract the files.

On AWS I have choosen the following:

cd /sap

mkdir tpc-h

cd tpc-h

wget http://www.tpc.org/tpch/spec/tpch_2_15.0.zip

unzip tpch_2_15.0.zip

Modify the ‘makefile’  to be relevant for Linux

Use 'vi' or equivalent to modify the supplied 'makefile'  which is used to compile the data generator tool.

In the 'makefile' ensure the following parameters are set

DATABASE=SQLSERVER

MACHINE = LINUX

WORKLOAD = TPCH

NOTE: We use SQLSERVER above as it doesn't recognise HANA.

For an intro to  'vi' you could try  http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html

At the command prompt:

cd tpch_2_15.0/dbgen

vi makefile.suite

Edit the file and save.

Compile the data generation tool (DBGEN)

Now we need to compile the generation tool, to generate the executable file 'dbgen'

make -f makefile.suite

The dbgen executable should now appear in the folder.

Check with.

ls -l

Run DBGEN to create data files based on the scaling factor

Now you are ready to create 'tbl'  files for each of the 8 tables.

The important parameter of the 'dbgen'  program is the scaling factor.

A scaling factor of 1, will generate 1Gb of uncompressed test data(e.g. 6 million rows in the 'lineitem table')

A scaling factor of 2, will generate 2Gb of uncompressed test data(e.g. 12 million rows in the 'lineitem table')

A scaling factor of 10, will generate 60Gb of uncompressed test data(e.g. 60 million rows in the 'lineitem table')

A scaling factor of 100, will generate 600Gb of uncompressed test data(e.g. 600 million rows in the 'lineitem table')

etc.

Excute the tool with one of following commands withe different scaling factors (or your own):

#Scaling factor 1

./dbgen -vf -s 1

#Scaling factor 2

./dbgen -vf -s 2

#Scaling factor 10

./dbgen -vf -s 10

[Note: I suggest your try scaling factor 1 first, till you are comfortable with the load process]

Create the tables and import them in HANA studio

create schema "TPCH";

set schema "TPCH";

drop table nation;

drop table region;

drop table supplier;

drop table part;

drop table partsupp;

drop table customer;

drop table orders;

drop table lineitem;

create  column table nation (

n_nationkey  decimal(3,0) not null,

n_name       char(25) not null,

n_regionkey  decimal(2,0) not null,

n_comment    varchar(152)

);

create column table region (

r_regionkey  decimal(2,0) not null,

r_name       char(25) not null,

r_comment    varchar(152)

);

create column table part (

p_partkey     decimal(10,0) not null,

p_name        varchar(55) not null,

p_mfgr        char(25) not null,

p_brand       char(10) not null,

p_type        varchar(25) not null,

p_size        decimal(2,0) not null,

p_container   char(10) not null,

p_retailprice decimal(6,2) not null,

p_comment     varchar(23) not null

);

create column table supplier (

s_suppkey     decimal(8,0) not null,

s_name        char(25) not null,

s_address     varchar(40) not null,

s_nationkey   decimal(3,0) not null,

s_phone       char(15) not null,

s_acctbal     decimal(7,2) not null,

s_comment     varchar(101) not null

);

create column  table partsupp (

ps_partkey     decimal(10,0) not null,

ps_suppkey     decimal(8,0) not null,

ps_availqty    decimal(5,0) not null,

ps_supplycost  decimal(6,2) not null,

ps_comment     varchar(199) not null

);

create column table customer (

c_custkey     decimal(9,0) not null,

c_name        varchar(25) not null,

c_address     varchar(40) not null,

c_nationkey   decimal(3,0) not null,

c_phone       char(15) not null,

c_acctbal     decimal(7,2) not null,

c_mktsegment  char(10) not null,

c_comment     varchar(117) not null

);

create column table orders  (

o_orderkey       decimal(12,0) not null,

o_custkey        decimal(9,0) not null,

o_orderstatus    char(1) not null,

o_totalprice     decimal(8,2) not null,

o_orderdate      date not null,

o_orderpriority  char(15) not null,

o_clerk          char(15) not null,

o_shippriority   decimal(1,0) not null,

o_comment        varchar(79) not null

);

create column table lineitem (

l_orderkey    decimal(12,0) not null,

l_partkey     decimal(10,0) not null,

l_suppkey     decimal(8,0) not null,

l_linenumber  decimal(1,0) not null,

l_quantity    decimal(2,0) not null,

l_extendedprice  decimal(8,2) not null,

l_discount    decimal(3,2) not null,

l_tax         decimal(3,2) not null,

l_returnflag  char(1) not null,

l_linestatus  char(1) not null,

l_shipdate    date not null,

l_commitdate  date not null,

l_receiptdate date not null,

l_shipinstruct char(25) not null,

l_shipmode     char(10) not null,

l_comment      varchar(44) not null

);

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/lineitem.tbl' INTO "TPCH"."LINEITEM" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/customer.tbl' INTO "TPCH"."CUSTOMER" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/nation.tbl' INTO "TPCH"."NATION" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/orders.tbl' INTO "TPCH"."ORDERS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/part.tbl' INTO "TPCH"."PART" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/partsupp.tbl' INTO "TPCH"."PARTSUPP" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/region.tbl' INTO "TPCH"."REGION" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/supplier.tbl' INTO "TPCH"."SUPPLIER" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';

Congratulations!

You are now ready to start running queries. 

Section 2.4  of the following document  http://www.tpc.org/tpch/spec/tpch2.15.0.pdf has plenty of query examples for you to try.

You may need to translate these to HANA SQL or be more adventurous and create them with Attribute, Analytic & Calculation views to optimise performance.

NOTE: For those that successfully translate the queries to HANA SQL then please consider adding your translated SQL as a comment to this blog, to share with others.

Have fun bench-marking your results with others.

------------------------------------------------------------------------------------------

BTW:  I used TPC-H data in my first blog comparing HANA to HADOOP IMPALA.

http://scn.sap.com/community/developer-center/hana/blog/2013/05/30/big-data-analytics-hana-vs-hadoop...

Labels in this area