
Background:

We wanted to review the current table distribution in our HANA sidecar and see whether the resident reports/models/queries would be affected, negatively or positively, if we started "grouping" related tables on the same node. Many of the tables replicated from ECC are OLTP transactional tables, so even basic reports need 5-10 tables joined or merged together. Before investing a lot of "where used" type investigation into the logical "grouping", we wanted to do some basic analysis on how much overhead is incurred when the sources of a report don't all reside on the same node.

Our “Test/Perf” Scale out Landscape

dbversion.PNG

Up to now, table distribution was handled by the default round-robin assignment. There's also an option in Studio to redistribute tables using a wizard-type function. Please also save the current distribution prior to making any changes.


Execute_Optimize_Distribution.PNG
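Before touching anything, it's worth snapshotting the current placement so you can compare (or revert manually) later. A minimal sketch, assuming a scratch schema `MYADMIN` exists for you to write into:

```sql
-- Snapshot current table-to-host placement before redistributing
CREATE COLUMN TABLE MYADMIN.TABLE_DIST_BEFORE AS (
  SELECT host, schema_name, table_name, part_id, record_count
  FROM M_CS_TABLES
  WHERE schema_name = 'ECC_REPORTING'
);
```

After any moves, the same query can be re-run and diffed against the snapshot.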




At this stage it's worth looking at this good reference document, found here; it's a bit outdated for the Studio links etc., but I don't want to repeat all the same material here.

Table Distribution in Hana

I'm going to focus on just one schema for now, which is where all the ECC tables are replicated to, so let's see how our current configuration matches up.


Landscape_Qry_Results1.PNG


From the above, node 02 looks to be carrying the bulk of the tables, so I'm not sure how efficiently the default distribution is working. In any case, note that the 04 node is kept exclusively for failover and currently isn't allocated any tables.
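To quantify the skew before digging into individual tables, a per-host summary can be run first. A sketch against the standard M_CS_TABLES monitoring view, using the same schema as the detailed query that follows:

```sql
-- Per-host table count and memory footprint for the replicated schema
SELECT host,
       COUNT(*) AS table_count,
       ROUND(SUM(memory_size_in_total)/(1024*1024*1024), 2) AS "Used Memory (GB)"
FROM M_CS_TABLES
WHERE schema_name = 'ECC_REPORTING'
GROUP BY host
ORDER BY table_count DESC;
```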


SELECT a.host, a.schema_name, a.table_name, a.record_count, a.loaded,
  b.indexserver_actual_role AS "Server Role", 'Column Store Tables' AS component1,
  ROUND(a.estimated_max_memory_size_in_total/(1024*1024*1024), 2) AS "ESTIMATED MEMORY SIZE(GB)",
  ROUND(a.memory_size_in_total/(1024*1024*1024), 2) AS "Currently used MEMORY SIZE(GB)",
  ROUND(a.memory_size_in_delta/(1024*1024*1024), 2) AS "Currently used Delta Store size(GB)"
FROM M_CS_TABLES a
JOIN PUBLIC.M_LANDSCAPE_HOST_CONFIGURATION b ON a.host = b.host
WHERE a.schema_name IN ('ECC_REPORTING')
  AND a.estimated_max_memory_size_in_total > 0
ORDER BY 8 DESC


I'm going to focus specifically on 3 of the top 10:

Table_Qry_Results_new.PNG

An initial problem was that HF1 didn't look anything like the table distribution in Production, so let's execute the following script on Prod to generate the statements we can run to sync HF1 to match.


SELECT 'ALTER TABLE '||a.schema_name||'.'||a.table_name||' MOVE TO '''||
     REPLACE(a.host, 'PROD_HOST_NAME', 'PERF_HOST_NAME')||':37003'''
FROM M_CS_TABLES a
JOIN PUBLIC.M_LANDSCAPE_HOST_CONFIGURATION b ON a.host = b.host
WHERE a.schema_name IN ('ECC_REPORTING')


Syntax for redistribution:


  1. ALTER TABLE table_name MOVE TO '<host:port>'


Note: Port is the port of the target index server and not the SQL port.
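The right port for a given host can be looked up in the M_SERVICES monitoring view, which lists both the internal service port and the SQL port side by side. A sketch:

```sql
-- Indexserver ports per host; use PORT (the internal service port),
-- not SQL_PORT, in ALTER TABLE ... MOVE TO
SELECT host, service_name, port, sql_port
FROM SYS.M_SERVICES
WHERE service_name = 'indexserver';
```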

The actual ALTER TABLE statement returns immediately, as during redistribution it is the link to the table, not the physical table, that is moved. You'll also note the table is fully removed from memory and will show loaded = No, so the next merge or access will take some additional time.
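The placement and load status after a move can be checked directly against M_CS_TABLES. A sketch, using one of the tables from this post:

```sql
-- Where does the table live now, and is it loaded into memory?
SELECT host, schema_name, table_name, loaded
FROM M_CS_TABLES
WHERE schema_name = 'ECC_REPORTING'
  AND table_name  = 'EQUZ';
```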

A simple Calc View for our analysis

calcView.PNG

JEST, being our largest table, has already been hash partitioned equally across the 3 nodes on HF1, so let's work with what we have. The three tables basically reside on 3 different nodes. Will I see any significant performance overhead due to this configuration?
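Partition placement can be confirmed via M_CS_TABLES, where each partition of a partitioned table appears as its own row with a PART_ID. A sketch:

```sql
-- One row per partition; check which host each JEST partition sits on
SELECT host, table_name, part_id, record_count
FROM M_CS_TABLES
WHERE schema_name = 'ECC_REPORTING'
  AND table_name  = 'JEST'
ORDER BY part_id;
```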


PriorAlter1.PNG


Qry1:

Let's just select equipment number for now, which should prune the right-hand side of the calc view above and not hit the partitioned JEST table at all.


Select top 1000 equnr
from "_SYS_BIC"."abc.mfg.genealogy.models/CV_EQUIP_GEN_NEW_STATUS"
where erdat > '20140301'

Let's see what the VizPlan has to offer; I do expect to see some Network Data Transfer entries based on the current set-up.



qry1_viz_plan2.PNG

Note there are a few transfers above; however, the overhead is relatively small compared to the overall run time.


Qry2:

While we have the table allocation as-is, let's add in the partitioned table and see whether more of these Network Data Transfer entries show up in the VizPlan. I'll also expand the query with no WHERE clause, while keeping the TOP 1000 stop clause. This is a Perf box after all, and I want to see whether having to move around much larger volumes of records will increase the time spent on network data transfer.


Select top 1000 equnr, stat
from "_SYS_BIC"."abc.mfg.genealogy.models/CV_EQUIP_GEN_NEW_STATUS"

qry2_viz_plan2.PNG

From the above, we can see an increase in the number of Network Data Transfer entries; however, even with the larger volume, the overhead seems acceptable, again compared to the overall runtime. We also see that both paths of the calc view are now executed and the JEST partitions are hit in parallel.

qry2_viz_plan3.PNG

Qry3:

Let’s see if moving EQUI & EQUZ to the same node removes the network data transfer entries.


ALTER TABLE ECC_REPORTING.EQUZ MOVE TO 'HF1_HOSTNAMEX02:37003';

Statement 'ALTER TABLE ECC_REPORTING.EQUZ MOVE TO 'hf1hana02:37003''

successfully executed in 259 ms 793 µs (server processing time: 150 ms 915 µs) - Rows Affected: 0


AfterAlter1.PNG


Note EQUZ is no longer resident in memory; let's do a LOAD so this won't be an issue when we execute the query.



LOAD ECC_REPORTING.EQUZ ALL 
Started: 2014-05-09 15:19:49 
Statement 'LOAD ECC_REPORTING.EQUZ ALL' 
successfully executed in 1:05.285 minutes  (server processing time: 1:05.181 minutes) - Rows Affected: 0


Let's execute the same Qry1 again and check the VizPlan:


Select top 1000 equnr 
from "_SYS_BIC"."abc.mfg.genealogy.models/CV_EQUIP_GEN_NEW_STATUS" 
where erdat > '20140301'

qr3_results.PNG

As expected, the Network Data Transfer steps seen in Qry1 VizPlan are now gone.
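As a final sanity check, colocation of the joined tables can be verified directly rather than inferred from the VizPlan. A sketch against M_CS_TABLES:

```sql
-- EQUI and EQUZ should now report the same host;
-- JEST remains hash partitioned across the nodes
SELECT host, table_name, part_id
FROM M_CS_TABLES
WHERE schema_name = 'ECC_REPORTING'
  AND table_name IN ('EQUI', 'EQUZ', 'JEST')
ORDER BY table_name, host;
```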

Summary:

So I hope this blog is a start for folks looking at table distribution: how to check the current configuration, how to identify potential issues, etc. The results themselves are not very dramatic or resounding, but hopefully this may kick off further discussions. There's also room for a follow-up specifically on partitioning and how it can improve or hurt query efficiency. I only used 3 tables in the queries above, so the relatively small impact of having the tables distributed may increase with even larger tables or queries involving a larger number of tables.

So all comments or observations welcome 🙂
