
Summary

  • HANA provides table partitioning options that are logically similar to other RDBMSs
  • Partitioning is important for spreading tables across multi-node HANA implementations - facilitating increased query parallelization
  • Effective implementation of partitioning is important in keeping the Delta Merge overhead consistent
  • Partitions can be managed explicitly - i.e. add/remove ranges; split/merge partitions
  • Simple partitioning syntax

Introduction - First Thoughts

SAP produce a brief guide to this - http://help.sap.com/hana/hana_db_part_en.pdf - and I was interested to find out how/why you might deploy explicit partitioning in HANA. Now, I understand that HANA will dynamically partition tables for a query - i.e. to use as many cores as possible at 2GB/sec per core. Tables need to be partitioned to spread them across multiple hosts. So why else might you partition? SAP list the 'usual' partitioning benefits -


  • load balancing
  • parallelization (? - I thought HANA was going to do this for me ...)
  • partition pruning (i.e. queries can go straight to a subset of partitions - see the sketch after this list)
  • explicit partition handling - i.e. managing time series data
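
To make 'pruning' concrete, here's a minimal sketch (table and column names are invented for illustration; the syntax follows the SAP guide linked above): a table range-partitioned by month, and a query whose filter on the partitioning column means only the matching partition needs to be read.

-- A table range-partitioned by month (names invented for the example)
CREATE COLUMN TABLE orders_by_month (
    order_id    INTEGER,
    order_month INTEGER,       -- e.g. 201302 for February 2013
    amount      DECIMAL(15,2),
    PRIMARY KEY (order_id, order_month)
)
PARTITION BY RANGE (order_month)
    (PARTITION 201301 <= VALUES < 201307,
     PARTITION 201307 <= VALUES < 201401,
     PARTITION OTHERS);

-- The filter on order_month means only the first partition has to be scanned;
-- the other partitions are pruned
SELECT SUM(amount)
FROM orders_by_month
WHERE order_month BETWEEN 201301 AND 201303;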

All well and good. OK - a few HANA-only considerations :-


  • A non-partitioned table is limited to 2 billion rows
  • delta merge processing time - if modifications are restricted to a subset of partitions then there is less delta merge work to complete (or the merge is spread across hosts)*
  • Partitioning is only available on column store tables (HANA has both row and column stores)
  • Typically used in distributed landscapes - but can be beneficial on single-host systems **

* Delta Merge? - a key part of the design of HANA and a topic in itself. To summarise in one sentence - data changes are stored in a separate delta area, with its own data dictionary; this area is periodically merged with the main store. (Shall we say HANA is fast - but a dictionary-encoded column store is going to have to compromise somewhere ...)

** OK. Is that where 'we' get to improve parallelization - by splitting tables across nodes? I.e. HANA can partition tables 'on the fly' WITHIN A NODE; to use more than one node for a query we need to distribute data across nodes. That makes sense.

2 billion row limit? I'm guessing that's a facet of a dictionary-encoded column store which has restricted itself to integers for 'row IDs'. This initially sounds a bit restrictive. However, I think by the time you get to billion-row-plus tables you'd be considering partitioning on most RDBMS systems anyway. And of course with HANA we need to consider Delta Merge processing. Also, big tables are going to reside in big systems - which are likely to have more than one node - and you'd probably want to split that big table across those nodes. So this row limit is not looking like an issue: with billion-row-plus tables you'd probably have gone for partitioning for other reasons, not because of the 2 billion row limit.
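
As a practical aside, you can watch the delta store building up per partition and trigger a merge by hand. A minimal sketch - the M_CS_TABLES monitoring view and MERGE DELTA statement are standard HANA, but the table name just reuses the illustrative orders_by_month table from the pruning example above:

-- Main/delta record counts, one row per partition of the table
SELECT table_name, part_id,
       raw_record_count_in_main,
       raw_record_count_in_delta,
       memory_size_in_delta
FROM m_cs_tables
WHERE table_name = 'ORDERS_BY_MONTH'
ORDER BY part_id;

-- Force a delta merge now (normally you'd leave this to the automatic merge)
MERGE DELTA OF orders_by_month;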

Types of Partitioning

HANA provides two categories of table partitioning: Single Level and Multi-Level. Let's look at Single Level first. This looks pretty familiar (I'm talking from a Sybase ASE/SQL Server background) :-

  • Hash
  • Round Robin
  • Range

Syntax is simple. You can specify the number of partitions explicitly or - and this seems to be SAP's recommendation - use the GET_NUM_SERVERS() function. A restriction in Single Level partitioning is that the columns used to partition the table must be part of the primary key (round robin is the exception - it can only be used on tables with no primary key defined).
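
For reference, hedged DDL sketches of the hash and round robin variants (a range example already appears in the pruning sketch above - note its partitioning column order_month is part of the primary key, in line with the restriction just mentioned). Table and column names are made up; the PARTITION BY clauses follow the SAP guide:

-- Hash: spread rows by a key column, one partition per index server
CREATE COLUMN TABLE orders_hash (
    order_id   INTEGER,
    order_date DATE,
    amount     DECIMAL(15,2),
    PRIMARY KEY (order_id)
)
PARTITION BY HASH (order_id) PARTITIONS GET_NUM_SERVERS();

-- Round robin: no partitioning column, and no primary key allowed
CREATE COLUMN TABLE orders_rr (
    order_id   INTEGER,
    order_date DATE,
    amount     DECIMAL(15,2)
)
PARTITION BY ROUNDROBIN PARTITIONS 4;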

Multi-Level

Slightly harder to visualise than Single Level partitioning, Multi-Level partitioning :-

  • Allows partitioning on non-key columns (removing a restriction of single-level hash and range partitioning)
  • Works well with time series data - if older data is not updated (or rarely updated) then there is little or no Delta Merge processing to complete on those partitions. This helps keep the time (and resources) required for Delta Merge processing on a table constant

Three types of Multi-Level partition :-

  • Hash-Range. 'Most typically used'. The example given shows time series data split evenly across nodes - i.e. if you have 4 nodes and month-based ranges, each node holds 25% of January, 25% of February, etc.
  • Round robin-Range
  • Hash-Hash

I can see that Hash-Range allows for maximum parallelization - i.e. it is highly likely to engage all nodes, and Delta Merge will certainly be completed on all nodes. So if data modifications are restricted to the current month (or quarter, etc.) the Delta Merge should be consistent AND will utilize all available nodes - so it's completed as fast as possible.
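
A hash-range definition along those lines might look like this (again, a hedged sketch with invented names): the first level hashes the primary key across hosts, the second level ranges by month - and the range column does not have to be part of the primary key.

-- Level 1: hash on the key, so each month's data is spread over all hosts
-- Level 2: range by month, so only 'current' partitions take modifications
CREATE COLUMN TABLE orders_multi (
    order_id    INTEGER,
    order_month INTEGER,       -- e.g. 201301
    amount      DECIMAL(15,2),
    PRIMARY KEY (order_id)
)
PARTITION BY HASH (order_id) PARTITIONS GET_NUM_SERVERS(),
             RANGE (order_month)
    (PARTITION 201301 <= VALUES < 201307,
     PARTITION 201307 <= VALUES < 201401,
     PARTITION OTHERS);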

Partition Handling

  • Any partitioning involving ranges can be managed explicitly by adding or removing ranges (examples below)
  • The partitioning type of a table can be changed (re-partitioning)
  • Split partitions
  • Merge partitions
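
Hedged examples of what that handling looks like in SQL, against the illustrative tables above (the statements follow the SAP guide, but check the reference for your HANA revision):

-- Add a new range to an existing range-partitioned table
ALTER TABLE orders_by_month ADD PARTITION 201401 <= VALUES < 201407;

-- Drop an old range (note: the data in that partition is deleted)
ALTER TABLE orders_by_month DROP PARTITION 201301 <= VALUES < 201307;

-- Change the partitioning scheme of a table (re-partitioning)
ALTER TABLE orders_hash PARTITION BY HASH (order_id) PARTITIONS 8;

-- Merge all partitions back into a single, non-partitioned table
ALTER TABLE orders_by_month MERGE PARTITIONS;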