Skip to Content

Today, when we talk about SAP HANA (High Performance Analytical Appliance ) – In Memory Computing Engine, the main question arises in each client/customer/developer’s mind is about Row storage and column storage. Since last couple of decades row store is very popular technique for data storage. Column storage though came into picture long back; there was no commercially popular database in the market. The main reason was processor speed and size. Now, with fast processing power we have 64-bit processor easily available in market. This blog is a small attempt to throw some light on this storage technique.

So let’s begin the tour…

Row Storage

Row storage is the conventional technique that is used in earlier databases to store the data in the tabular form. In row storage, data is inserted in form of tuple. Each tuple is nothing but a row which is unique identification of each record. There are many advantages of row store but the main or important advantage is easy to add or modify a record. So the systems where write back to database operation is required in that case row store is easier than column store.

Advantages:

  •          Row Store tables have a primary index
  •          Row ID maps to primary key
  •          Secondary indexes can be created
  •          Recommended when the tables contain less volume of data.
  •          Used when the application request has to access the entire row.

 Though HANA is a hybrid database which uses both row as well as column store techniques, I have given more emphasis on column store as the whole data modeling happens under row store technique.

Column storage

A column oriented database (also called as C-stores) stores the content column wise rather than by row. Though this technique is totally opposite to the conventional database, it is very useful in data compression, Storing data by column results in better compression than the conventional row store. It also performs well while query processing since queries read only the columns which are required.

Dictionary or bitmap encoding are some of the row store encoding techniques which are also used in column store.

Conventional databases i.e. row store are good for write operations therefore database with row store architecture is also called as write optimized system. This type of architecture is effective especially in OLTP systems.

On the other hand, system which are used for ad hoc querying with huge volume of data are read optimized, i.e. analytical or data warehouse systems where data is mostly used for reporting purpose. These systems represents read optimized in which bulk amount of data load is performed with ad hoc queries.  Other examples of read optimized systems are CRM (Customer Relationship Management), Electronic Library catalogues and other ad hoc inquiry systems.  In such environments, a column store architecture, in which the values for each single column are stored contiguously, should be more efficient.

–          Column store databases stores a dictionary of every distinct value that occurs in the column.

–          For each distinct value one needs to maintain a list that tells, for which row a specific value occurs

–          When we want to get a row/record from these values we need to go through each and every column and find the appropriate match. This is nothing but index inversing.

Advantages of column store:

  • Improves read functionality significantly, also improves write functionality
  • Highly compressed data
  • No real files, virtual files
  • Optimizer and Executer – Handles queries and execution plan
  • Delta data for fast write
  • Asynchronous delta merge
  • Consistent view Manager
  • Recommended when the tables contain huge volumes of data.
  •  Used when lot of aggregations need to be done on the tables.
  •  Supports parallel processing.

Reference:

http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/70d16119-ad21-2e10-de8b-eaaedf86b9cd?QuickLink=events&overridelayout=true

To report this post you need to login first.

5 Comments

You must be Logged on to comment or reply to a post.

  1. Lars Breddemann
    This is really not the kind of blog post I’d like to read about _any_ technology used with SAP.
    And especially not about such a new and fast moving one as HANA.

    Why isn’t there any personal insight or experience about the HANA storage technology in this blog?
    Why does this blog read as if you’ve never ever actually implemented a column store database to improve query execution runtime?
    Why does the list of ‘advantages’ of the column store read as if you don’t understand that e.g. also the row store has an optimizer, an executor and a virtual file management (all features that basically all databases nowadays fancy).

    Sorry, but no – this blog doesn’t help anyone to move on with HANA or in-memory computing.
    This is merely rumination of the information that are already out for many months now.
    Please, keep on blogging but come back with something original, important or funny.

    Lars

    (0) 
    1. Marilyn Pratt
      I’m afraid I need to agree with you entirely Lars about this content and although I don’t like to discourage newbies, this should never have passed moderation.  Yet, going forward we are going to need to get used to seeing the critique as post-publication moderation.  I hope the author will understand that we want to see content that reflects experience, opinion, and educates the reader.  Lars words are not meant to censor but rather instruct.
      (0) 
  2. Witalij Rudnicki
    … I expected it to be more or less repeat of already discussed blog BIA – Vertical Decomposition.

    But after reading the content, I must say that it is alarming. Instead of clarifying, this blog is actually giving lots of misinformation. I agree that new bloggers should be encouraged and not everyone automatically becomes the Dick Hirsch in blogging, but the quality of information should be the priority.

    (0) 
  3. Sankaranarayanan Hariharan

    Hi all,

    I would say blog was useful stuff especially for beginners in the field of computers itself. Infact fresh graduates like me will be benefited with this level of information. However it seems we got an expert level analysts in this forum and hence may consider this as inappropriate. Instead of discouraging new bloggers who write on basics, I request the moderator/admin if to rate all the blog as Basic/Intermediate/Advanced/Expert before publishing so that the above experienced guys will not be disappointed and newbies like me will have some wasy basic level content to read and search in Google.

    Thanks,

    Sankar

    (0) 

Leave a Reply