Former Member

With the introduction of BIA, the biggest change was in the way BI is perceived. The combination of an unprecedentedly fast data warehouse and advanced integration tools opened many doors, and BI is no longer seen as a standalone data warehouse but as an integral part of the entire business process. Jens Doerpmund discusses this phenomenon articulately in his weblog "BIA BLOG Series, Part I: BIA Changes Everything!".

BIA achieves its unparalleled response times by combining several innovative techniques. A few of them are “in-memory processing”, “horizontal partitioning”, “scalable multi-server architecture” and “vertical decomposition”. How most of these techniques contribute to fast response times is evident from their names, but the concept of “vertical decomposition” and its role is less obvious and deserves a closer look.

BIA uses a column store architecture to store data. In this database model, data is organized as columns of values of the same attribute, rather than as rows of tabular records as in traditional database models. The biggest advantage of this architecture is that during query execution only the columns needed for that specific query have to be read and brought into memory; the other, irrelevant attributes are never touched. As can be seen in Fig 1 below, the customer sales data is stored as rows in a traditional database model. Fig 2 shows the same data stored in BIA using the column store architecture.

Fig 1 – Row structure in traditional database


Fig 2 – Column structure in BIA


In a typical BI environment, InfoCubes usually contain a large number of attributes (characteristics and key figures), and analytical queries are designed to access only a subset of them. The vertically decomposed data model in BIA has a sizeable performance advantage here: only the required attributes from the cube are selected and transferred to memory, whereas in the conventional database model all the data in the table is transferred together, one entire row at a time. In our example, if a query has to display Customer and the associated sales figures, a traditional database reads all attributes of each row and the required ones (Customer and Sales) then have to be picked out. In BIA, only the Customer and Sales values are read. As a result, the redundant handling of data that is irrelevant to the query is avoided, which gives a faster response. Storing data column-wise also makes aggregation faster, because all values of a key figure sit together.
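The difference can be illustrated with a small Python sketch. It is not how BIA is implemented; the sample records, the attribute names and the `values_read` counter are all hypothetical and exist only to show how many values each layout has to touch for the same "sales per customer" query.

```python
# Hypothetical sample data: attribute names and values are illustrative only.
rows = [
    {"customer": "C1", "region": "North", "product": "P1", "quantity": 10, "sales": 100.0},
    {"customer": "C2", "region": "South", "product": "P2", "quantity": 5, "sales": 250.0},
    {"customer": "C1", "region": "North", "product": "P3", "quantity": 2, "sales": 40.0},
]


def to_columns(records):
    """Vertically decompose row records into one list per attribute."""
    return {name: [r[name] for r in records] for name in records[0]}


def sales_by_customer_rows(records):
    """Row layout: every attribute of every row is fetched, then filtered."""
    totals = {}
    values_read = 0
    for r in records:
        values_read += len(r)  # the whole row comes along
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["sales"]
    return totals, values_read


def sales_by_customer_columns(columns):
    """Column layout: only the two columns named in the query are fetched."""
    customers = columns["customer"]
    sales = columns["sales"]
    values_read = len(customers) + len(sales)
    totals = {}
    for c, s in zip(customers, sales):
        totals[c] = totals.get(c, 0.0) + s
    return totals, values_read


columns = to_columns(rows)
row_totals, row_reads = sales_by_customer_rows(rows)
col_totals, col_reads = sales_by_customer_columns(columns)
print(row_totals, row_reads)
print(col_totals, col_reads)
```

Both functions produce the same totals, but the row version touches five values per record while the column version touches two. The gap widens with every additional attribute in the cube that the query does not need.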

Transferring fewer attributes also means considerably less load on I/O. This matters a great deal, because most of the time it is the I/O interface rather than processing speed that is the bottleneck for query performance. If we look at the query statistics of any BI environment, only a few queries are designed optimally to return a manageable number of records. A large percentage of queries return a considerable number of records and spend a lot of time on I/O. I/O has always been, and still is, much slower than processing. Hence it is critical to keep an eye on the amount of data being transferred and to minimize it. Vertically decomposed data helps achieve this.

The column store architecture has the added advantage of efficient data compression. Compression in BIA is particularly effective because the values within a column tend to be quite similar to each other and therefore compress very well. In a traditional row-oriented database, the values within a row of a table are unlikely to be very similar, and hence are unlikely to compress well. BIA compresses data with a dictionary-based algorithm that uses integer coding, which works really well on similar data. This compression technique reduces the data volume significantly.
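To make the idea of dictionary-based integer coding concrete, here is a minimal Python sketch. It shows the general technique only and is not BIA's actual algorithm; the function names and the sample column are assumptions made for illustration.

```python
def dictionary_encode(column):
    """Replace each value by a small integer that indexes a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the integer codes."""
    return [dictionary[code] for code in codes]


def bits_per_code(dictionary):
    """Minimum number of bits needed to tell the dictionary entries apart."""
    return max(1, (len(dictionary) - 1).bit_length())


# Hypothetical column with few distinct values, as is typical for characteristics.
region = ["North", "South", "North", "North", "East"]
dictionary, codes = dictionary_encode(region)
print(dictionary)
print(codes)
print(bits_per_code(dictionary))
```

Each distinct value is stored once in the dictionary, and the column itself shrinks to a list of small integers. The fewer distinct values a column has, the fewer bits each code needs, which is why columns of similar values compress so well.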
