Some time ago I cam across an interesting term when reading about the BI Accelerator and what makes it so fast. The term was “ Column Based Compression “
I have basically started at the root of all things -something I felt was required to start off this topic, something that is known to all but then serves as a nice introcustion to set the stage.
On delving further into the same – came across a whole lot of published material on the same which seemed very interesting and questioned the basics of how queries need to be executed in an analytical perspective.
This basically led to questions such as :
- What sets apart Analytyic Applications from traditional OLAP systems.
- What is the importance of Performance / DB in an analytical application
- What is the difference between OLAp and OLTP.
What is the difference between a query on an OLTP system versus an OLAp system.
Queries are more record based –
Example – what is the invoice value for Invoice number XXXXX for customer ABC….
This is a very granular query and in most cases the query patterns are fixed – meaning it is known about user patterns in an OLTP scenario.
In other words :
The focus is on automating daily business tasks and queries on an OLTP database tend to be closely associated with a specific process / screen actions – For example – VA02 – where the sales order is analyzed. To provide for this kind of access , the query structure has these parameters coded in advance with all optimization possible by way of indices and necessary variables filled in at run-time.
OLAP – Analytic Applications:
Example – What is the sales done for year 2008 to customer ABC as compared to the sales done in Year 2007….
This query has the following characteristic:
It cannot be predicted as to what the user can query against and it would defeat the purpose of a data warehouse / analytical applications if we were to do so. Here the analysis can be done basis any dimension , any combination – more specifically Analytical applications lend themselves more towards random queries and it is difficult / improper to predict user patterns in the same.
The analysis is always done on a much higher granularity than the granularity at which OLTP databases are consumed. And also the same data set tends to be compared with history data and not essentially present data. OLTP databases are more of finding the current version of data rather than doing an analysis.
More Read-Oriented Than Write-Oriented.
With huge data volumes and historical data Analysis is meant to be for a read oriented system and the tables are meant more for reading data than inserting new records into the same. Typically data is written to the data warehouse in batches (for example, data collected during the day can be sent to the data warehouse from the enterprise transactional databases and batch-written over-night), followed by many read-only queries. Real time data is also supported but then the utility is more operational than analytical.
Of High Granularity
Analytical queries are not meant to query at the lowest level – this would mean that an analytical query is not meant to query at an invoice level but then is expected to query at a customer / state / country level which is of much higher granularity.
Further, they tend to focus on only a few such attributes at a time – typical queries will not have get all the attributes of a customer – it is more user specific and what the user needs at that point in time.
Will post more about row stores and column stores in the next blog.