Data Modeling in SAP HANA using calculation view is one of the supreme capabilities provided for end user to mold his raw data in to a well structured result set by leveraging multiple operational capabilities exposed by calculation view. On the way of achieving this we need to think on different aspects involved.
Let me take few lines here to quote some of the real world examples to provide better clarity on my talk.
We all know that there are two major parameters which we generally take in to consideration when we are qualifying or defining the standard of any automobile, which are nothing but ‘Horse Power(HP)’ and ‘Mileage’ of the automobile. There is always a trade off between the two, by which i mean that a higher HP automobile yields reduced mileage and vice versa. Why does this happen? It is because we are making the underlying mechanical engines to generate more HP and thus consuming most of the source energy(fuel) for this purpose.
Let us now get back from the mechanical world to our HANA world and start thinking the underlying execution of HANA database analogous to the above quoted mechanism.
When our calculation view starts computing complex calculations on big data its a matter of fact that the we will have a trade off between the performance and volume of data.
In this kind of Big Data scenarios where in we are expecting a better Horse Power from the underlying engine, let us even think how can we make the mileage/Performance also to be better in the current document.
One of the new features of HANA SPS11 called ‘Union Node Pruning in calculation view ‘supports us to achieve this by reducing the cost of execution of calculation view by Pruning union operation dynamically based on the query by end user.
Let us understand this by an example : Consider that we are creating sales report for a product across the years using a calculation view. The view consists of two data sources, which are current sales data(for YEAR >= 2015) and archived sales data(YEAR <= 2014) and both of which are provided as input to union node of calculation view as shown below :
Now think of a scenario where in we are querying the calculation view to get the result of current_sales, wouldn’t it be great if the underlying execution engine queries only on the current_sales based data source and prune the operation on the archived data source.
Yes, this can now be achieved in the case of union operation in a calculation view by providing pruning definition in a predefined database table which is called as Pruning configuration table.
Definition of the pruning configuration table should be of the below format :
and an example content for the pruning configuration table is as below :
CALC_SCENARIO comprises of the calculation view that involves union node pruning and INPUT column takes the data source names involved in the union node of calculation view.
Now in the advanced view properties of calculation view mention this pruning table as shown below:
Now activate the above view which involves 2 data sources PRUN1 and PRUN2 with pruning configuration table.
And execute the query on that view involving a filter condition that is equal to the condition mentioned in the pruning configuration table :
sum(“SALES”) AS “SALES”
WHERE ((“YEAR” > (‘2005’)))
GROUP BY “ID”,
Visualize the plan for the above query and we see that the union node is pruned as the filter condition matches the one in pruning configuration table, which is as shown below :
Now remove the pruning configuration table from view properties of the calculation view, activate it and execute the above query again and perform the plan viz of the same. We now see the union node coming in to picture, thus the query invoking both archived data and current data in spite of requirement just being the current sales data.
Thus Union node pruning in CV now helps to decide how the execution flow must be carried based on the query dynamically
Hope the provided information is useful. Any suggestion and feedback for improvement will be much appreciated.