Projection Filter push down in Calculation View
*** Please note that below document is based on the personal experience of using different HANA Information modeling constructs. The results of Performance tests depend upon the data volume and the Query filters. It is advisable to make the final judgement based on your own testing of such models.
Many of the reporting scenarios implement data comparison business case like “This Year to Last Year comparison” or similar. Such business requirement implementation in HANA can be done using the Calculation View model which is discussed in many of the presentations / training sessions. Something like shown below.
Figure 1: Sample model with Projections for the Year on Year data comparison
The base model consists of creating projections over the Analytic view with the filter value for “This Year” Projection and “Last Year” Projection and a UNION node to combine the data for reporting.
Let’s consider a sample data model based on Sales data. An Analytic view is created on the Sales base table and used in a Calculation view with Projections for This Year and Last Year data as shown in the sample model in Figure 1.
Figure 2: Sample data models based on Sales data for This Year and Last Year data comparison
In most of the cases, the general requirement is that, the user should provide “This Year” value (say Month and Year) and system should then return back data for the Month in This Year and for the Month in Last Year. Such data input can be captured and modeled in multiple ways.
This document outlines the impact on Performance of two such implementations of Filters in Projection. It explains the FILTER PUSH DOWN impact due to the different implementation options.
- Using Input Parameters for This Year Month and for Last Year Month
- Using Input Parameter for This Year Month and Calculated Column for Last Year Month
What is Filter Push Down:
When the data is queried on HANA Information models based on a subset defined by certain values, then the subset is defined using Filters. If the filters are implemented at the lowest level of dataset generation (Analytic search), then all the subsequent activities like aggregation happen on this reduced dataset. This provides high performance benefit.
If the filters are not implemented at the Analytic Search, then the entire data set is passed to the aggregation depending upon column joins and hence has negative impact on the performance.
To check if the filters are pushed down to the Analytic search, you need to find the “BWPopSearch” operation and check the details on the node in the visual plan. Please refer to the awesome blog by Lars Breddemann explaining the Visualize Plan tool.
Impact of using option 1: Using Input Parameters for This Year Month and for Last Year Month
In this implementation, two Input parameters are used
- For This Year Month – IP_TY_MONTH
- For Last Year Month – IP_LY_MONTH
The model can be created as shown below:
Figure 3: This Year and Last Year data comparison model using Input Parameters
The sample Sales data consists of few records for the Sales value in different months in different countries.
When such models are queried, then the Projection Filters are pushed down and the dataset exchanged between the OLAP engine and the Calc engine is quite less as the data is filtered in the OLAP engine and only the subset of the entire data required for the aggregation is passed to the Calc Engine. The execution plan for such query shows “Search” being performed as the first operation, reducing the dataset to be passed on for further processing.
Query:
SELECT “C_COUNTRY”, “C_YEAR”, “C_YEARMONTH”, “ZPERIOD”, sum(“C_SALES”) AS “C_SALES”
FROM “_SYS_BIC”.”sample/ZGCV_SALES_TY_LY”
( ‘PLACEHOLDER’ = (‘$$IP_LY_MONTH$$’, ‘201201’)
, ‘PLACEHOLDER’ = (‘$$IP_TY_MONTH$$’, ‘201301’))
WHERE “C_COUNTRY” = ‘US’
GROUP BY “C_COUNTRY”, “C_YEAR”, “C_YEARMONTH”, “ZPERIOD”;
Sample data in the underlying Sales table:
Figure 4: Sales data in the underlying table
Output of the query:
Figure 5: Query output based on This Year month and Last Year month Input Parameters and Filter
Execution Plan:
Figure 6: Execution plan for the Query with Projections based on Input Parameters
As seen in the Figure 6 above, the execution plan shows that the filters defined in the projection are pushed down to the OLAP engine resulting in a smaller data set which is passed to the Calc engine. This filter push down improves performance significantly in the Query execution. Hence to improve the performance, it is advisable to model the views so that the data is filtered much earlier.
Impact of using option 2: Using Input Parameters for This Year Month and Calculated Column for Last Year Month
This option provides the flexibility of the coding to derive the other filter values based on a given Input Parameter. It is quite natural to expect the users to provide value for ONLY ONE input parameter for a Projection filter and derive the other value based on the user provided value. Such coding can be achieved using Calculated columns.
*** Please note that currently the expression builder for Projection filter do not provide higher flexibility in data manipulation. Thus such manipulations need to be performed using Calculated Column. The expression builder for Calculated Column support wide variety of functions, enabling complex data manipulation.
In due course of time, I am very sure that the expression builder for Projection filter will support more functions.
The model for the Option 2 can be created as shown below:
Figure 7: This Year and Last Year data comparison model using Input Parameter and Calculated Column
When such models are queried, then the Projection Filter defined in the Calculated Column is NOT pushed down and the dataset exchanged between the OLAP engine and the Calc engine is quite large depending upon the column joins of the columns used in the Query. The entire data is passed to the Calc Engine where the subsequent filtering and aggregation happens. The execution plan for such query shows that NO Search operation happens in the OLAP engine. This has negative impact on the performance as large amount of data is exchanged between the engines and high memory and resources are required for the query execution.
Query:
SELECT “C_COUNTRY”, “C_YEAR”, “C_YEARMONTH”, “ZPERIOD”, sum(“C_SALES”) AS “C_SALES”
FROM “_SYS_BIC”.”sample/ZGCV_SALES_TY_LY”
( ‘PLACEHOLDER’ = (‘$$IP_LY_MONTH$$’, ‘201201’))
WHERE “C_COUNTRY” = ‘US’
GROUP BY “C_COUNTRY”, “C_YEAR”, “C_YEARMONTH”, “ZPERIOD”;
Sample data in the underlying Sales table and the output of the query remains same as for Option 1.
Execution Plan:
Figure 8: Execution plan for the Query with Projections based on Calculated Column and Input Parameter
As seen in the Figure 8 above, the execution plan shows that the filter defined in the projection using the Calculated Column is NOT pushed down to the OLAP engine resulting in large data set passed to the Calc engine. It is strongly recommended to look into the Query execution plan for the Information models and check the impact of such modeling constructs. It is advisable to ensure that the filters can be pushed down to the OLAP engine to improve the performance of the query. If possible, you may opt to change the existing model with Calculated Column to Input Parameters and have the data manipulation done for the values to be passed to the Input Parameters in the Front end reporting tool rather than in HANA. The User input Prompts in the front end reporting tool can be defined to capture one value and other value to be passed to the second Input Parameter can be derived from the first prompt value.
As mentioned at the start of this document, I would request fellow HANA practitioners to try and test their models and validate the performance impact of the queries.
Also as mentioned above, I am very positive that the filter expression for Projection will provide large set of data manipulation functions in future, which will enable writing better Filter conditions and will eliminate the need to define the Calculated Columns for filtering in the Information model completely.
Good explanation with scenario - Thanks
Thanks for the Informative blog.
Hi Ravindra!
Plain and simple: that's the kind of content I'd love to see much more often here in SCN!
Well done and clearly far above the average.
- Lars
Excellent one. Would like to have more documents/blogs from you.
Thanks a lot Rama, Santosh, Lars, Raj 🙂
A word of praise from Lars, who taught me Performance improvement in HANA, is very much encouraging.
Regards,
Ravi
Hi Ravi,
Guud One..... I am going to try this approach in my current work...Thnkss 4 sharing..:)
Regards,
Kulwinder
Great job Ravindra!
Keep on!
Greetings,
Blag.
Thanks Kulwinder, Blag
.
Regards,
Ravi
Hi Ravindra,
thanks for the tip.
Is there also a way to push a filter via parameters into the attribute view?
Reason for aksing:
I don't know however how to push a parameter down there
appreciate your advice,
thanks
elmar
Hi Elmar,
Currently to my knowledge, the Input parameters cannot be defined at the attribute view level. The static filters defined on the attribute view will help you reduce the data volume, but I am not sure if that is applicable in your requirement.
When you applied the hard coded filter, the result set from the attribute view might have been reduced providing better performance in the join.
I have also noticed one issue in the join execution based on size of the attribute table w.r.t. fact table.
The visual plan of the query will help you understand where the performance is getting degraded and may provide additional information on how the joins could be optimized.
You can also raise this performance issue with SAP through Support note and may receive additional information from SAP support. I do that quite often.
Regards,
Ravi
Hi Ravi,
it is so good! Thank you very much.
Best Regards,
Hai
Excellent read Ravindra !!!
Quick question. In your scenario, is it possible to have multiple input parameters?
For example there are 2 input parameters; date and country
So when you do data preview 2 input parameters show. Let's say date is mandatory.
So the user has 2 options:
1)Put in date and country
2) Put date and leave country blank (to get all the countries)
Is this possible graphically?
Hi Zain,
In this case, the country need not be an input parameter. It can be a variable as the value for the country is passed in WHERE condition to the entire data set. The Date is required to calculate two different data sets for TY and LY. Such modeling is possible only with Input Parameters.
Technically you can even create Input parameter for Country and add it to the projection filter expression and need to check if the input value is blank or null, the do not use it for comparison or any similar logic.
But I would suggest to use a variable for country.
Regards,
Ravi
Hi Ravindra,
Thanks for the great post. One question I have is for the Option #1, how do you calculate the LY input parameter based on the user entered TY and pass to the view.
Hi Ravindra,
Thanks.
I am facing 1 issue following your scenario when i am trying to use the filter expressions(similar to above) and go for data preview i am getting following error everytime.
"Not able instantiate the Calculation view".
Please suggest.
Thanks
Hi Ravindra, Thank you so much. Awesome job!