HANA Modelling – Consolidated Best Practices for Better Performance
Hi All,
Here is a consolidated write-up of all the best practices for HANA modelling that we have seen deliver better performance.
- Use a Star Join instead of joining multiple nodes of master data/dimensions to a fact table of measures.
- Use calculation views for all modelling and design that involves complex calculations and data massaging.
For master data/attribute-view scenarios, use the data category Dimension.
For multi-dimensional/OLAP scenarios, use the data category CUBE.
- Note that even attribute views/analytic views with calculated columns, measures, or other logic are passed to the calculation engine of HANA.
- If the design involves many joins of master data/dimensions with measures (fact table columns), use a Star Join. This performs better than making the joins in multiple separate nodes.
- Set a cache invalidation period for views/models whose underlying data does not change frequently. This gives you quicker results by holding the cached result for an hour or a day (see the sketch below).
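As a rough illustration, here is how a result cache can be enabled on SQL level – a minimal sketch, assuming SPS 11 or later; the schema and view names are made up:

```sql
-- Enable the static result cache on a calculation view whose underlying
-- data changes rarely. The retention period is given in minutes
-- (1440 = one day); the cached result is invalidated after that period.
ALTER VIEW "_SYS_BIC"."sales.reporting/CV_SALES_SUMMARY"
  ADD CACHE RETENTION 1440;

-- Queries opt in to the cached result via a hint.
SELECT "REGION", SUM("SALES") AS "SALES"
  FROM "_SYS_BIC"."sales.reporting/CV_SALES_SUMMARY"
 GROUP BY "REGION"
 WITH HINT (RESULT_CACHE);
```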
Suggestions on Joins
- Where possible, reuse the same node with calculated columns for the different conditions and combine the segregated projections with a union node. (Avoid projecting the same node/information into multiple separate segments.)
If joins are required, prefer left outer/right outer joins with the cardinality set (n:1 or 1:1), but only if you are sure about the driving tables. This enhances performance and enables the option "Optimize join = TRUE".
Also consider spreading the joins across multiple join nodes in analytic views and then joining those analytic views in the calculation view. Avoid a single data source (or node) feeding multiple nodes of a calculation view.
- Also, minimize the number of nodes and joins as far as possible and use unions instead where you can.
- For better query performance and quicker retrieval of results, avoid filtering/joining on calculated columns, and do not mix scripted views and graphical calculation views in a single view.
- Alternatively, CE functions (such as CE_CALC) or SQLScript can be considered for combining scripted views or several graphical calculation views (a sketch follows below).
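For the CE-function route, here is a minimal sketch of a scripted calculation view body (table and column names are invented; note that CE functions were later deprecated in favour of plain SQLScript):

```sql
-- Bind the base column table and expose only the needed columns.
lt_sales = CE_COLUMN_TABLE("MYSCHEMA"."SALES",
                           ["REGION", "AMOUNT", "QUANTITY"]);

-- Derive a new column inside the calculation engine with CE_CALC.
lt_proj  = CE_PROJECTION(:lt_sales,
                         ["REGION", "AMOUNT",
                          CE_CALC('"AMOUNT" / "QUANTITY"',
                                  DECIMAL(15, 2)) AS "UNIT_PRICE"]);

-- Aggregate per region; group-by columns are part of the output.
var_out  = CE_AGGREGATION(:lt_proj,
                          [SUM("AMOUNT") AS "AMOUNT"],
                          ["REGION"]);
```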
Always follow – HANA Development Points
- Use left outer joins as much as possible.
- Specify the cardinality in joins (n:1 or 1:1) – only if you are sure.
- Set "Optimize join = TRUE" (only if the cardinality is set as above).
- Use table functions instead of scripted calculation views (see the sketch after this list).
- Execute in the SQL engine (for HANA Live).
- Reduce the data set as early as possible; use design-time filters at the lowest level.
- Input parameters: placeholders that are part of the model and can be used in calculations. They can accept multiple values and can be derived from a table or a stored procedure.
- Ensure variables (the WHERE clause) are pushed down to the lowest level; confirm this using the plan visualization (PlanViz).
- Use analytic privileges and SQL analytic privileges (SPS 10) to filter data.
- Avoid script-based calculation views; the WHERE clause will not be pushed down.
- Using a filter is better than using an inner join to limit the data set.
- Avoid filters on calculated columns (consider materializing these columns).
- One of the best practices in HANA modelling is to define joins on columns with either INTEGER or BIGINT data types.
Joins on calculated columns/fields with NVARCHAR or DECIMAL data types are not recommended, as they can create performance issues.
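To illustrate the table-function point from the list above, here is a minimal sketch of a table function that could replace a scripted calculation view (schema, table, and parameter names are hypothetical):

```sql
-- A table function instead of a scripted calculation view.
-- The filter on the input parameter sits directly in the SELECT,
-- so it is pushed down to the base table.
CREATE FUNCTION "MYSCHEMA"."TF_SALES_BY_YEAR" (in_year INTEGER)
RETURNS TABLE ("REGION" NVARCHAR(10), "AMOUNT" DECIMAL(15, 2))
LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS
BEGIN
  RETURN SELECT "REGION",
                SUM("AMOUNT") AS "AMOUNT"
           FROM "MYSCHEMA"."SALES"
          WHERE "YEAR" = :in_year
          GROUP BY "REGION";
END;
```

The function can then be consumed as the data source of a graphical calculation view, with a view input parameter mapped to in_year.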
These are some of the best practices followed by many consultants and developers. Still, some complex business requirements force us to deviate from these best practices, and in such cases that is acceptable.
Hello Prasanna,
Nice blog. Can you please mention the HANA version on which these best practices were tested? I also have another question: could you please explain the point "Avoid single data source (or node) feeding multiple nodes in calculation view"? I don't see how reusing a node affects performance.
Regards,
Safiyu
Hi Saidumuhamed,
By using a single node or table in multiple instances I mean: let's say you have a fact table/aggregation node with some quantifiable information, and you project it into multiple nodes (say, one filter per year/region of sales and forecast), join each of those to various master data nodes, and finally combine everything.
Instead, we can use a Star Join, with that node as the central fact node joined to the multiple master data nodes (a rough SQL analogy is sketched below). This is a somewhat rare scenario of bad modelling that I've witnessed at one of the clients.
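As a rough SQL analogy of the two shapes (all table and column names are invented): the anti-pattern reads the fact table once per slice and stitches the slices back together, while the star-join shape reads it once.

```sql
-- Anti-pattern: the same fact source feeds several filtered branches,
-- each joined to master data separately and then combined at the end.
SELECT f."REGION", d."REGION_NAME", SUM(f."SALES") AS "SALES"
  FROM "MYSCHEMA"."FACT_SALES" f
  JOIN "MYSCHEMA"."DIM_REGION" d ON d."REGION" = f."REGION"
 WHERE f."YEAR" = 2015
 GROUP BY f."REGION", d."REGION_NAME"
UNION ALL
SELECT f."REGION", d."REGION_NAME", SUM(f."SALES") AS "SALES"
  FROM "MYSCHEMA"."FACT_SALES" f
  JOIN "MYSCHEMA"."DIM_REGION" d ON d."REGION" = f."REGION"
 WHERE f."YEAR" = 2016
 GROUP BY f."REGION", d."REGION_NAME";

-- Star-join shape: one pass over the fact table, each dimension joined once.
SELECT f."YEAR", f."REGION", d."REGION_NAME", SUM(f."SALES") AS "SALES"
  FROM "MYSCHEMA"."FACT_SALES" f
  LEFT JOIN "MYSCHEMA"."DIM_REGION" d ON d."REGION" = f."REGION"
 WHERE f."YEAR" IN (2015, 2016)
 GROUP BY f."YEAR", f."REGION", d."REGION_NAME";
```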
Hi Prasana,
In the above case, i.e. the analytic view, I have understood what you meant. Did you mean the statement 'Avoid projecting the same node/information to different segments' to be true only in the case of an analytic view?
It again differs from case to case, but it applies mostly to analytic views, or to calculation views that compare the same factual/transactional data.
But it is better to avoid this for calculation views as well, since it adds performance overhead.
Hi Prasana,
thanks for this nice blog-post.
Do you have any experience or best practices for creating calculated columns (e.g. column A + column B, or creating flags using IF and CASE statements) in a calculation view?
- On the lowest level?
- On the Aggregation-Node?
Best regards,
Fabian
Hi Fabian,
Yes, we do.
Basically, for such IF/CASE statements and decision scenarios, calculated columns can be added in both places:
1) On the lowest level – if the calculated column is a measure like a count or sales figure to which a discount or other logic needs to be applied (so that it does not give you an unexpected value).
2) On the aggregation node – if your calculated column is a flag/dimension attribute that needs to be calculated on the aggregated measures (e.g. set flag = 'HIGH' if avg. sales > 20,000); see the sketch below.
So it depends on the use case. We used the lowest level most often.
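A minimal SQL sketch of the two placements, with invented column names: the discount logic is applied per row before aggregation (case 1), while the flag is computed on the aggregated values (case 2):

```sql
SELECT "REGION",
       -- Case 1: measure logic on the lowest level, applied per row
       -- before aggregation so the result is not distorted.
       SUM("AMOUNT" * (1 - "DISCOUNT_PCT")) AS "NET_AMOUNT",
       -- Case 2: flag computed on the aggregation node, i.e. on the
       -- already aggregated measure.
       CASE WHEN AVG("AMOUNT") > 20000 THEN 'HIGH' ELSE 'LOW' END AS "SALES_FLAG"
  FROM "MYSCHEMA"."SALES"
 GROUP BY "REGION";
```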
Warm Regards,
Prasana Ravichandran
Hi Prasana,
thanks for your answer. This is very helpful.
Following on from this, I have another question about the layer architecture of your views.
Do you use staging, core, data mart, and reporting layers to give your HANA views a clear structure?
E.g. combine header and line items in the staging layer (maybe filtering some data) to build a data set, then refer to this staging view in the core layer and add fields via joins, calculated columns, etc.
Or is it better to build a data set in only one view and avoid using different layers?
Warm regards,
Fabian
That's a good question.
So if your landscape is a side-car approach on HANA and you're not using any HANA Live views (pre-built content delivered by SAP), you can go with the staging, core, and data mart/reporting layers as you mentioned.
Since most of our landscape was based on HANA Live, we modelled/enhanced the master data HANA views in a separate package/segment, modelled the factual HANA views (like sales and billing documents) separately, and combined them in the reporting views.
You will get better performance when you combine several master data views with the fact data in a star-join node. Alternatively, you can just copy/enhance the standard HANA Live content that is already available as part of the HANA Live installation.
We still always follow these HANA development points. Very good points with clear explanations. 🙂
Thanks Narayan
Hi Prasana,
I have one doubt. Suppose we filter data in a middle node using an input parameter, where the end user provides the value when the pop-up appears. My doubt is: does it filter the data from the bottom node using the value the end user provided, or does the filtering only happen from the middle node onwards?
I'm not fully clear about your question, but in general the filter is applied at the projection/aggregation node where you filter the attribute.
Suppose in a CV one projection node is in the middle, and I apply a filter there with an input parameter. My doubt is: does it filter the data from that particular node onwards, or from the bottom nodes of the CV?
Of course, from that particular node where you applied the filter on the input parameter.
It won't do any automatic filtering below that unless a filter based on the input parameter is detected there (see the sketch below).
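A minimal sketch of the pattern under discussion, assuming a calculation view with an input parameter IP_YEAR (all names are hypothetical). The filter expression sits on the middle projection node, and the value is supplied at query time:

```sql
-- Filter expression on the middle projection node of the calculation view:
--   "YEAR" = '$$IP_YEAR$$'
-- At query time the input parameter value is passed via PLACEHOLDER syntax:
SELECT "REGION", SUM("SALES") AS "SALES"
  FROM "_SYS_BIC"."sales.reporting/CV_SALES"
       ('PLACEHOLDER' = ('$$IP_YEAR$$', '2016'))
 GROUP BY "REGION";
```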
Dear Prasana!
Great wiki-style blog!
Could you share the information source for this: "Avoid single data source (or node) feeding multiple nodes in calculation view"?
best regards,
Sergey
Hi ,
E.g. let's say I have a projection node for table LIKP, and I'm using the same projection node in 4 different join/aggregation nodes like below; that is better avoided and should be revisited!
Hi Prasana!
It was really helpful!
Many thanks!
Cheers from Brazil!
Obrigado Amigo