Just putting up and writing down all the consolidated best practices for HANA modelling that we witnessed for better performance.
- Try using Star-join instead of joining multiple nodes of master data/dimensions to a fact table of measures.
- Try to use Calculation view for all the modelling and design that involves complex calculations and data massaging.
For master data/attribute view types use – Dimension
For mulit-dimensional purpose/OLAP scenarios – CUBE
- Even attribute views/analytic view with calculated columns & measures/logic will be passed to calculation engine of HANA view.
- If the design involves many joins of master data-dimensions with measures (fact tables columns) use Star-join for the type. This has better performance rather than making joins at multiple nodes
- Setting the Cache invalidation period for views/models that doesn’t have frequent changes on underlying data will give you quicker results and hold the cached result for an hour or day
Suggestions on Joins
- If possible try to utilize the same node with calculated columns for different conditions and project with union nodes to combine several conditions/segregated projections of same node. (Avoid projecting same node/information to different segments)
If joins are required try to use Left outer/right outer with cardinality (n:1 or 1:1) – if you are sure about the driving tables, as this would enhance the performance and option of selecting “Optimize join = TRUE.
Also, consider multiple join nodes throughout the design to analytic view and then join those analytic views in the calculation view. Avoid single data source (or node) feeding multiple nodes in calculation view.
- Also, minimize the number of nodes and joins as much as possible and try to use union instead.
- For better query performance and quicker retrieval of results try to avoid filtering/joining on calculated columns and do not mix up scripted views and graphical calculation views in a single view.
- Alternately CE_CALC functions or script can be considered for joining scripted views or several graphical calculation views.
Always follow – HANA Development Points
- Use left outer joins maximum.
- Specify cardinality in joins (n:1 or 1:1) – only if sure
- Set optimize join = true (only if cardinality is set like above)
- Use table functions instead of scripted calculation views.
- Execute in SQL-engine (for HANA live)
- Reduce the data set as early as possible. Use design time filters at the lowest level
- Input Parameter: Placeholders part of the models Can be used in Calculation. Can accept multiple values and can be derived from table or store procedure
- Ensure Variables (where clause) is pushed to the lowest level. Confirm using Visualization Plan
- Use analytical privilege and SQL analytical privilege (SP10) to filter data
- Avoid script-based calculation view, WHERE clause will not be pushed down.
- Using Filter is better than using Inner Join to limit the dataset.
- Avoid filters on calculated column (consider materializing these columns).
- One of the best practice in HANA modeling is to define joins on columns with either INTEGER or BIGINT as data types.
It is not recommended to have joins on calculated columns/fields with NVARCHAR or DECIMAL as data types, might create performance issues.
These are some of the best practices followed by many of the consultants and developers. But still some complex business requirements forces us to use or alleviate from such best practices, which can be ignored.