The Art and Science of HANA Performance Modeling
During the course of last 2 years, I’ve conducted SAP HANA performance workshops and reviewed information models at various customers. I encounter similar modeling issues or mistakes almost at all customer projects. The good news is that most of the performance problems can be resolved by making with minor changes to the model.
However it is critical to understand the reasons behind the performance problems consistently observed in the HANA implementations. Here are my observations.
- HANA modelers have pretty good idea on how to build attribute views, analytic views, and calculation views which are part of HANA modeling. However, there is significant gap in the knowledge of using these HANA information views effectively to address the business requirements.
- Some of the HANA modelers do not have data warehousing background and hence lack the knowledge of basic multi-dimensional analytical reporting, which leads to poor solution design.
- Modelers and developers seems to have the knowledge of “what works and what does not work”, but they seem to lack the knowledge “Why it is behaving the way it is behaving”, hence the struggle to apply the knowledge effectively while building the models.
- Finally there is a fundamental flaw in the understanding of Software Development Life Cycle (SDLC). Is performance testing a phase before Go-Live or is it part of unit testing? HANA’s query response is extremely fast for the smaller data set regardless of the modeling approach. This gives a false positive of efficient modeling. The realization of performance issues become visible only in Staging/QA system where models are tested against large data volumes.
As I mentioned earlier most of these problems can be resolved with very little changes. Performance improvements at various customers are
Beverage Major (Before 5mins, after 3sec)
Pension Plan Management Company (Before 30sec, after 3 sec)
Grocery Retailer (Before: 19sec, after: 300ms)
Audit Firm (Before: 19min, after 8secs)
So it is essential to build the model right the first time. So let’s learn to model the right way the very first time.
Join the session to discover the arts and science of tuning HANA models for performance.
Webinar details HDE Webinars | HDE
Nov 12, 2015 / 8:00 – 9:00 AM Pacific, 11:00 – 12:00 PM Eastern (USA), 5:00 PM CET (Germany)
Link to join the “HDE webinar” series: https://sap.na.pgiconnect.com/i800545
Participant Passcode: 110 891 4496
US and Canada: 1-866-312-7353 tel:+18663127353,,,1108914496#
Germany: 0800 588 9331 tel:08005889331,,,1108914496#
UK: 0800 368 0635 tel:08003680635,,,1108914496#
India: 1800 266 0327 tel:18002660327,,,,1108914496#
China: 800 820 8010 tel:8008208010,,,1108914496#
China: 400 613 8112 tel:4006138112,,,1108914496#
See here for dial-in numbers for all the countries.
Add to your calendar/ Meeting request: SAP HANA iFG CALL 124 HDE Webinar – The Art and Science of Tuning HANA Models f.ics
Recording: SAP HANA iFG Call 124 – Sci of Tuning HANA Models for Perf
Meeting request Link: SAP HANA iFG CALL 124 HDE Webinar - The Art and Science of Tuning HANA Models f.ics
Thanks Reddy. I've updated the blog post to include the link
The session will be recorded and will be posted others
Jenny Ly: Hello everyone, thanks for joining us today. Please note, the call back feature as been disabled for security reasons. Please dial-in from a phone to listen to the audio
Jenny Ly: If you have any questions for Abani, please feel free to post them here in the chat box
(g) Cameron (Intel): What is the impact if it is a table function as a source instead of a scripted calculation view?
Abani: Starting from SP09, use of Table function is recommended instead of scripted calculation view.
The original script-based calculation was envisioned to run in the contents of Calculation Engine (i.e. to use CE function). However that paradigm has changed, now lot of innovation/optimization is coming part of the SQL engine optimization. In fact SQL engine optimization is now enabled for all calculation views starting SP09 (even though the view is not set to execute in SQL engine).
So the future direction is no more scripted calculation view and only graphical calculation views. The scripted logic is pushed to the DB layer as Table function.
For now – there is NO impact of using table function vs. scripted calculation view. Though depending on your models, you may see a different execution plan.
Amit Kumar: Amit(TCS)-Does the Subtree Cost have a Major Performance Issue
Abani: The sub-tree cost indicates the estimated cost of the operation of each step. If there are two queries accessing the same table and producing the same result, the query with lower sub-tree cost is better.
Actually depending on the SQL query and the underlying model, HANA optimizer may create multiple execution plan and will execute the most efficient execution plan with lower sub-tree cost.
(g) Vaithi Ramadoss: Can you pls exapln Subtree cost column in the explain pna?
Abani: Please see above
(g) Vivek: ANy recommendation on when to use which explain tool?
Abani: For all SQL queries.
If you are querying HANA information models – You can execute the explain plan to see if the query is being executed only in COLUMN ENGINE or not (hint: sign of ROW ENGINE is not a good). You can also check the list of tables and the host where the table reside. This will give an idea of network data transfer.
If you are righting SQL statements – part of scripted calculation view, store procedures or table functions is very useful. Since it can provide insight into the estimated cost of each operation (JOIN, filter etc.) part of the SQL statements.
From SP10 onwards, Explain Plan will also give insight of operations part of the Information Models.
(g) Ravi: can you pls walkthrogh an example for explain plan in the system?
Abani: Please refer to the HANA Academy video on Explain Plan. Please refer to the links in the presentation for more details.
(g) Ravi: sorry Visualize Plan not explain plan
Abani: Please refer to the links in the presentation for more details.
(g) Krishna(Intel): How can we force Parallel Execution?
Abani: If your model is done correctly, HANA will automatically execute in parallel. You can’t force a step to execute in parallel (by a setting or so).
As an example, multiple nodes feeding an UNION step will execute in parallel. In case of scripted calculation view (or table function), avoiding IF THEN ELSE logic will make the statements execute in parallel.
(g) Dmitry: One more question pls. What about temporal join in Calculation View (performance degradation)
Abani: There is absolutely no performance degradation with temporal join. Temporal Join in Analytic View is one of the best implementation for modeling slowly changing dimensions.
(g) Ashish: Visualize plan link which you are referring seems to be very old do we have any new article explaining the latest visualize plan in detail
Abani: The concept of visualization plan has not changed. New features has been added to the tool. But the articles are still relevant. I’ve included few other links as well.
(g) Terence O'Donovan: Why would you use left-outer joins for performance?
Abani: Left-outer joins are efficient for Join-pruning. A left-outer between two-table will execute IF and ONLY IF there is a field requested from the right table.
Since Joins are expensive operation (esp. joins between table with high cardinality), pruning a join during execution will certainly result in better performance.
Also starting SP09, consider using the new Join Property (Optimize Join = True) for Left outer join
(g) Vikas Bittera: what should we do if we need to implement Full outer join?
Abani: Full-Outer join is not supported in graphical calculation view. In certain scenarios (sources with similar dimensions), you can model full outer join as UNION with Constant Values.
You can consider using SQL-Script in Table function to implement Full-outer join.
The option is push this to the Table i.e. Transformation.
(g) Dmitry: Set optimize join = concat_attribute and need more memory?
Abani: Optimizing join columns is supported only for left outer join or text join (with cardinality 1:1 or N:1) and right outer join (with cardinality 1:1 or 1:N).
Standard behavior - while executing the join, by default, the query retrieves join columns from the database even if you do not specify it in the query. This means that, you are including those join columns into the SQL group by clause without you selecting them in the query.
You can avoid this behavior of including the join columns into the SQL group by clause using the join property, Optimizing Join Columns. When this property for a join is set to True, the columns which are not specified in the query are not retrieved from the database.
(g) Naresh: How to force if then else type of scenario in graphical view when we need to join the table dynamically basedon business condition
Abani: If then else scenario can be modeled with a Constant Mapping in a UNION node. (Refer Slide 22: Model Pruning)
SOURCE = CY (in WHERE clause) will only execute the CY node
SOURCE = PY (in WHERE clause) will only execute the PY node and
SOURCE IN (‘CY’, ‘PY’) or no SOURCE (in WHERE clause) will execute both CY and PY node.
(g) Ashish: Is there clear guidelines when to use the execute in SQL eninge setting? if it good to set always then why the setting was not made default?
Abani: For information models built using only calculation views (like HANA live) – using the property “Execute in SQL engine” may give better performance in most (not all) of the use case. So it is advisable to test it for your use case and use appropriate setting.
It also depends on the HANA revision. There have been progressive innovation in SQL optimizer since HANA SP06 (when this property was introduced).
(g) MuhammadDanish: as you said HANA Live views should be execute in SQL-engine, is it true for facts and dynamics?
Abani: See above. It is recommended to use this setting for the top level view – which is used in the SQL query. You can also test query performance with setting this property for all views (dimension and fact) and use whatever works best for your scenario.
(g) Vikas Bittera: any suggestion how to implement Full Outer Joins?
Abani: Answered in the post above
(g) nikhil: @Vikas : Add Calculated column in both the tables, set the default value as 1 and join on this column
Abani: Be careful with this approach, since this will create a Cartesian-Product.
(g) Cameron (Intel): no - that's a Cartesian product. To do a Full Outer Join you Left Join and RIght join the 2 tables and then UNION the results
Abani: Agreed. Thanks Cameron
(g) Jody Hesch: Other option - if result is aggregated, and you're trying to Full Outer Join two sources with same dimensions - then you can do "Union With Constant Values"
Abani: Agreed, thanks Jody
(g) Vikas Bittera: two LOJ & Union gives very poor performance
(g) Vikas Bittera: Union with constant values does not work as we have different characteristics in the two data sets
(g) Vikas Bittera: for example in one source you have Planned Date and in other you have Actual Date.. this data does not come on same row with Union
(g) Jody Hesch: Gets a bit tricky then 🙂
(g) Jody Hesch: I'd suggest investigating ETL options as well. Full Outer Join is possible scripted - but will likely perform badly
Abani: See my response above
(g) Terence O'Donovan: Seconding quaestion from "Camerion (Intel)" What is the impact of table function vs. scripted view?
Abani: See my response above
(g) Vikas Bittera: ETL is not an option, if u have to do this on SLT Tables for real time data 🙁
Abani: Script i.e. Table function may be way to go – consider using appropriate filter
(g) Farid: HANA Live has no Attribute Views, so are we encouraged to use attribute views or avoid (it is outdated)?
Abani: The future direction is to use Calculation Views. However, you must be at least in HANA SP09 to get comparable performance (Analytic Views vs. Star-Join calculation Views).
However as I mentioned in the presentation. The concept of Attribute View and Analytic View is still relevant
However, attribute views, analytic views are still supported.
(g) Vaithi Ramadoss: Teched SAP clearly mentioned that Attribute and analytical viewsa are gone
Abani: Attribute views, analytic views are still supported
(g) Vikas Bittera: we want to use Calculation views only.. but many features of Analytic views are still not available in Calc views.. for example Temporal joins
Abani: Agreed. Temporal Joins are only available in Analytic View.
However if you have only couple of time-dependent master data, You can implement temporal join in calculation views (M:N joins with filter FROM_DATE <= FACT_DATE and TO_DATE >= FACT_DATE). Please note, in this case, these join to the Dimension view will always execute because of the filter condition. (So consider the performance implication).
(g) Cameron (Intel): what does "consider pushing most of the joins to the lowest level" even mean when the original example had about 20 joins in it?
Abani: Ideally all these 20-joins should be modeled part of star-join (Analytic View or star-join calculation view). That’s the right way.
Now if you must join in stackable joins (like staircase). Consider joining the high-cardinality dataset as early as possible (i.e. push it to the lowest level).
(g) Cameron (Intel): the Temporal Join performance is horrible - its actually 10x more efficient to do it in a scruipted view...but then you have the issues of the scripted view if its part of a complext transformation
Abani: We’ve seen much better performance with Temporal Join than scripted calculation view. I’d be interested to see your plan visualization.
(g) MC: @Vaithi - if Tech SAP said analytic and attribute are gone, then what is the preferred modeling technique.?
Abani: Answered above.
(g) Vaithi Ramadoss: use only Calc views
(g) Vikas Bittera: Scripted view doesnt support query pruning.. so larger impact on slicing dicing of queries
(g) Jody Hesch: pushing joins down - should be done via star schema (analytic view or star join in calc views)
Abani: Agreed. Thanks Jody.
(g) Jody Hesch: temporal join - we've seen better performance than scripted
(g) Jody Hesch: whether to leave AN views - depnds 🙂
(g) Jody Hesch: depends* i.e. on what SP you're on, what you're trying to accomplish
(g) MC: what SP is recommended to leave analytic and attribute?
(g) Vaithi Ramadoss: 10 and above
Abani: Consider using Calculation Views.
(g) Farid: so all the modeling is done in calc view, but we should still use filters in the projection nodes where the table is being consumed OR it doesn't matter where the filters are used because HANA is smart enough to enforce it?
Abani: HANA is smart enough to push the filter down, if models are designed properly. It is recommended to set the filter early in the projection nodes where table is being consumed.
(g) MC: we thought projection filters are done after the join itself between 2 projections that are filtered in the expression?
(g) MC: meaning join happens, and then filters done from teh projection?
Abani: Assuming the joins are not on calculated column and the filters will be pushed down to the respective projections. So join will occur on the filtered dataset.
(g) Ashish: when we have requirement to use standard cals such as Meidan, Stnd Deviatoin etc is it better to use standard function using script view or create our own using graphical cal views in terms of performance
Abani: Use graphical calculation views. Avoid scripted views (where possible).
(g) Ajay: There is Referential join in Calculation view? SHould we use ANalytical views only to make use of referential joins?
(g) Ajay: There is NO Referential join in Calculation view? SHould we use ANalytical views only to make use of referential joins?
Abani: Currently Referential Join is only supported in Analytic View. Referential join is an optimized inner join (optimized implementation for join pruning). So If Referential Integrity is maintained in your dataset, Left outer join will give the same result as that of Referntial Join.
So you can consider using left outer join in calculation view. Otherwise consider using Inner Join.
(g) Saravanan N: Some one asked the full outer join and how to use.....Check this link
(g) Saravanan N: http://saphanatutorial.com/sap-hana-join-types/
(g) Vikas Bittera: i basically meant that FOJ is not available in Graphical calc views
(g) Saravanan N: Yes it is not available in Graphical calc views
Abani: Yes. Discussed above.
(g) Cencosud: Friend, much of the recommendations are related to Analytics. Are they good for Suite on HANA too?
Abani: This is applicable for Suite on HANA too esp. in the context of reporting.
(g) Vikas Bittera: when u say that replicate master data to all nodes.. do you mean to spread master data in all nodes or all master data in all nodes? we have master data from SLT in one SLT schema along with transactional data... we do not parition master data tables across nodes as they r quite small
Abani: Master data tables should not be partitioned (assuming you have less than 2B records) and hence there is no reason to distribute one master data table across multiple nodes.
The following SQL statement will replicate the KNA1 table to all nodes. It essentially keeps a copy of the master KNA1 table in all nodes and data in all tables are in sync.
alter table "ERPECC "."KNA1" add replica at all locations;
(g) Vaithi Ramadoss: BW has limitation of 13 dimenstions.. can we break that in HANA? will that be consider as bad design?
Abani: In the context of Star-Join (Analytic View or Calculation View) in HANA, there is NO limitation. Having more than 13 dimension is NOT a bad design.
(g) Ajay: How can we mimic referential join functionality in a calculation view?
Abani: Referential join is an optimized inner join (optimized implementation for join pruning). So If Referential Integrity is maintained in your dataset, Left outer join will give the same result as that of Referntial Join.
So in a calculation view, you can consider using left outer join or inner join (if referential integrity is not maintained in your data) in calculation view.
(g) Vikas Bittera: will the BW on HANA generated models on BW objects will also become all Calculation views only?
(g) Vikas Bittera: as of now BW Objects auto generated models have Attribute view & Analytic views
Abani: May be in future releases.
(g) Ashish: @Ajay referential joins i believe its available in SP10 onwards
(g) Ajay: ok i still dont see it
(g) Ajay: should i upgrade my studio?
(g) Ajay: I am already on SP 10
Abani: Not yet in HANA SP10.
(g) Todd Greenwood: are decision tables costly - per slide 34?
Abani: Not in general. Decision Tables are being implemented as Stored Procedure. So
in the content of the use case at the customer, decision table was fitting in.
(g) Vikas Bittera: Thanks a lot!!
(g) 110 891 4496: was a great presentation! Thanks a lot!
(g) Vivek (Intel): Thanks
(g) Naresh: Thank you. Pl. share the link
(g) Kevin G: thanks
(g) Ranjeet: Great presentation!
(g) Ashish Patel: thanks
(g) Hari Krishna: thankyou
(g) Terence O'Donovan: Thanks!
(g) Cencosud: Thanks a lot
(g) Jody Hesch: Thanks Abani!
Abani: Thanks Jody for answering some questions during the presentation. Buddy – we missed you at the HDE dinner at TechEd.
(g) Todd Greenwood: Muchas Gracias
(g) Dmitry: Some words about hints pls
(g) pooja: Thank You!
(g) Michael: merci
Abani: Thank You. Thanks for all the questions. Unfortunately, I could not answer all of them during the presentation. So I’m taking this opportunity to answer all the questions asked during the Presentation.
Jenny Ly: Everyone, the slides, recording and Q&A will be posted on the HANA iFG Community -- https://jam4.sapjam.com/groups/Njsx5wN5rQyqKZU0wnxfXK/overview_page/101911
Recording: SAP HANA iFG Call 124 - Sci of Tuning HANA Models for Perf
Very neatly explained. Thanks for sharing it... 🙂
Good presentation Abani. Performance is key criteria that every customer is looking at. Thanks for sharing.
Great article and great blog. The link to the recording is not working. Is there any way i can get those recordings?