By Christina Adams
Development, SAP Customer Activity Repository
In this article I will offer suggestions on areas to consider when trying to improve the performance of an SAP HANA calculation view.
Layer only for warmth, not style
Layering views generally has no performance impact in and of itself, but there are exceptions. Useless layering can hurt performance (for example, layers that add no filtering and no reduction of attributes or measures), while adding a layer that performs aggregation can actually help.
Avoiding multiple layers of views (where no additional complex logic is really needed, or where it could be done elsewhere) can improve performance; for example, collapsing six or more view levels into two or three.
From the very lowest levels of a view to the highest, project only the data that is really needed (whether internally within the view or externally to its consumers). Views should be defined to meet a specific business purpose, and should not be ‘open-ended’ or take a ‘one size fits all’ approach. For example, avoid creating a single view designed to project all of the data a user could potentially want, without any clear idea of whether all of that data will ever actually be required.
A single view defined with too many attributes and measures can be difficult (and sometimes impossible) to make more performant. Instead, create different views for different use-cases and only expose the minimal data required to support those use-cases, even if the underlying data may be the same.
- if only 5 of the 50 columns available in a table/view are actually needed, then project only those 5 (at the lowest level possible);
- if a column is only needed for filtering at the lowest level, then project it for the filter, but then make sure that it is not projected in any subsequent levels in the view;
- if a use-case does not require many of the attributes/measures that an existing view is providing, then consider making a different view instead which only provides the fields necessary for that specific use case.
Location, location, location
A view with calculated columns at the lower levels can be slower than an equivalent view with the same calculated columns at the highest level. Defer a calculated column until the last possible level (for example, in the aggregation node of the view rather than in a projection at the bottom).
For example, if a calculation such as “SalesAmount – TaxAmount” is done at the lowest level, it is evaluated once for every table record, and will take significantly longer to process than if we project the SalesAmount and TaxAmount columns up to the highest level in the view (within the aggregation node) as-is and create the calculated column there.
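The arithmetic behind this is simple: subtraction distributes over SUM, so the same result can be computed once per aggregated group instead of once per raw record. The following Python sketch (hypothetical data, purely to illustrate the principle outside of HANA) shows that both approaches agree:

```python
# Each row: (store, SalesAmount, TaxAmount) -- hypothetical sample data.
rows = [
    ("store_1", 100.0, 10.0),
    ("store_1", 200.0, 20.0),
    ("store_2", 50.0, 5.0),
]

# Low-level approach: one subtraction per record, then aggregate.
low = {}
for store, sales, tax in rows:
    low[store] = low.get(store, 0.0) + (sales - tax)

# High-level approach: aggregate the raw measures first,
# then subtract once per aggregated group.
agg = {}
for store, sales, tax in rows:
    s, t = agg.get(store, (0.0, 0.0))
    agg[store] = (s + sales, t + tax)
high = {store: s - t for store, (s, t) in agg.items()}

# Same result, but far fewer subtractions on large data volumes.
assert low == high
```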
It can also be beneficial to minimize the number of calculated columns, and to minimize references from one calculated column to another.
For example, try to avoid calculated columns like the following, if by collapsing them to a single PromotedSales column it makes the view faster:
HasPromotion: if("PromotionID" <> '', 1, 0)
SalesAmount: "SalesAmount" - "TaxAmount"
PromotedSales: if("HasPromotion", "SalesAmount", 0)
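Collapsed into a single calculated column, the equivalent logic (in the same illustrative syntax as above) might read:

PromotedSales: if("PromotionID" <> '', "SalesAmount" - "TaxAmount", 0)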
Doing something early isn’t always better
Datatype conversions in a view (such as decfloat()) impact performance, although sometimes they cannot be avoided. It is usually best to leave any conversions as late as possible (i.e. at the highest view level that makes sense). The database aggregates data extremely well, and explicit conversions are best done after the data has been aggregated to some degree.
Use the right tool for the job
Restricted columns can perform better than implementing the same logic using IF statements in calculated columns.
A rose by any other name…
Where possible, avoid using an IF statement in a calculated column when the same condition could instead be applied as a lower-level filter.
For example, if a view can support the overall assumption that it will only ever report on data of type ‘A’ or ‘B’, then these type values can be used as filter criteria at the lowest level, rather than appearing in IF statements for the calculation of certain KPIs at the higher levels.
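A small Python sketch (hypothetical data) shows why the two are interchangeable here, and why the filter wins: the KPI comes out the same, but with the filter pushed to the lowest level the excluded rows never flow through the levels above.

```python
# Each row: (type, amount) -- hypothetical sample data.
rows = [("A", 10.0), ("B", 20.0), ("C", 99.0), ("A", 5.0)]

# IF-style calculated column: every row is carried upward,
# and the decision is made per row at calculation time.
kpi_if = sum(amount if typ in ("A", "B") else 0.0 for typ, amount in rows)

# Filter pushed to the lowest level: rows of other types
# are discarded before any higher-level processing.
filtered = [(typ, amount) for typ, amount in rows if typ in ("A", "B")]
kpi_filtered = sum(amount for _, amount in filtered)

assert kpi_if == kpi_filtered      # identical KPI
assert len(filtered) < len(rows)   # less data in every level above
```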
Avoid unnecessary joins wherever possible. Even an unused left outer join (where no column from the ‘right-side’ of the join is requested) can impact performance, so if the data that a join provides is not an absolute must-have requirement, try to eliminate it.
For example, if the use-case can support it, then do not provide text descriptions for columns if they are not really necessary.
Leave the best for last
Joins for obtaining descriptive texts should be left until the very last view (i.e. the query view) if at all possible, since the rows involved in the joins will be the most aggregated (hopefully).
Make sure that the cardinality is correctly set for all joins (especially left outer joins) since this can have a huge performance impact (as the database always uses this information when it builds the execution plan).
Planning for the unexpected
During performance testing, consider how a user’s query will affect the attributes referenced in joins. Specifically, it is important to consider how (and whether) a filter will be pushed down during the execution of the view logic.
For example, consider a view that is normally used to retrieve information for a specific plant, and which contains a left outer join on the Plant attribute. A filter on Plant provided in the query (i.e. specific Plant values) would be pushed down to the ‘right-side’ of the left outer join. However, if no filter is provided (i.e. no Plant value has been specified), then no filtering is pushed down to the ‘right-side’ either. This can make a noticeable difference to performance if the data retrieved on the ‘right-side’ is extremely large and/or involves complex calculations, since the database must execute the ‘right-side’ in its entirety before joining to the ‘left-side’. If the ‘left-side’ is relatively small in comparison, then most of the processing done on the ‘right-side’ will have been for nothing.
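The effect can be simulated outside of HANA with a toy left outer join in Python (hypothetical data and a deliberately naive join, only to illustrate the pushdown idea): with the Plant filter applied to the right side first, a single right-side row is materialized instead of all 10,000, yet the join result is identical.

```python
# Hypothetical data: one left-side row, a large right side.
left = [{"Plant": "P1", "Qty": 5}]
right = [{"Plant": f"P{i}", "Desc": f"plant {i}"} for i in range(1, 10_001)]

def left_outer_join(left_rows, right_rows):
    # Naive hash join: cost grows with the size of the right side.
    right_by_key = {}
    for r in right_rows:
        right_by_key.setdefault(r["Plant"], []).append(r)
    out = []
    for l in left_rows:
        for r in right_by_key.get(l["Plant"], [None]):  # None = no match
            out.append((l, r))
    return out

# No filter pushed down: all 10,000 right-side rows are processed first.
unpushed = left_outer_join(left, right)

# Filter on Plant pushed down: only matching right-side rows are processed.
wanted = {l["Plant"] for l in left}
pushed = left_outer_join(left, [r for r in right if r["Plant"] in wanted])

assert unpushed == pushed  # same join result, far less right-side work
```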
Sometimes using a Union instead of a Join can improve a view’s performance, but this is not always an option if the underlying data and desired results do not lend themselves to being modeled this way.
View Parameters & Filters
Pass the puck
Best practice dictates that input parameters should not be defined on reuse views solely for performance reasons (i.e. to push a filter down to the lowest level), but sometimes this is necessary. For proper performance, it is very important that any input parameters are correctly ‘mapped’ to the corresponding input parameters of any underlying views.
Everything but the kitchen sink
Always ensure that only the bare minimum of data from the underlying tables and/or views is used, by defining as many relevant filter conditions as possible (preferably using ‘equals’ comparisons) at the lowest levels.
For example, if the underlying data is based on sales transactions, then consider what specific types and sub-types of these sales transactions should be included or excluded (such as potentially excluding voided sales transactions, or sales transactions that were done only for training purposes, etc.) in order to limit the amount of data being retrieved, wherever possible.
Drivers, start your engines
Always evaluate whether enabling the ‘Enforce SQL Execution’ property setting for a view will positively impact its performance.
Telling it like it is
Ensure that the correct view type (e.g. Cube, Attribute, etc.) is defined for a view, and that the Analytical Privilege flag is enabled only for query views that will be consumed externally by named database users (which also requires that the applicable analytical privilege metadata has been defined). Setting these view properties incorrectly can impact performance.
As different as they are the same
Only a single engine should be used within a given SQLScript procedure, so CE functions should not be mixed with plain SQL (use one or the other, but not both at the same time).
Keep SQLScript as simple as possible so that the compiler can better optimize the query execution. Avoid common sub-expressions, because it is very complicated for query optimizers to detect them in SQL queries; breaking a complex query up into logical sub-queries helps the optimizer identify common sub-expressions and derive more efficient execution plans.
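The idea of naming a shared intermediate result once, as SQLScript table variables allow, can be sketched in Python (hypothetical data; the SQLScript analogue would assign the shared sub-query to a table variable and reference it from both consumers):

```python
# Each row: (type, amount) -- hypothetical sample data.
sales = [("A", 10.0), ("B", 20.0), ("A", 5.0), ("B", 1.0)]

# Shared sub-expression: per-type totals, computed ONCE and named,
# rather than being re-derived inside each of the two consumers below.
totals_by_type = {}
for typ, amount in sales:
    totals_by_type[typ] = totals_by_type.get(typ, 0.0) + amount

# First consumer: which type sold the most.
top_type = max(totals_by_type, key=totals_by_type.get)

# Second consumer: that type's share of all sales.
top_share = totals_by_type[top_type] / sum(totals_by_type.values())

assert top_type == "B"
```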
For more on these types of practices, see http://help.sap.com/hana_platform -> References -> SAP HANA SQLScript Reference.
Six of one, half-dozen of the other
Script-based calculation views can sometimes perform better than graphical calculation views implementing equivalent logic, so a script-based implementation should be considered as a potential alternative in certain circumstances.
The sum of its parts
Identify as early as possible any areas where aggregation could be beneficial in reducing the amount of data involved in joins and calculations (e.g. additional aggregation nodes can be added within a view to accomplish this, if required). Avoid treating all values as attributes without considering whether they might be better defined as measures instead, so that they can be aggregated when relevant.
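A short Python sketch (hypothetical data) illustrates the payoff of an aggregation node placed before a join: after aggregating sales line items per store, the join to the store dimension touches one row per store instead of one row per line item, while producing the same totals.

```python
# Hypothetical data: sales line items and a small store dimension.
lines = [("S1", 10.0), ("S1", 15.0), ("S2", 7.0), ("S1", 3.0)]
stores = {"S1": "Berlin", "S2": "Paris"}

# Aggregation node first: collapse line items to one row per store.
totals = {}
for store, amount in lines:
    totals[store] = totals.get(store, 0.0) + amount

# The join now processes len(totals) rows instead of len(lines) rows.
joined = [(store, stores[store], total) for store, total in totals.items()]

assert len(joined) < len(lines)
```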
Expecting the moon and the stars
When possible, avoid complex determination logic on large data volumes (for example, logic that must examine multiple rows to decide what a given value should be) within any view that must support near real-time response. The database will execute such logic very quickly compared to a traditional RDBMS, but the response may still not be considered fast enough by a user who is sitting and waiting.
Know where the target is before throwing the dart
Always ensure that there are well-defined use-cases for the expected consumption of a view (preferably with sample SQL statements that can be used for simulation and testing), and that these accurately reflect the most common real-world usage, since any evaluation of acceptable performance must be based on them.
Final Thoughts and Takeaways
Based on my experience improving the performance of SAP HANA views, here are my last three tips:
- Start performance testing as early as possible. You will need as much extra time as possible to evaluate different approaches should the view not be as performant as expected.
- Performance tuning requires a lot of trial and error. Expect to try something out to see what impact it may have on performance (such as removing or adding certain internal projections/aggregations in a view, etc.).
- Always ensure that testing is done on multiple systems before deciding on a final design. A view queried on a development system with very little data will behave very differently from the same view queried on a performance test system and/or productive system with large amounts of data and multiple concurrent users. Think about it!