HANA SQL Processing and Generating Optimum Plan
Need for Query optimization:
Globally lot of customers using SAP HANA for quite some time now. On one side they are enjoying benefits of in-memory capabilities but on other side there is exponential growth in data. Also, with more and more user adoption for self-service reporting, there is high chance that some of the best practices are not followed. This generates need for optimization of SQL queries getting triggered on HANA. If you do not have SQL query in proper manner it may not work in optimal way with this exponentially growing data. Sometimes some reporting tools generating SQL query in background which is not visible to business users and they will not have idea about the exact query.
You might have come across situation where certain SQL query taking more time to execute than average time of other queries in your landscape. Such cases understanding SAP HANA Query Processor, SQL optimizer and execution order of SQL operators in data read process becomes more significant. Once we get it, we can think of altering default behaviour after analysing SQL processing to improve execution time.
We will discuss:
In this article I will focus more on these four topics. This will help you understand basics of query processing. HANA query Analysis tools and optimizations with SQL hints will be discussed through separate article subsequently.
- Where to find SQL triggered on HANA
- Commonly used SQL Operators and Execution Sequence
- SAP HANA Query Processor
- SQL Optimizer
Where to find running SQL query in HANA studio?
Before analysing any query, you should find out what is exact problematic query. You can find it HANA studio once you add your system at path:
Administration > Performance Tab > Sessions Tab > Filter ‘Connection Status’ on ‘Running’.
Normally you will have different background user id for each third-party application through column ‘Database User’ at same above path. Based on that you can find exact SQL. You can also get it from HANA tables logs after execution.
SQL Query Execution
Once you find the problematic SQL on any HANA environment, it will look like this. You might have multiple columns, filters or join conditions along with it, but basic structure will be same.
However, order in which HANA will process these operators will be different. Understanding what each operator will do and execution order will be first step to optimize the query. You might be reading or executing more data than you need in result. Refer below diagram which shows order of execution of same operators and what action performed with that operator.
Once you know this, you can achieve significant reductions in runtime by answering questions like
- Are we reading more data than what we need in Final result?
- Do we necessarily need join for columns in final output?
- Can we aggregate data before joining it?
- Can we add some filters as input controls?
There are certain SQL hints available to change this default behaviour and we will discuss that separately. Also Input parameters will help to a great extent for this as it will limit the base data.
SAP HANA Query Processor:
SAP HANA Query Processor consist of various engines. Different engines will be invoked depending on objects referenced in queries. This is high level architecture of SAP HANA Query Processor
Let us understand significance of each component once SQL is triggered on HANA from any application.
SQL Plan Cache: Check for cached entry of SQL, if found it will skip SQL Optimizer
SQL Front End: SQL statement is parsed and checked for syntax.
SQL Optimizer: The SQL Optimizer decides the best way to call the engines depending on the involved models and queries.
ROW Engine: Used for plain SQL and OLTP scenarios, few functionalities supported only with row engine.
JOIN / OLAP Engine: Join engine used for plain SQLs and OLAP engine used for aggregate operations.
HEX Engine: Combines functionalities of other engines, queries which are not supported by HEX engine will be processed by earlier engines.
(*There are SQL hints available to alter default processing engines)
The SQL Optimizer decides the best way to call the engines depending on the involved models and queries. Two main tasks of SQL Optimizer are ‘Rule-based’ and ‘Cost-based’ optimization.
Predefined rules are applied which would simplify the plan and lower the cost. Execution time generally reduced through simple change in algorithms or changing operators irrespective of how much data in source. Size Estimation and Cost comparison does not matter for optimization Examples:
- Calculation View Unfolding
- Simplify Aggregation (Group by) or Simplify Joins
- Push down blockers etc.
Based on this you can decide whether simplification (or removal) of join, or processing group by before join will help performance improvement or not.
Finds Candidates from rule-based optimization for further optimization. This involves finding optimum plan from multiple options by analyzing and comparing execution cost. Cost of every alternative is calculated in terms of number of rows. Examples:
- Logical enumeration by changing order of Operators
- Physical enumeration by changing Engine options and execution location.
- Column searches – pushdown blocker, singular vs stacked column search etc.
Based on this you can see whether running SQL through OLAP or ROW engine will have any performance improvement or not.
So far, we discussed, how the need to optimize SQL query arises with exponentially growing data and self-service reporting. How to find problematic SQL triggered from third-party application in HANA studio along with functions and sequence of execution various SQL operators. Some basic questions to answer, based on SQL operators used in query which will help improving execution time. We discussed SAP HANA Query Processor and main function of each of its components. Lastly, we discussed how SQL Optimizer will generate and compare multiple alternatives and some examples of changing optimum plan. These basic steps will help analyzing problematic SQL and help performance improvements.
This article contains general performance guidelines, based on personal experience. Same solution cannot be applied for all incidents. You will also need to consider overall landscape, hardware, data flow and various database parameters for optimum performance.
Contents in this article are referred from Open SAP and SAP Help portal.