Practical approach for response time prediction of HANA analytical applications
This practical approach answers the question of how the total response time of a HANA dialog step (or query) would change on a hardware system with more or fewer physical CPU cores.
It relies on SAP HANA configuration parameters to control the number of physical cores that are actually used to execute a query, or a series of queries that belong to one end-user dialog step.
The scope of this approach is response time prediction for a single query running on HANA (no concurrent users). Its outcome is used as input for calculating the total number of cores required when concurrent users send analytical queries to the HANA system in parallel.
INTRODUCTION
SAP HANA implements algorithms that automatically split and parallelize the execution of a query. As a result, the same query responds faster on a machine with more cores than on a machine with fewer cores.
In reality, queries do not scale without limit: each has a part that cannot be parallelized, for example serializing the result set to be sent to the client. For some queries or requests the sequential part is only a small portion of the execution flow; for others it is a larger portion.
HANA CONFIGURATION FOR PARALLEL EXECUTION
In SAP HANA Studio navigate to Administration Console -> Configuration view.
The following 6 parameters are related to parallelization in HANA:
- executor.ini -> limits -> max_pop_threads
- executor.ini -> limits -> max_server_pop_threads
- indexserver.ini -> parallel -> num_cores
- indexserver.ini -> parallel -> phys_num_cores
- global.ini -> execution -> max_concurrency
- global.ini -> execution -> max_concurrency_hint
If a parameter is missing on a particular HANA installation, it has to be added.
In newer HANA releases the trend is to control parallelization only through the parameter max_concurrency_hint. Nevertheless, for the time being it is still recommended to set all parameters, because the migration to the new consolidated parameter is staged.
RESPONSE TIME MEASUREMENTS
The measurements should be done first with the default parameter values, which usually represent the full capacity of the hardware assigned to HANA, and then repeated with the special value sets that simulate N/2 cores and N/4 cores, where N is the number of cores assigned to HANA.
To set the parameter values to simulate just 1 core, run the following script:
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') SET ('parallel', 'phys_num_cores') = '1' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') SET ('parallel', 'num_cores') = '1' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('execution', 'max_concurrency') = '1' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('execution', 'max_concurrency_hint') = '1' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('executor.ini', 'SYSTEM') SET ('limits', 'max_pop_threads') = '1' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('executor.ini', 'SYSTEM') SET ('limits', 'max_server_pop_threads') = '1' WITH RECONFIGURE;
To set the parameter values to simulate 4 cores, run the following script:
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') SET ('parallel', 'phys_num_cores') = '4' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM') SET ('parallel', 'num_cores') = '4' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('execution', 'max_concurrency') = '4' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('execution', 'max_concurrency_hint') = '4' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('executor.ini', 'SYSTEM') SET ('limits', 'max_pop_threads') = '4' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('executor.ini', 'SYSTEM') SET ('limits', 'max_server_pop_threads') = '1' WITH RECONFIGURE;
Following the same logic, use the configuration parameters to simulate 8, 16 or more cores. Note that the number of simulated cores cannot exceed the number of physically available cores. If, on a machine with 40 physical cores, the parameters are configured to simulate 160 cores, the results will not make any sense.
Note that in all configurations 'phys_num_cores', 'num_cores', 'max_pop_threads', 'max_concurrency' and 'max_concurrency_hint' are set to the same value, namely the number of cores to be simulated, while 'max_server_pop_threads' is kept constant at 1.
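The rule above can be turned into a small generator, so that the script for any number of simulated cores does not have to be edited by hand. This is only a sketch: the function name simulation_script and its arguments are hypothetical, and the code merely builds the statement text; running the statements against HANA is left to the administrator.

```python
# Hypothetical helper: builds the ALTER SYSTEM statements for an arbitrary
# number of simulated cores. It only produces text and never connects to HANA.

# (file, section, key) triples taken from the parameter list in this post.
SCALED_PARAMETERS = [
    ("indexserver.ini", "parallel", "phys_num_cores"),
    ("indexserver.ini", "parallel", "num_cores"),
    ("global.ini", "execution", "max_concurrency"),
    ("global.ini", "execution", "max_concurrency_hint"),
    ("executor.ini", "limits", "max_pop_threads"),
]
# This parameter is kept constant at 1 in every configuration.
FIXED_PARAMETER = ("executor.ini", "limits", "max_server_pop_threads")

TEMPLATE = (
    "ALTER SYSTEM ALTER CONFIGURATION ('{file}', 'SYSTEM') "
    "SET ('{section}', '{key}') = '{value}' WITH RECONFIGURE;"
)


def simulation_script(simulated_cores, physical_cores):
    """Return the SQL statements that simulate `simulated_cores` cores."""
    # Simulating more cores than physically exist gives meaningless results.
    if not 1 <= simulated_cores <= physical_cores:
        raise ValueError("simulated cores must be between 1 and the physical core count")
    statements = [
        TEMPLATE.format(file=f, section=s, key=k, value=simulated_cores)
        for f, s, k in SCALED_PARAMETERS
    ]
    f, s, k = FIXED_PARAMETER
    statements.append(TEMPLATE.format(file=f, section=s, key=k, value=1))
    return statements


if __name__ == "__main__":
    # Example: simulate 8 cores on a machine with 40 physical cores.
    print("\n".join(simulation_script(8, physical_cores=40)))
```

The guard on physical_cores encodes the restriction that the simulated core count cannot exceed what the machine really has.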
Restore the default configuration of the above parameters after the measurements are completed!
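One possible way to restore the defaults is the UNSET variant of the same statement, which removes the custom value from the given layer. The sketch below generates those statements; the function name restore_script is hypothetical, and it assumes that none of the parameters had a custom SYSTEM-layer value before the measurements. If they did, note the original values first and SET them back instead.

```python
# Hypothetical helper: builds UNSET statements for the six parameters
# changed during the measurements. Assumes no custom SYSTEM-layer values
# existed beforehand; otherwise re-SET the recorded originals instead.

PARAMETERS = [
    ("indexserver.ini", "parallel", "phys_num_cores"),
    ("indexserver.ini", "parallel", "num_cores"),
    ("global.ini", "execution", "max_concurrency"),
    ("global.ini", "execution", "max_concurrency_hint"),
    ("executor.ini", "limits", "max_pop_threads"),
    ("executor.ini", "limits", "max_server_pop_threads"),
]


def restore_script():
    """Return one UNSET statement per parameter touched by the simulation."""
    template = (
        "ALTER SYSTEM ALTER CONFIGURATION ('{file}', 'SYSTEM') "
        "UNSET ('{section}', '{key}') WITH RECONFIGURE;"
    )
    return [template.format(file=f, section=s, key=k) for f, s, k in PARAMETERS]


if __name__ == "__main__":
    print("\n".join(restore_script()))
```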
HOW TO INTERPRET THE MEASUREMENTS
A prerequisite for interpreting the measurement results is that each measurement is executed more than once and that the result is reproducible. Measurements that deviate by more than 5% are not acceptable.
A reproducible response time is achieved only if no other queries are running in parallel and if the full CPU capacity of the HANA hardware is available to the measured query while it executes.
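The 5% rule can be checked mechanically once the repeated timings are collected. The sketch below is one possible reading of it: the text does not define how the deviation is computed, so measuring each run against the mean of all runs is an assumption, and the function names and sample timings are made up for illustration.

```python
from statistics import mean


def max_relative_deviation(times):
    """Largest deviation of any run from the mean, relative to the mean.

    Assumption: "deviation" is measured against the mean of all runs.
    """
    if len(times) < 2:
        raise ValueError("need at least two runs to judge reproducibility")
    average = mean(times)
    return max(abs(t - average) for t in times) / average


def is_reproducible(times, tolerance=0.05):
    """True if every run lies within `tolerance` (default 5%) of the mean."""
    return max_relative_deviation(times) <= tolerance


if __name__ == "__main__":
    runs = [80.0, 81.5, 79.2]  # made-up response times in seconds
    print(f"max deviation: {max_relative_deviation(runs):.1%}")
    print("reproducible" if is_reproducible(runs) else "repeat the measurement")
```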
For most HANA analytical queries, the response time measured with the parameter simulation of 1 CPU core is expected to be longer than the response times measured with the simulation of 2 CPU cores and with the default parameter values. The bigger the difference, the better the scalability of the query.
Note!
A situation in which the response time measured with the simulation of 1 CPU core is shorter than the response time measured with the simulation of a higher number of cores is not expected, and the reason for such a wrong measurement should be clarified.
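A quick way to spot such a case in a set of measurements is to sort them by core count and flag every step where more cores gave a longer response time. This is a minimal sketch; the function name and the dictionary layout (cores mapped to seconds) are assumptions for illustration.

```python
def unexpected_slowdowns(measurements):
    """Return (fewer_cores, more_cores) pairs where more cores was slower.

    `measurements` maps the simulated core count to the response time in seconds.
    """
    ordered = sorted(measurements.items())
    return [
        (cores_a, cores_b)
        for (cores_a, time_a), (cores_b, time_b) in zip(ordered, ordered[1:])
        if time_b > time_a
    ]


if __name__ == "__main__":
    suspicious = unexpected_slowdowns({1: 60.0, 2: 80.0, 4: 57.0})  # made-up data
    for fewer, more in suspicious:
        print(f"{more} cores slower than {fewer} cores: clarify before predicting")
```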
HOW TO PREDICT THE RESPONSE TIME
This is explained with an example.
A dialog step (or HANA query) is measured with the parameter simulation of 1, 2 and 4 CPU cores, with the following results:
1 CPU Core: 125 [s] response time
2 CPU Cores: 80 [s] response time
4 CPU Cores: 57 [s] response time
Let P be the parallelizable part of the execution flow for the query and S the sequential part. Then, according to Amdahl's Law, Response time [s] = S + P / N, where N is the number of simulated CPU cores.
Applying the formula for N=1 gives 125 [s] = S + P/1
Applying the formula for N=2 gives 80 [s] = S + P/2
Subtracting the second equation from the first gives P/2 = 45 [s], so P = 90 seconds and S = 35 seconds.
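The same elimination works for any two measurements, for example N/2 and N/4 cores instead of 1 and 2. As a sketch, with T_1 and T_2 (symbols introduced here, not used elsewhere in this post) denoting the response times measured with N_1 and N_2 simulated cores:

```latex
\begin{aligned}
T_1 &= S + \frac{P}{N_1}, \qquad T_2 = S + \frac{P}{N_2} \\[4pt]
P &= \frac{T_1 - T_2}{\dfrac{1}{N_1} - \dfrac{1}{N_2}}, \qquad S = T_1 - \frac{P}{N_1}
\end{aligned}
```

With N_1 = 1, N_2 = 2, T_1 = 125 [s] and T_2 = 80 [s] this gives P = 45 / 0.5 = 90 [s] and S = 125 - 90 = 35 [s].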
Using the 3rd measurement, with the simulation of 4 cores, as a control:
S + P / 4 = 35 [s] + 90[s] / 4 = 35 [s] + 22.5 [s] = 57.5 [s]
The result of the formula, 57.5 [s], matches the measured 57 [s] very well (a difference of 0.5 [s], well under 5%).
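The fit and the control step can be reproduced in a few lines. The sketch below uses the three measurements from the example; the function names fit_amdahl and predict are hypothetical, and the model is exactly the formula Response time = S + P / N used above.

```python
def fit_amdahl(cores_1, time_1, cores_2, time_2):
    """Solve time = S + P / cores for S and P from two measurements."""
    if cores_1 == cores_2:
        raise ValueError("the two measurements must use different core counts")
    parallel = (time_1 - time_2) / (1.0 / cores_1 - 1.0 / cores_2)
    sequential = time_1 - parallel / cores_1
    return sequential, parallel


def predict(sequential, parallel, cores):
    """Predicted response time in seconds for the given core count."""
    return sequential + parallel / cores


if __name__ == "__main__":
    # Fit on the 1-core and 2-core measurements from the example.
    s, p = fit_amdahl(1, 125.0, 2, 80.0)
    # Use the 4-core measurement as the control.
    predicted = predict(s, p, 4)
    measured = 57.0
    error = abs(predicted - measured) / measured
    print(f"S={s:.1f}s P={p:.1f}s predicted={predicted:.1f}s error={error:.1%}")
    # prints: S=35.0s P=90.0s predicted=57.5s error=0.9%
```

If the control measurement deviates by more than the 5% accepted for reproducibility, the simple two-part model probably does not describe the query well, and the prediction should be treated with caution.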
Further, the response time with a different number of CPU cores, e.g. 16 or 40 cores, can be estimated.
This estimation makes clear up to which number of CPU cores the response time still improves significantly, and therefore how many cores are worth investing in from a hardware perspective.
In this concrete example, due to the specifics of the query, more than 16 CPU cores do not improve the response time significantly: the formula gives about 40.6 [s] with 16 cores and about 37.3 [s] with 40 cores, and the response time can never fall below S = 35 [s]. The customer does not need to buy a bigger system unless concurrent load is expected.
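To see where the curve flattens, the formula can be evaluated for a range of core counts. The sketch below hard-codes S = 35 [s] and P = 90 [s] from the example; the chosen core counts and the function name response_time are illustrative only.

```python
# Values derived in the worked example of this post.
SEQUENTIAL_SECONDS = 35.0
PARALLEL_SECONDS = 90.0


def response_time(cores):
    """Predicted response time in seconds: S + P / N."""
    return SEQUENTIAL_SECONDS + PARALLEL_SECONDS / cores


if __name__ == "__main__":
    previous = None
    for cores in (1, 2, 4, 8, 16, 32, 64):
        current = response_time(cores)
        # Show how much each doubling of cores still saves.
        saved = "" if previous is None else f"  (saves {previous - current:.1f}s)"
        print(f"{cores:>3} cores: {current:6.1f}s{saved}")
        previous = current
```

Each doubling of the core count halves only the parallel part, so the saving per doubling shrinks quickly while the sequential 35 [s] remains.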
There are queries, for example aggregations (max, sum, etc.), which scale excellently if appropriate HANA partitioning for high data volumes is used, and which can thus easily scale to 160 cores and beyond.
To continue with sizing for concurrent users of HANA Analytical applications, read http://scn.sap.com/community/performance-scalability/blog/2014/11/28/cpu-sizing-for-concurrent-users-running-hana-analytical-applications