CPU Sizing for concurrent users, running HANA Analytical Applications
As a prerequisite for CPU sizing under concurrent user load, the response time prediction procedure for HANA analytical applications should be applied as described in http://scn.sap.com/community/performance-scalability/blog/2014/11/28/practical-approach-for-response-time-prediction-of-hana-analytical-applications .
The term “query”, as used below, does not necessarily mean a single HANA SQL query. It can be any functionality consisting of multiple queries that are executed as the result of a user navigation step in the application UI.
With the help of this procedure, the characteristics of the query are clarified:
- The sequential part (S[s]) and the parallel part (P[s]) are determined in seconds.
- The response time should be plotted as a function of the number of cores used. Based on this graph, the customer decides which response time is acceptable in their specific case. Often the fastest possible response time is not really required, and choosing the target response time carefully can save considerable hardware expense. The selected acceptable response time is referred to in this article as the target response time (TargetRT[s]).
- Finally, the number of cores (N) sufficient to achieve TargetRT[s] is calculated.
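The last step can be sketched in Python, assuming the response time follows Amdahl's formula RT(n) = S + P / n, where n is the number of cores. The function names and the sample values (S = 2 s, P = 48 s, TargetRT = 5 s) are hypothetical and chosen only for illustration; they are not taken from the referenced procedure.

```python
import math


def response_time(s: float, p: float, cores: int) -> float:
    """Predicted response time in seconds on `cores` cores (Amdahl-style model, an assumption)."""
    return s + p / cores


def cores_for_target(s: float, p: float, target_rt: float) -> int:
    """Smallest number of cores N for which response_time(s, p, N) <= target_rt."""
    if target_rt <= s:
        raise ValueError("TargetRT must be greater than the sequential part S")
    return max(1, math.ceil(p / (target_rt - s)))


# Hypothetical measurements: 2 s sequential part, 48 s parallel part.
S, P = 2.0, 48.0
for n in (1, 4, 8, 16, 32):
    print(f"{n:>2} cores -> {response_time(S, P, n):.2f} s")

N = cores_for_target(S, P, target_rt=5.0)
print("cores needed for TargetRT = 5 s:", N)
```

Printing the response time for several core counts gives the data points for the response time graph, from which an acceptable TargetRT[s] can be picked before N is computed.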
Methodology for the Concurrent Users Sizing of HANA Analytical Applications
The goal of this sizing is to determine how many cores are required to process the target number of parallel requests (TargetPR) while still achieving the target response time (TargetRT[s]).
- The target number of parallel requests is always an integer value.
- The target number of parallel requests is not the same as the number of concurrent users. Because each user pauses between requests to review the data in the application UI (the so-called “think time”), the number of parallel requests is usually in the range of 5-10% of the number of concurrent users.
- To meet the target response time TargetRT[s], a block of at least N cores must be available. In other words, handling a single request requires N cores, so the final number of required cores cannot be lower than N for any number of parallel requests.
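The 5-10% rule of thumb can be turned into a rough estimate of TargetPR. The sketch below is only an illustration: the function name, the choice to round up, and the figure of 80 concurrent users are assumptions, not part of the sizing procedure.

```python
def target_parallel_requests(concurrent_users: int, parallel_percent: int) -> int:
    """Estimate TargetPR as a percentage of the concurrent users, rounded up to an integer."""
    # Integer ceiling division keeps the result an exact integer (no float rounding).
    estimate = -((-concurrent_users * parallel_percent) // 100)
    return max(1, estimate)  # at least one request is always sized


users = 80  # hypothetical number of concurrent users
low = target_parallel_requests(users, 5)
high = target_parallel_requests(users, 10)
print(f"{users} concurrent users -> about {low} to {high} parallel requests")
```

Sizing for the upper end of the range is the more conservative choice when the actual think time is not known.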
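The UsageRatio is the fraction of the target response time that a request spends outside its sequential part, i.e. the share of time during which it occupies the N cores:
UsageRatio = (TargetRT[s] - S[s]) / TargetRT[s]
In theory, the target response time (TargetRT[s]) is greater than or equal to the sequential part (S[s]), and both are non-negative numbers. For this reason, the UsageRatio cannot exceed 1 (100%).
The meaning of the UsageRatio is best explained with examples: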
– Let’s assume that after applying the formulas, the result is UsageRatio=0.77 (77%). This means that N cores have the capacity to handle 1 request at a time, but not 2 parallel requests, because 2*77% = 154% > 100%.
– If UsageRatio=0.37 (37%), N cores have enough capacity to handle 2 parallel requests, because 2*37% = 74% < 100%, but not enough to handle 3 parallel requests, because 3*37% = 111% > 100%.
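The two examples can be reproduced with a small sketch. The helper names are hypothetical, and `max_requests_per_n_cores` simply encodes the reasoning used above (k requests fit as long as k * UsageRatio does not exceed 100%).

```python
import math


def usage_ratio(target_rt: float, s: float) -> float:
    """UsageRatio = (TargetRT - S) / TargetRT."""
    if target_rt <= 0 or s < 0 or s > target_rt:
        raise ValueError("expected 0 <= S <= TargetRT and TargetRT > 0")
    return (target_rt - s) / target_rt


def max_requests_per_n_cores(ratio: float) -> int:
    """Largest k with k * ratio <= 1, i.e. how many requests N cores can absorb."""
    if not 0 < ratio <= 1:
        raise ValueError("UsageRatio must be in (0, 1]")
    return math.floor(1 / ratio)


for ratio in (0.77, 0.37):
    k = max_requests_per_n_cores(ratio)
    print(f"UsageRatio={ratio:.2f}: {k} request(s) fit, "
          f"{k + 1} would need {(k + 1) * ratio:.0%} of the N cores")
```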
The number of required cores is then calculated as:
RequiredCores = TargetPR + (N-1) * roundup_to_Integer(TargetPR * UsageRatio)
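One way to read this formula: every request keeps one core busy, and the parallel parts additionally need N-1 cores for each started "block" of TargetPR * UsageRatio. This reading is an interpretation rather than something the sizing method states explicitly. A minimal Python sketch of the formula, with hypothetical function and variable names and the sample values N=16 and UsageRatio=0.37:

```python
import math


def required_cores(target_pr: int, n: int, usage_ratio: float) -> int:
    """RequiredCores = TargetPR + (N-1) * roundup(TargetPR * UsageRatio)."""
    if target_pr < 1 or n < 1:
        raise ValueError("TargetPR and N must be positive integers")
    blocks = math.ceil(target_pr * usage_ratio)  # started blocks of N cores
    return target_pr + (n - 1) * blocks


# Sample values: N = 16 cores per request, UsageRatio = 0.37.
for pr in (1, 5, 6):
    print(f"TargetPR={pr}: {required_cores(pr, 16, 0.37)} cores")
```

For a single request the result equals N, which matches the requirement that the sizing can never drop below N cores.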
As an example, assume the following:
- The customer has determined the target response time.
- The number of cores N needed to achieve the target response time has been calculated (using Amdahl’s formula) and is, for example, 16.
- The UsageRatio has been calculated and is, for example, 0.37.
The table below shows the number of required cores for different levels of system throughput (target numbers of parallel requests).
| N (input parameter, fixed for a given query) | UsageRatio (input parameter, fixed for a given query) | TargetPR (input parameter, varies with the throughput the customer wants to size) | Required Cores (sizing output) |
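The rows of such a table can be regenerated from the RequiredCores formula. The sketch below uses the example values N=16 and UsageRatio=0.37; the range of 1 to 8 parallel requests is an assumption made for illustration, and the function name is hypothetical.

```python
import math


def required_cores(target_pr: int, n: int, usage_ratio: float) -> int:
    """RequiredCores = TargetPR + (N-1) * roundup(TargetPR * UsageRatio)."""
    return target_pr + (n - 1) * math.ceil(target_pr * usage_ratio)


N, USAGE_RATIO = 16, 0.37  # example values for one concrete query

# One row per target throughput: (N, UsageRatio, TargetPR, RequiredCores).
rows = [(N, USAGE_RATIO, pr, required_cores(pr, N, USAGE_RATIO)) for pr in range(1, 9)]

print("| N | UsageRatio | TargetPR | Required Cores |")
for n, ratio, pr, cores in rows:
    print(f"| {n} | {ratio} | {pr} | {cores} |")
```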
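From the table it is clear that, for any target throughput, the number of required cores is close to (but not exactly) a multiple of N: each additional parallel request adds either 1 core or, when TargetPR * UsageRatio passes the next integer, N cores.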
The result should be interpreted as follows: “35 cores are sufficient to handle up to 5 parallel requests at ~85% average CPU allocation (calculated as 0.37 usage ratio * 5 requests * 16 cores / 35 cores) and are not sufficient to handle 6 parallel requests. A system of 51 cores can handle 6 parallel requests at ~70% average CPU allocation; 53 cores can handle 8 parallel requests at ~90% average CPU allocation”.
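The average CPU allocation figures quoted above follow from UsageRatio * requests * N / available cores. A short sketch that recomputes them (the function name is hypothetical):

```python
def average_cpu_allocation(usage_ratio: float, parallel_requests: int,
                           n: int, total_cores: int) -> float:
    """Average share of the installed cores that the parallel requests keep busy."""
    return usage_ratio * parallel_requests * n / total_cores


N, USAGE_RATIO = 16, 0.37  # example values from the text

# (parallel requests, installed cores) pairs taken from the interpretation above.
for requests, cores in ((5, 35), (6, 51), (8, 53)):
    share = average_cpu_allocation(USAGE_RATIO, requests, N, cores)
    print(f"{requests} requests on {cores} cores: {share:.1%} average CPU allocation")
```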
In practice, because the required core counts are so close together, the customer can run 3, 4, and 5 queries on 34 cores and 6, 7, and 8 queries on 52 cores, i.e. ignore the difference of +1/-1 core at the cost of a very minor deviation from the target response time.
Customers with a larger hardware budget can apply a safety factor (e.g. 30-35%) on top of the calculated hardware resources to achieve lower average resource consumption.
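Applying such a factor is a one-line calculation. The sketch below assumes the factor is added as a percentage on top of the calculated cores and that the result is rounded up to whole cores; both the rounding and the function name are assumptions.

```python
def cores_with_safety_factor(required_cores: int, safety_percent: int) -> int:
    """Add `safety_percent` on top of the calculated cores, rounded up to whole cores."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -((-required_cores * (100 + safety_percent)) // 100)


base = 35  # e.g. the cores calculated for 5 parallel requests
for percent in (30, 35):
    print(f"{base} cores + {percent}% -> {cores_with_safety_factor(base, percent)} cores")
```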