In the previous blog, we discussed workload patterns. This blog examines how CPU, threads, and NUMA-related factors influence the workload, and how to extract the most useful information through workload analysis. The conclusions can be used to further tune the CPU resource consumption of the HANA system.
How to Begin with CPU Utilization Analysis
The analysis of CPU utilization stands as a key element within the broader scope of workload analysis. Determining the right approach to analyze CPU usage is crucial. It's essential to first identify the type of CPU utilization that may be problematic—be it user CPU utilization within the HANA instance, system CPU utilization governed by the OS, or a combination of both. A clear understanding of this distinction is necessary to properly address the issue.
Should the user CPU utilization be significant, it would be prudent to delve into thread information to gain insights into ongoing processes. Conversely, if the focus is on system CPU utilization from the OS, gathering thread data alongside call stack histories is advisable. Collaboration with the OS Administrator might also be necessary to investigate any scheduled jobs or activities that coincide with peak usage times.
In this blog, our primary focus is on user CPU utilization, which is directly impacted by the workload within the HANA instance. A valuable starting point for gaining deeper insights is to analyze thread samples. The monitoring views pertinent to this are:
Although both views contain very similar data, for the purpose of workload analysis, we particularly concentrate on specific columns within these views:
Column Name | Data Type | Description |
---|---|---|
HOST | VARCHAR(64) | Displays the host name. |
TIMESTAMP | TIMESTAMP | Displays the timestamp of the record. |
THREAD_TYPE | VARCHAR(128) | Displays the thread type. |
THREAD_METHOD | VARCHAR(256) | Displays the thread method. |
THREAD_DETAIL | NVARCHAR(256) | Displays the thread detail (truncated). |
THREAD_STATE | VARCHAR(32) | Displays the thread state. |
STATEMENT_HASH | VARCHAR(32) | Displays the unique identifier for an SQL string. |
ROOT_STATEMENT_HASH | VARCHAR(32) | Displays the MD5 hash value for the root statement string. |
USER_NAME | NVARCHAR(256) | Displays the SQL user name. |
APPLICATION_NAME | NVARCHAR(256) | Displays the name of the application. |
APPLICATION_USER_NAME | NVARCHAR(256) | Displays the application user name. |
APPLICATION_SOURCE | NVARCHAR(256) | Displays that the application can define which source file SAP HANA is called from. The usage is up to the application. This value is also displayed in M_PREPARED_STATEMENTS.APPLICATION_SOURCE. |
STATEMENT_THREAD_LIMIT | INTEGER | Displays the effective statement thread limit. |
STATEMENT_MEMORY_LIMIT | INTEGER | Displays the effective statement memory limit. |
PASSPORT_COMPONENT_NAME | NVARCHAR(32) | Displays the passport component name. |
PASSPORT_ACTION | NVARCHAR(40) | Displays the passport action. |
NUMA_NODE_INDEX | SMALLINT | Displays the last known NUMA node that the thread was executed on. |
WORKLOAD_CLASS_NAME | NVARCHAR(256) | Displays the name of the workload class. |
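As a minimal sketch of how these columns are typically put to work, the snippet below groups thread samples by one of the columns above and counts only the samples whose thread was running. It assumes the samples have already been exported from the thread sample views into a list of dictionaries keyed by column name. The sample rows, the `breakdown` helper, and the state values are illustrative assumptions, not output from a real system.

```python
from collections import Counter

# Hypothetical thread samples, keyed by the column names listed above.
SAMPLES = [
    {"THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a", "NUMA_NODE_INDEX": 0},
    {"THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a", "NUMA_NODE_INDEX": 0},
    {"THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a", "NUMA_NODE_INDEX": 0},
    {"THREAD_STATE": "Running", "STATEMENT_HASH": "hash_b", "NUMA_NODE_INDEX": 1},
    {"THREAD_STATE": "Semaphore Wait", "STATEMENT_HASH": "hash_b", "NUMA_NODE_INDEX": 1},
]


def breakdown(samples, column, state="Running"):
    """Count samples in the given thread state, grouped by one column."""
    counts = Counter(
        row[column] for row in samples if row["THREAD_STATE"] == state
    )
    # Largest contributors first.
    return counts.most_common()


by_statement = breakdown(SAMPLES, "STATEMENT_HASH")
by_numa_node = breakdown(SAMPLES, "NUMA_NODE_INDEX")
print(by_statement)
print(by_numa_node)
```

The same helper can be pointed at `APPLICATION_NAME`, `USER_NAME`, or `WORKLOAD_CLASS_NAME` to see which dimension best explains the running threads.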
Considerations for CPU Utilization in HANA Workload Analysis
Now that we have the tools at our disposal, where should we initiate our analysis of CPU utilization? Addressing CPU usage within SAP HANA is multifaceted. A high CPU usage, peaking at 90% of total capacity—for instance, with 8 sockets, 208 CPU cores, and hyper-threading—suggests that approximately 374 threads (90% of 208 x 2 logical CPU cores) are active. This scenario often leads to system hang-ups and prevents new user connections. However, this is not the sole concern.
We also face perplexing situations where a mere 50% CPU utilization can cause similar system hang-ups, or times when the system appears idle yet experiences high system CPU utilization. Moreover, there are instances where a single statement monopolizes all threads despite workload classifications and global concurrency limits being in place.
To unravel these complexities, a thorough review of thread samples, supplemented by other monitoring views, is necessary for deeper insight into the system's behaviour.
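The thread estimate quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and it assumes that hyper-threading exposes two logical CPUs per core and that each busy logical CPU corresponds to roughly one running thread.

```python
def estimated_active_threads(cores, threads_per_core, utilization):
    """Return (logical CPUs, approximate number of running threads)."""
    logical_cpus = cores * threads_per_core
    return logical_cpus, round(logical_cpus * utilization)


# 208 cores with hyper-threading (2 logical CPUs per core) at 90% utilization.
logical_cpus, active_threads = estimated_active_threads(208, 2, 0.90)
print(logical_cpus, active_threads)
```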
Typically, the following factors can lead to CPU-related issues in a HANA system:
It's worth noting that high CPU usage doesn't inherently indicate a problem. At times, it may signify that the HANA system is effectively utilizing its capacity to handle the workload, which is not a concern. Our goal is to identify workload contributors to minimize CPU consumption and safeguard the system against potential future issues.
Analyzing CPU Utilization Issues: Two Practical Examples
In this section, we simulate two real-world scenarios to provide deeper insights into CPU utilization analysis in SAP HANA environments.
Before diving into the analysis of expensive statements, it's crucial to establish an understanding of the system's configuration that affects workload management. In the examples that we will explore, the system is equipped with:
Example 1 - CPU Spikes Caused by Expensive Statements
Expensive statements in a database context typically refer to queries that run longer than anticipated or consume a significant amount of memory during execution. In the realm of workload analysis, especially concerning CPU consumption, we focus on those statements that utilize a large number of running threads. In our example, various types of statements have been identified that could potentially cause extremely high CPU utilization. These could be so intensive as to lead to system hang-ups.
This situation might involve complex join operations, extensive data aggregation, or poorly optimized queries that put a heavy load on the CPU. Identifying and optimizing these expensive statements is crucial to prevent CPU spikes and maintain system stability. This process involves analyzing query execution plans, reviewing indexing strategies, and possibly restructuring or simplifying the queries themselves to reduce their resource demands.
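One way to spot statements that occupy many running threads is to count, for every sample timestamp, how many running threads belong to each statement hash, and then keep the peak per statement. The sketch below does this on in-memory rows. The rows, timestamps, hash values, and the `peak_running_threads` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical thread samples taken at two timestamps.
SAMPLES = [
    {"TIMESTAMP": "10:00:00", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a"},
    {"TIMESTAMP": "10:00:00", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a"},
    {"TIMESTAMP": "10:00:00", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a"},
    {"TIMESTAMP": "10:00:00", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_b"},
    {"TIMESTAMP": "10:00:01", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_a"},
    {"TIMESTAMP": "10:00:01", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_b"},
    {"TIMESTAMP": "10:00:01", "THREAD_STATE": "Running", "STATEMENT_HASH": "hash_b"},
    {"TIMESTAMP": "10:00:01", "THREAD_STATE": "Job Exec Waiting", "STATEMENT_HASH": "hash_b"},
]


def peak_running_threads(samples):
    """Return the highest number of concurrently running threads per statement."""
    # Running threads per (timestamp, statement) snapshot.
    per_snapshot = defaultdict(int)
    for row in samples:
        if row["THREAD_STATE"] == "Running":
            per_snapshot[(row["TIMESTAMP"], row["STATEMENT_HASH"])] += 1

    # Keep the largest snapshot value seen for each statement.
    peaks = defaultdict(int)
    for (_, statement_hash), running in per_snapshot.items():
        peaks[statement_hash] = max(peaks[statement_hash], running)
    return dict(peaks)


peaks = peak_running_threads(SAMPLES)
print(peaks)
```

Comparing each peak with the number of logical CPUs, or with the effective `STATEMENT_THREAD_LIMIT`, gives a first indication of which statements deserve a closer look.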
In the given example, we observe distinct CPU utilization patterns over a span of two weeks on the HANA platform, with utilization nearing 100%. The first week’s pattern is characterized by abrupt spikes in CPU usage, quickly rising and falling, whereas the second week displays more frequent and enduring peaks.
TABLE_NAME | COLUMN_NAME | SCANNED_RECORDS | SCR_PER_S |
---|---|---|---|
MATDOC | MATNR | 828290874021485 | 19304757728 |
MATDOC_EXTRACT | WERKS | 109487724317894 | 10032855641 |
The chart offers a detailed view of the running threads across NUMA nodes in a HANA system equipped with 8 sockets, each hosting 52 logical CPUs. Here’s a refined analysis of the patterns observed and their potential implications:
Utilizing Active/Active Read-Enabled (AARE) in SAP HANA can swiftly reduce CPU and memory usage on the primary site. By rerouting read operations to a secondary site—through hints in SQL, ABAP, Procedure or CDS views—resource consumption can be balanced without altering existing job logic. Monitoring the secondary site is vital to ensure it doesn't affect system replication performance. For details on AARE, refer to the official SAP documentation.
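To illustrate hint-based routing, the sketch below appends a `RESULT_LAG('hana_sr')` hint to an existing SELECT statement so that it becomes eligible to run on the read-enabled secondary. The helper name and the example query are hypothetical, and the exact hint syntax and the optional lag value should be checked against the SAP documentation for your HANA revision before use.

```python
def with_secondary_read_hint(select_sql, max_lag_seconds=None):
    """Append a RESULT_LAG hint to a SELECT statement (illustrative helper)."""
    base = select_sql.rstrip().rstrip(";")
    if max_lag_seconds is None:
        hint = "RESULT_LAG('hana_sr')"
    else:
        # Optional maximum acceptable data delay on the secondary, in seconds.
        hint = f"RESULT_LAG('hana_sr', {int(max_lag_seconds)})"
    return f"{base} WITH HINT({hint})"


# Hypothetical read-only query against a table from the example above.
routed_sql = with_secondary_read_hint(
    "SELECT MATNR, WERKS FROM MATDOC WHERE WERKS = ?;"
)
print(routed_sql)
```

Because only the hint is appended, the statement text and the job logic around it stay unchanged, which matches the low-impact nature of this approach.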
Workload Analysis for HANA Platform Series
This blog post is part of the 'Workload Analysis for HANA Platform Series'. In upcoming posts, we will demonstrate how to analyze issues related to CPU, threads, and NUMA nodes. Here's what you can look forward to in this series:
Stay tuned as we explore these aspects in detail, providing insights and strategies to optimize your HANA environment.