Diff Thread Call Stacks from SAP HANA Runtime Dumps Using SAP HANA Dump Analyzer
When we troubleshoot SAP HANA issues based on runtime dumps, we often face this question: was the HANA system completely stuck, or was it still processing queries and requests? Answering this from a single HANA runtime dump can be difficult. Even if the dump shows many active threads, some running and some waiting for synchronization, that is not sufficient to conclude whether the system was under heavy load or already completely blocked. With the introduction of the “Stack Diff” feature in the SAP HANA Dump Analyzer, we can compare (“Diff”) the call stacks of two HANA runtime dumps taken at different points in time. In this blog, we explain how to “Diff” two HANA runtime dumps and how to read the Stack Diff result. As a prerequisite, we recommend that you first go through the SAP HANA Dump Analyzer blog post, which gives an introduction to call stack flame graphs.
Stack Diff Flame Graph
To use Stack Diff in the SAP HANA Dump Analyzer, perform the following steps:
- Open SAP HANA Dump Analyzer, and load the first HANA runtime dump
- Click the “Expert Mode” tab
- Click the “Diff” button, and then load the second HANA runtime dump from the file picker
- The Stack Diff Flame Graph then opens in a browser window
Fig.1 SAP HANA Dump Analyzer
We have prepared an example to showcase the “Stack Diff” feature. It uses two HANA runtime dumps that we collected, “dump1.trc” and “dump2.trc”.
Fig.2 Example of Dump1 and Dump2 to be Compared
The Stack Diff is calculated per stack and highlighted as follows:
- Stack Diff = (stack sample count in dump2.trc) – (stack sample count in dump1.trc), computed for each stack
- If Stack Diff = 0: the stack has the same number of samples in dump1.trc and dump2.trc. The stack samples are shown in the Stack Diff graph without highlighting.
- If Stack Diff > 0: dump2.trc contains more samples of the stack than dump1.trc. The newly added stack samples are highlighted in green.
- If Stack Diff < 0: some or all samples of the stack have disappeared in dump2.trc compared to dump1.trc. The disappeared stack samples are highlighted in red.
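The rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Dump Analyzer's actual implementation: the function name `stack_diff`, the representation of a call stack as a tuple of frame names, and the sample data are all assumptions made for this example.

```python
from collections import Counter

def stack_diff(dump1_stacks, dump2_stacks):
    """Return {stack: (diff, highlight)} for every stack seen in either dump.

    Each input is a list of call stacks, one entry per sampled thread.
    A call stack is modeled as a tuple of frame names (an assumption of
    this sketch, not the runtime dump format).
    """
    count1 = Counter(dump1_stacks)
    count2 = Counter(dump2_stacks)
    result = {}
    for stack in count1.keys() | count2.keys():
        # Counter returns 0 for stacks missing from one of the dumps.
        diff = count2[stack] - count1[stack]
        if diff > 0:
            highlight = "green"   # more samples in dump2
        elif diff < 0:
            highlight = "red"     # samples disappeared in dump2
        else:
            highlight = "none"    # unchanged, shown without highlight
        result[stack] = (diff, highlight)
    return result

# Hypothetical sample data: frame names are placeholders.
dump1 = [("main", "A", "wait")] * 3 + [("main", "B")] * 2
dump2 = [("main", "A", "wait")] * 3 + [("main", "B")] + [("main", "C")] * 4

diffs = stack_diff(dump1, dump2)
for stack, (diff, highlight) in sorted(diffs.items()):
    print(" > ".join(stack), diff, highlight)
```

In this made-up data, the stack ending in `wait` is unchanged, one sample of `B` disappears, and four samples of `C` are new.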
Fig. 3 Stack Diff Example 1
To provide a more meaningful example, the Stack Diff Flame Graph in Fig.4 demonstrates a blocking situation in a HANA system: many requests are arriving at HANA, but they are not being processed fast enough.
In the SAP HANA context:
- SQL Executors are threads that are responsible for normal SQL request processing. Threads whose call stacks contain the function “ptime::TcpReceiver::doWork” are SQL Executor threads.
- Job Workers are threads that are responsible for processing parallelized OLAP load and internal activities such as savepoints or garbage collection. Threads whose call stacks contain the function “Execution::Thread::runjob” are Job Worker threads.
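As a rough sketch of this classification rule, the following Python function labels a thread by looking for the two function names in its call stack. The two marker strings come from the text above; the function name `classify_thread`, the "Other" label, the list-of-strings stack representation, and the sample frames are assumptions for illustration only.

```python
# Function names quoted in the text above.
SQL_EXECUTOR_MARKER = "ptime::TcpReceiver::doWork"
JOB_WORKER_MARKER = "Execution::Thread::runjob"

def classify_thread(call_stack):
    """Classify a thread from its call stack (a list of frame strings).

    Uses a substring match so that frames carrying extra text (arguments,
    addresses) still match. The frame layout is an assumption of this sketch.
    """
    if any(SQL_EXECUTOR_MARKER in frame for frame in call_stack):
        return "SQL Executor"
    if any(JOB_WORKER_MARKER in frame for frame in call_stack):
        return "Job Worker"
    return "Other"

# Hypothetical call stacks; frames other than the two markers are placeholders.
sql_stack = ["placeholder::waitForData", "ptime::TcpReceiver::doWork(...)"]
job_stack = ["placeholder::processChunk", "Execution::Thread::runjob"]
other_stack = ["placeholder::idleLoop"]

print(classify_thread(sql_stack))
print(classify_thread(job_stack))
print(classify_thread(other_stack))
```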
When there’s an incoming OLAP request, a SQL Executor thread receives the request and spawns Job Worker threads for parallelized processing. From the animation in Fig.4, we can observe that:
- A large number of SQL Executors are created, as there are many incoming requests on the HANA system.
- The number of Job Workers stays nearly the same, which indicates that few new Job Workers are created.
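The same observation can be expressed numerically by counting threads per type in each dump and subtracting. The sketch below assumes that each thread has already been labeled with its type; the function name `thread_type_delta` and the thread counts are invented for illustration and are not taken from the dumps in Fig.4.

```python
from collections import Counter

def thread_type_delta(types_dump1, types_dump2):
    """Return {thread_type: count in dump2 - count in dump1}."""
    count1 = Counter(types_dump1)
    count2 = Counter(types_dump2)
    return {t: count2[t] - count1[t] for t in sorted(count1.keys() | count2.keys())}

# Hypothetical thread counts, chosen only to mirror the pattern described above.
types_dump1 = ["SQL Executor"] * 10 + ["Job Worker"] * 8
types_dump2 = ["SQL Executor"] * 60 + ["Job Worker"] * 9

delta = thread_type_delta(types_dump1, types_dump2)
print(delta)
```

A large positive delta for SQL Executors combined with a near-zero delta for Job Workers corresponds to the pattern in the two observations above.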
This Stack Diff Flame Graph shows that the HANA system has a request processing problem: it receives more requests than it is able to process. In this particular case, the HANA parameter MAX_CONCURRENCY, which defines the maximum number of logical CPUs consumed by Job Workers, was not configured properly. After this value was increased, the requests could be processed quickly enough.
Fig. 4 Stack Diff Example 2