[Oracle] A deeper insight into stack tracing / sam...

stefan_koehler · 06-06-2013

Introduction

I am currently working through my blog backlog and the questions / requests of my followers, and it seems that one of my previous blogs, "[Oracle] Advanced (performance) troubleshooting with oradebug and stack sampling", raised similar questions about the stack tracing / sampling method.

The two most frequent questions about the last part ("3. Using stack traces or system call / signal traces") of that blog are along these lines:

What is the difference between a call stack trace and a system call trace?
How do you know that the function "ktspscan_bmb" was the problematic one and not the previous recurring functions like "ksedsts, ksdxfstk or ksdxcb and so on"?

So this blog is about the details of stack tracing / sampling of Oracle database processes on Linux (in my case Oracle Enterprise Linux with Unbreakable Enterprise Kernel 2.6.39-100.7.1.el6uek.x86_64) and how the different stack sampling methods can influence the output.

Question 1: "What is the difference between a call stack trace and a system call trace?"

Before demonstrating the difference between a call stack trace and a system call trace, the best starting point is the (official) documentation.

System call

The system call is the fundamental interface between an application and the Linux kernel.

System calls and library wrapper functions

System calls are generally not invoked directly, but rather via wrapper functions in glibc (or perhaps some other library). For details of direct invocation of a system call, see intro(2). Often, but not always, the name of the wrapper function is the same as the name of the system call that it invokes. For example, glibc contains a function truncate() which invokes the underlying "truncate" system call.

Often the glibc wrapper function is quite thin, doing little work other than copying arguments to the right registers before invoking the system call, and then setting errno appropriately after the system call has returned.

Note: system calls indicate a failure by returning a negative error number to the caller; when this happens, the wrapper function negates the returned error number (to make it positive), copies it to errno, and returns -1 to the caller of the wrapper.

Sometimes, however, the wrapper function does some extra work before invoking the system call. For example, nowadays there are (for reasons described below) two related system calls, truncate(2) and truncate64(2), and the glibc truncate() wrapper function checks which of those system calls are provided by the kernel and determines which should be employed.
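The wrapper convention quoted above (return -1 and store the positive error number in errno) can be observed from user space. The following is only a minimal sketch, assuming a 64-bit Linux system with Python 3 and glibc; the path is a hypothetical one that is not expected to exist, and the helper name call_truncate is made up for this illustration.

```python
import ctypes
import errno
import os

# Load the C library that is already linked into this process.
# use_errno=True makes ctypes keep a private copy of errno for us.
libc = ctypes.CDLL(None, use_errno=True)
libc.truncate.argtypes = [ctypes.c_char_p, ctypes.c_long]  # assumes off_t == long
libc.truncate.restype = ctypes.c_int


def call_truncate(path: bytes, length: int):
    """Call the glibc truncate() wrapper and return (return value, errno)."""
    ctypes.set_errno(0)
    rc = libc.truncate(path, length)
    return rc, ctypes.get_errno()


# Hypothetical path: the directory component should not exist anywhere.
rc, err = call_truncate(b"/nonexistent-dir-for-demo/file", 0)

# The wrapper reports the failure as -1, the reason is found in errno.
print(rc, errno.errorcode.get(err), os.strerror(err))
```

The kernel itself returns the negative error number; what the caller of the wrapper sees is -1 plus the positive value in errno. This is also why strace prints failed calls in the form "= -1 ENOENT (...)".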

Call Stack

A call stack is the list of names of methods called at run time from the beginning of a program until the execution of the current statement.

A call stack is mainly intended to keep track of the point to which each active subroutine should return control when it finishes executing. Call stack acts as a tool to debug an application when the method to be traced can be called in more than one context. This forms a better alternative than adding tracing code to all methods that call the given method.

So the main difference (in our context) is that the call stack trace includes the called methods or functions of an application (Oracle process) and that the system call trace includes only the (function) requests to the operating system kernel.
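To make the term "call stack" a bit more tangible, here is a small sketch in Python (not Oracle code). The function names run_query, scan_table and read_block are purely hypothetical stand-ins for application functions. Note that Python's traceback module lists the outermost caller first, while pstack / gdb print the innermost frame first.

```python
import traceback


def current_call_stack():
    """Return the names of all active functions, outermost caller first."""
    # extract_stack() also contains this helper as its last entry, so drop it.
    return [frame.name for frame in traceback.extract_stack()[:-1]]


def read_block():
    # Innermost function: this is the code that is currently executing.
    return current_call_stack()


def scan_table():
    return read_block()


def run_query():
    return scan_table()


stack = run_query()
print(" -> ".join(stack))
```

None of these functions performs a system call on its own, so a system call trace would not show them at all. The call stack, on the other hand, lists every one of them as long as it is active.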

Let's demonstrate the difference with a tiny example by running a simple SELECT statement that reads a lot of data from disk. I will use strace (for tracing the system calls) and pstack (a wrapper script around gdb, for tracing the call stack), as the Oracle database is running on Linux. I am not using oradebug (for tracing the call stack) right now, because oradebug behaves differently when grabbing the call stack (more details about this are included in the answer to question 2).

SYS@T11:133> create table READTEST as select * from DBA_SOURCE;

SYS@T11:133> insert into READTEST SELECT * FROM READTEST;

SYS@T11:133> insert into READTEST SELECT * FROM READTEST;

SYS@T11:133> alter system flush buffer_cache;

shell> ps -fu orat11 | grep LOCAL=NO

orat11    1690     1  2 12:00 ?        00:00:02 oracleT11 (LOCAL=NO)

Let's now run a SELECT statement on table READTEST in client session 133 (Oracle shadow process pid 1690). In parallel I will run strace with option "-cf" (to get a summary of the system calls at the end) and repeat the same procedure with pstack on process id 1690 afterwards. Be careful: pstack will show no backtrace information if you run strace and pstack at the same time on the same pid.

SYS@T11:133> alter session set "_serial_direct_read" = TRUE;

SYS@T11:133> select count(*) from READTEST;

Output pstack (samples of the stack)

Output strace (summary of sys calls)

Now you can clearly see the difference between a system call trace and a call stack trace. The system call trace (strace) does not contain any of the Oracle functions (= application implementation). You just see the system calls that were used, like gettimeofday (= get time) or pread (= read from a file handle / descriptor), but not the application code or function that invoked them.

In contrast, pstack shows the whole call stack (including possible system calls) and the order in which the (C) functions were called.

A call stack needs to be read bottom-up. For the previous stack trace example this means (a C program always starts with the main function):

main -(calls)-> ssthrdmain -(calls)-> opimai_real -(calls)-> sou2o -(calls)-> opidrv -(calls)-> … -(calls)-> gettimeofday (the last function is the currently executed code part of the program)

Question 2: "How do you know that the function "ktspscan_bmb" was the problematic one and not the previous recurring functions like "ksedsts, ksdxfstk or ksdxcb and so on"?"

Before I start to explain and demonstrate it, let's re-add the corresponding stack traces from my previous blog.

The question is a fair one, as I previously mentioned that "the last function is the currently executed code part of the program". So the assumption that the function "ksedsts" is the problematic one is an obvious one, but I marked the function ktspscan_bmb as the indicator. So how did I come to this conclusion?

The answer is hidden in the implementation of oradebug. If you issue a "short_stack" trace request, it sends a signal (SIGUSR2) to the corresponding process. The function sspuser() is the handler for signal SIGUSR2 and runs further code (depending on the request). In other words: "everything to the left of / above function sspuser() is caused by dumping the stack via oradebug and is not relevant for troubleshooting". As a consequence, you alter the usual code path of Oracle if you run a "short_stack" trace with oradebug. OS tools behave differently, as they usually suspend the process and dump the stack, so you will not find any special code path in such cases.
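The effect of a signal handler on the visible call stack can be reproduced outside of Oracle. The following is only a sketch in Python on Linux and says nothing about how sspuser() is implemented. The names dump_stack_handler and application_work are hypothetical; the handler plays the role of the dump code and application_work the role of the code path that was interrupted.

```python
import os
import signal
import traceback

captured = {}


def dump_stack_handler(signum, frame):
    # Stack as seen from inside the handler: it includes the handler itself.
    captured["seen_in_handler"] = [f.name for f in traceback.extract_stack()]
    # Stack of the interrupted code only (what an external tool would show).
    captured["interrupted_code"] = [f.name for f in traceback.extract_stack(frame)]


def application_work():
    # Stand-in for whatever the process was doing when the signal arrived.
    os.kill(os.getpid(), signal.SIGUSR2)
    while not captured:  # wait until the handler has run
        pass


signal.signal(signal.SIGUSR2, dump_stack_handler)
application_work()

print("in handler :", " -> ".join(captured["seen_in_handler"]))
print("interrupted:", " -> ".join(captured["interrupted_code"]))
```

The stack taken inside the handler ends with the handler frame sitting on top of application_work. The extra frame exists only because a dump was requested, which is the same reason why the frames before sspuser() in an oradebug short_stack can be ignored.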

Let's do a short demonstration with oradebug (short_stack and system state dump) and pstack (gdb) to see the difference.

I just connect to the Oracle database with SQL*Plus (Oracle shadow process pid 10676) and run an oradebug short_stack (from a different session) and pstack on this OS pid. The session is idle and waiting for input via SQL*Net.

As you can see, oradebug has altered the code path in order to dump the stack trace. The OS tool (in my case pstack / gdb) does not need to do this and just dumps the call stack. As a last example, let's run a system state dump (with call stacks) to cross-check the code path in such cases.

SYS@T11:16> oradebug setmypid

SYS@T11:16> oradebug dump systemstate 257

The following screenshot is just an extract of the whole trace file, but you can always find the corresponding modified code path of Oracle (except for the process that initiated the system state dump).
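As the frames added by the dump always sit to the left of sspuser(), they can be stripped mechanically before analysing a short stack. The following is a minimal sketch. The helper strip_dump_frames is hypothetical and the sample string is an abbreviated, hand-written illustration that uses only function names from this blog. Real output also carries offsets and further frames, and the sketch assumes the usual "<-" separator between frames.

```python
def strip_dump_frames(short_stack: str, handler: str = "sspuser"):
    """Return the frames to the right of the signal handler.

    These frames are the code path the process was really executing.
    If the handler is not found, all frames are returned unchanged.
    """
    frames = [f.strip() for f in short_stack.split("<-") if f.strip()]
    # Compare on the bare function name, ignoring "()" and any "+offset".
    names = [f.split("(")[0] for f in frames]
    if handler not in names:
        return frames
    return frames[names.index(handler) + 1:]


# Abbreviated, made-up sample (NOT real trace output).
sample = "ksedsts()<-ksdxfstk()<-ksdxcb()<-sspuser()<-ktspscan_bmb()"
print(strip_dump_frames(sample))
```

A stack taken with pstack contains no sspuser() frame, so the function returns it unchanged.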

Summary

I hope that this blog clears up the doubts and questions about call stack tracing and system call tracing. There are some risks in getting call stack traces as well, but Tanel Poder has already written several blogs about the different possibilities and impacts. Please check out his blogs in the reference section if you want to know more about the risks on several operating systems and how to avoid them (when possible).

If you have any further questions, please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database (performance) issues.