Stefan Koehler

[Oracle] Profiling and troubleshooting Oracle related CPU performance issues on Linux with perf (perf_events)


In some of my previous blog posts I demonstrated some “advanced” Oracle (performance) troubleshooting techniques based on system call and call stack tracing. One of those posts included a real-life client issue (the CPU was fully consumed) and a guide on how to drill down into such problems systematically.

This “simple / manual” oradebug / pstack approach can work (as previously demonstrated with Oracle bug #13641076), but it reaches its limits when the issue is more complex or when the application / kernel stack changes frequently while the same issue persists. In such cases you need a profiling tool that samples and measures the CPU consumption within a call stack. One of my clients hit such a scenario a few months ago while I was on-site, and consequently we needed to know which Oracle application (kernel) function was consuming all the CPU time. I am not able to post the exact details of this particular issue due to a confidentiality agreement, but the example below should demonstrate the usefulness of the tool perf (perf_events) in any case.

Perf / Perf events

Perf (sometimes “Perf Events”, originally “Performance Counters for Linux”) is a performance analysis tool for Linux. It is available with kernel version 2.6.31 or higher. Performance Counters for Linux (PCL) is a kernel-based subsystem that provides a framework for collecting and analyzing performance data. The PCL subsystem can be used to measure hardware events, including retired instructions and processor clock cycles. It can also measure software events, including major page faults and context switches. For example, PCL counters can compute the Instructions Per Clock (IPC) from a process’s counts of instructions retired and processor clock cycles. A low IPC ratio indicates that the code makes poor use of the CPU. Other hardware events can also be used to diagnose poor CPU performance. Perf itself is based on the perf_events interface exported by recent versions of the Linux kernel.
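As a minimal sketch of the IPC calculation mentioned above (not part of the original demo): the two counter totals are made-up values of the kind that `perf stat -e instructions,cycles -p <pid>` would report for a process, and the function name is purely illustrative.

```python
def instructions_per_clock(instructions: int, cycles: int) -> float:
    """Return the IPC ratio from two hardware counter totals of one process."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return instructions / cycles


# Hypothetical counter totals (assumption, not measured on the demo system).
ipc = instructions_per_clock(instructions=1_200_000_000, cycles=2_000_000_000)
print(f"IPC = {ipc:.2f}")  # prints "IPC = 0.60"
```

Whether a given ratio counts as "low" depends on the CPU and the workload, so the value is best compared against a known-good run of the same code.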

The option most commonly used for Oracle processes is “perf record”, which relies on sampling; more precisely, perf_events uses event-based sampling in this mode.

The perf_events interface allows two modes to express the sampling period:

  • The number of occurrences of the event (period)
  • The average rate of samples/sec (frequency)

The perf tool defaults to the average rate, which is set to 1000Hz, or 1000 samples/sec. That means that the kernel dynamically adjusts the sampling period to achieve the target average rate. In contrast, with the other mode, the sampling period is set by the user and does not vary between samples. In my experience the default setting is sufficient for troubleshooting Oracle issues in most cases; however, you can set a fixed sampling period with the option “-c” (the number of event occurrences between two samples). For example, “-c 1” means that every occurrence of the event is sampled, which is accurate but can lead to a huge overhead.
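To make the difference between the two modes more tangible, here is a rough sketch of how many samples each mode would produce. The workload numbers and function names are assumptions for illustration only; real sample counts also depend on how long the process is actually on CPU.

```python
def samples_in_period_mode(event_count: int, period: int) -> int:
    """Period mode ("-c"): one sample after every `period` occurrences of the event."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return event_count // period


def samples_in_frequency_mode(seconds: float, rate_hz: int = 1000) -> int:
    """Frequency mode: the kernel tunes the period so that, on average,
    about `rate_hz` samples per second are taken."""
    return round(seconds * rate_hz)


# Hypothetical workload: 3 billion events during 10 seconds on CPU.
events, seconds = 3_000_000_000, 10.0
print(samples_in_frequency_mode(seconds))          # 10000
print(samples_in_period_mode(events, 1_000_000))   # 3000
print(samples_in_period_mode(events, 1))           # 3000000000
```

The last line shows why “-c 1” is expensive: the number of samples equals the number of events.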

Unfortunately the documentation of perf / perf_events is very poor, so I have included some unofficial web sites about it in the reference section. I have also mentioned this tool several times on SCN for cases where a CPU performance issue needs to be profiled in detail (e.g. in the thread “Problem with oracle”, even though it did not help in that specific case).

If you want to use perf in a virtualized environment, you have to be very careful with the chosen event (hardware / software), because the measurements can be incorrect (e.g. VMware is able to provide the hardware event information in the VM; check the reference section for more information).

Frits Hoogland has written a blog post about this called “When the Oracle wait interface isn’t enough, part 2: understanding measurements” (check the reference section for more information), which can be summarized as follows.

If no specific event is specified, perf tries to use ‘cpu-cycles’, which is flagged as [Hardware event], meaning that the hardware performance registers are used to gather the information. If this is not possible (because virtualization disables access to the performance registers), the software event ‘cpu-clock’ can be used. However, cpu-clock is a software event and depends on the timer interrupt.

shell> perf list | grep cpu
          cpu-cycles OR cycles                       [Hardware event]
          cpu-clock                                  [Software event]
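The listing above can also be processed programmatically. The following is a small sketch, with an illustrative function name, that maps each event name to its type based on text in the format shown above. Note that an event being listed does not guarantee that it is usable inside a virtual machine.

```python
# Sample text in the format of "perf list | grep cpu" as shown above.
PERF_LIST_SAMPLE = """\
  cpu-cycles OR cycles                               [Hardware event]
  cpu-clock                                          [Software event]
"""


def event_types(perf_list_output: str) -> dict:
    """Map every event name (including aliases joined by OR) to its type."""
    events = {}
    for line in perf_list_output.splitlines():
        line = line.rstrip()
        if not line.endswith("]") or "[" not in line:
            continue  # skip headers and blank lines
        names, _, kind = line.rpartition("[")
        kind = kind.rstrip("]").strip()
        for name in names.split(" OR "):
            name = name.strip()
            if name:
                events[name] = kind
    return events


for name, kind in event_types(PERF_LIST_SAMPLE).items():
    print(f"{name}: {kind}")
```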

The demo

The following demo was run on an Oracle database on OEL 6.4 (kernel 2.6.39-400.109.1.el6uek.x86_64). I am restricted to the cpu-clock event because my OEL 6.4 runs as a virtual machine on Oracle VirtualBox 4.2.12, but let’s ignore this fact for demonstration purposes.

The SQL example itself is pretty simple: it joins a table to itself with a hash join, which leads to high CPU usage. Let’s assume that this is a problem and that you want to profile the CPU consumption of this process, as it is running on CPU only (e.g. verified with the Oracle wait interface beforehand).

shell> cat /proc/sys/kernel/perf_event_paranoid
*** Security barrier of perf
* perf event paranoia level:
*  -1 - not paranoid at all
*   0 - disallow raw tracepoint access for unpriv
*   1 - disallow cpu events for unpriv
*   2 - disallow kernel profiling for unpriv
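The paranoia levels listed above are cumulative: each level includes the restrictions of the lower non-negative levels. A minimal sketch that reads the current setting and prints what it implies could look like this. The descriptions are taken from the list above and the helper name is illustrative.

```python
from pathlib import Path

# Level descriptions as listed above.
PARANOID_LEVELS = {
    -1: "not paranoid at all",
    0: "disallow raw tracepoint access for unprivileged users",
    1: "disallow CPU events for unprivileged users",
    2: "disallow kernel profiling for unprivileged users",
}


def describe_paranoid(level: int) -> list:
    """Return all restrictions that apply at the given paranoia level."""
    if level < 0:
        return [PARANOID_LEVELS[-1]]
    return [text for lvl, text in PARANOID_LEVELS.items() if 0 <= lvl <= level]


path = Path("/proc/sys/kernel/perf_event_paranoid")
if path.exists():  # only present on Linux kernels with perf_events
    current = int(path.read_text().strip())
    print(f"perf_event_paranoid = {current}")
    for restriction in describe_paranoid(current):
        print(f"  - {restriction}")
```

If kernel profiling is disallowed for your user, running perf as root is the simplest way around the restriction.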

shell> ps -ef | grep oracleT11DB
oracle    2014  2013  0 18:09 ?        00:00:00 oracleT11DB (DESCRIPTION=(LOCAL=NO))

SQL:PID 2014> create table MYTAB as select * from DBA_SOURCE;
SQL:PID 2014> select /*+ use_hash(b) */ count(*) from MYTAB a, MYTAB b where a.owner = b.owner;


shell> perf record -e cpu-cycles -o /tmp/perf.out -g -p 2014

shell> perf report -i /tmp/perf.out -g none -n --stdio


Footnote: cpu-clock is used instead of the specified hardware event “cpu-cycles” due to the previously mentioned VirtualBox limitations.

shell> perf report -i /tmp/perf.out -g graph -n --stdio



You can drill down the CPU consumption of the process to each Oracle (kernel) application or OS kernel function. The application functions (e.g. qerhjWalkHashBucket or kxhrPUcompare) are marked with “[.]” in contrast to OS kernel space functions (e.g. acpi_pm_read or native_read_tsc) which are marked with “[k]”.
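The “[.]” and “[k]” markers make it easy to split the sampled CPU time into user space and kernel space. The sketch below assumes the flat `perf report -n --stdio` column layout (overhead, samples, command, shared object, symbol). The sample text is hypothetical: only the 25.02% for kxhrPUcompare comes from this post, all other numbers are invented for illustration.

```python
import re

# Hypothetical excerpt (see the note above); not real output of the demo.
SAMPLE_REPORT = """\
    25.02%      2502  oracle  oracle             [.] kxhrPUcompare
    10.00%      1000  oracle  oracle             [.] qerhjWalkHashBucket
     5.00%       500  oracle  [kernel.kallsyms]  [k] acpi_pm_read
"""

# overhead%  samples  ...  [.|k]  symbol
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+\d+\s+.*\[([.k])\]\s+(\S+)\s*$")


def overhead_by_space(report: str) -> dict:
    """Sum the overhead percentages per address space (user vs. kernel)."""
    totals = {"user": 0.0, "kernel": 0.0}
    for line in report.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # comment, header, or blank line
        pct, marker, _symbol = match.groups()
        totals["kernel" if marker == "k" else "user"] += float(pct)
    return totals


for space, pct in overhead_by_space(SAMPLE_REPORT).items():
    print(f"{space}: {pct:.2f}%")
```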

So, for example, you can see that 25.02% of the sampled CPU time was spent in the Oracle (user-land) function kxhrPUcompare. This function was invoked by qerhjWalkHashBucket, which accounts for 24.01% of the overall 25.02% of sampled CPU time (as a stack), and by qerhjInnerProbeHashTable, which accounts for the remaining 1.01% (as a stack). If you look closely at the last output (parameter “-g graph”) you may notice that it looks like a call stack trace, but now you have the CPU consumption information as well. The output and the CPU consumption make perfect sense in relation to the SQL execution plan. The “-g graph” output shows absolute percentages, i.e. relative to the total sampled CPU time; if you use the option “-g fractal” instead, you get percentage values relative to the parent within a function call stack.
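The percentages above are all expressed against the total sampled CPU time. As a small worked example (the function name is illustrative), the same numbers can be re-expressed as each caller's share of the time spent in kxhrPUcompare itself:

```python
def shares_of_parent(parent_pct: float, callers: dict) -> dict:
    """Convert absolute percentages into shares of the parent's percentage."""
    return {name: round(100.0 * pct / parent_pct, 2) for name, pct in callers.items()}


# Values taken from the text above.
callers = {"qerhjWalkHashBucket": 24.01, "qerhjInnerProbeHashTable": 1.01}
for name, share in shares_of_parent(25.02, callers).items():
    print(f"{name}: {share}% of the time in kxhrPUcompare")
```

So roughly 96% of the time in kxhrPUcompare is reached via qerhjWalkHashBucket and about 4% via qerhjInnerProbeHashTable.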

The names of the Oracle (kernel) application functions are cryptic, but some can be translated with the “famous” MOS ID #175982.1 (check the reference section for more details). The note was deleted by Oracle some time ago, but it can still be found on the web very easily.

For example, function opifch2 (= OPI Oracle server functions; these are at the top of the server stack and are called indirectly by the client in order to serve the client request) or function qertbFetch (= qertb: table row source / fetch) can be translated.
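Such a translation is essentially a longest-prefix lookup. The following sketch contains only the two prefixes mentioned above; a real table would have to be filled from the MOS note, and the function name is illustrative.

```python
# Only the two prefixes mentioned in this post; extend from the MOS note as needed.
PREFIXES = {
    "opi": "OPI Oracle server functions (top of the server stack)",
    "qertb": "table row source",
}


def translate(function_name: str):
    """Return the description of the longest known prefix, or None if unknown."""
    matches = [prefix for prefix in PREFIXES if function_name.startswith(prefix)]
    if not matches:
        return None
    return PREFIXES[max(matches, key=len)]


for fn in ("opifch2", "qertbFetch", "kxhrPUcompare"):
    print(f"{fn}: {translate(fn) or 'unknown prefix'}")
```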

Footnote: Oracle also published and deleted MOS note #22241.1, which describes the naming convention for the x$ tables as well. Check the reference section for more details.


Perf is a perfect extension to the Oracle wait interface and easy to use with newer kernel releases. You can get a deep insight into the Oracle application or OS kernel stack by sampling the CPU usage of a specific process.

It can become a life-saver in critical situations where you need an urgent analysis / solution without walking through the whole vendor support process (which should be done afterwards for getting a possible fix for this issue of course).

Consequently it would be great if the Oracle wait interface could be combined with perf, right? The answer is yes, and luckily such a tool already exists: Craig Shallahamer and Frits Hoogland recently developed a script called “fulltime”. You can view a demo and download it from Craig’s website.

Of course perf / perf_events can also be used to profile any other application (e.g. SAP application server kernel) on Linux.

If you have any further questions, please feel free to ask or get in contact directly if you need assistance with troubleshooting Oracle database issues.

