
I am back with the comparative tests: threaded kernel mode versus process kernel mode. In my earlier tests, process kernel mode seemed to get slightly better throughput under the same workload. On the other hand, process kernel mode also showed certain types of unwanted behavior that were absent from threaded kernel mode.

To verify these differences I decided to port the tests from the Solaris (SPARC) platform to Red Hat Linux (Xeon). This lets me test ASE behavior on two different platforms: Solaris running on SPARC and RHEL running on Xeon. Repeating the tests on a second platform reduces the chance that the results are an artifact of one particular setup.

Having performed identical tests on both platforms, I have indeed been able to verify that the threaded kernel mode displays up to 10% more CPU utilization. The behavioral lapses in the process mode were also persistent across platforms.

What I will report below is a series of 4 tests: two performed on a SPARC host (32 logical processors) with ASE running in threaded and process modes, and another two performed on a Xeon host (32 logical processors), again with ASE running in threaded and process modes. The legend of the tests is identical to my previous posts, so I will not repeat it here. I am still working around the issue of generating prepared statements at a high rate.

SPARC_TESTS

XEON_TESTS

I will post only a limited number of graphs, those that looked most peculiar to me.  I will not comment on how differences in configurations affected the CPU utilization/throughput either (this too is found in earlier posts).

First the TPS/CPU Load:

SPARC:

TPS.SOL

XEON:

TPS.LX

ProcPS/CPU Load:

SPARC:

PROC_REQ.SOL

PC_MON.pr.sol

PC_MON.th.sol

XEON:

PROC_REQ.LX

PC_MON.pr.lx

PC_MON.th.lx

Statement Cache:

SPARC:

ST_CACHED.SOL

ST_DROPPED.SOL

ST_FOUND_IN_CACHE.SOL

XEON:

ST_CACHED.LX

ST_DROPPED.LX

ST_FOUND_IN_CACHE.LX

Spinlock Contention:

SPARC:

PCACHE_SPIN.SOL

SSQLCACHE_SPIN.SOL

XEON:

PCACHE_SPIN.LX

SSQLCACHE_SPIN.LX

There are quite a few interesting things. The same things were found to be working less well across platforms: sp_monitorconfig stopped reporting utilization at a certain point on both RHEL and SPARC, and PC/SC spinlock contention was consistently higher in the process mode, even with TF758. What has been pretty surprising is the huge throughput leap in porting the tests from the rather old 8-chip SPARC VI host (16 cores, 32 threads running at 2.15 GHz) to the 2-chip Xeon 2600 host (16 cores, 32 threads running at 2.7 GHz). I will check SPARC VII (2.66 GHz, 4-core) and T4 (2.85 GHz, 8-core) chips later on to have more data. But the throughput difference found here is worth digging into.

Anyway, even with the latest tests one question remained: do we see greater thread utilization because ASE achieves higher throughput in the threaded mode, or does the higher thread utilization come with lower throughput instead? To test this I had to modify my tests slightly to get a more precise record of the volume of work the server does in each test.

This has brought me to the following numbers (I have tested only the RHEL host – more tests to come):

THROUGHPUT.LX

The 100% utilization corresponds to the threaded kernel mode, where we have a slightly lower number of code loops (I compare here the average number of loops that the code in each client connection performs over the same period of time). The threaded mode gets an average of 240 loops with 100% thread utilization, while the process mode gets 260 loops with 90% engine utilization. This is not much, but the difference is there. What is also interesting is the following:

Process Mode:

CODE_LOOPS.pr.lx

Threaded Mode:

CODE_LOOPS.th.lx

Each bar in the graph corresponds to a client connection executing its (identical) code. Whereas in the process kernel mode the throughput varies widely from client to client (200 to 300 loops!), in the threaded kernel mode the deviation practically does not exist: it stays within a range of 10, which is explained by the fact that the threads are activated serially.
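The per-client spread described above can be summarized with a few simple statistics. The following is only a sketch: the loop counts are hypothetical values chosen to resemble the ranges mentioned (roughly 200 to 300 for the process mode, within about 10 for the threaded mode), not the measured data, and the helper name `spread_summary` is my own.

```python
from statistics import mean, pstdev


def spread_summary(loops_per_client):
    """Summarize how evenly work was spread across client connections."""
    lowest, highest = min(loops_per_client), max(loops_per_client)
    average = mean(loops_per_client)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "mean": average,
        # Coefficient of variation: standard deviation relative to the mean.
        "cv": pstdev(loops_per_client) / average,
    }


# Hypothetical per-client loop counts, NOT the measured data.
process_mode = [200, 230, 260, 290, 300, 280, 250, 270]
threaded_mode = [236, 238, 240, 242, 244, 240, 239, 241]

process_stats = spread_summary(process_mode)
threaded_stats = spread_summary(threaded_mode)

for label, stats in (("process", process_stats), ("threaded", threaded_stats)):
    print(f"{label}: range={stats['range']}, mean={stats['mean']}, cv={stats['cv']:.3f}")
```

A small range and coefficient of variation mean every client gets about the same share of the work, which is the behavior seen in the threaded mode graph.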

So it seems to be true: the threaded kernel mode does reach a slightly higher thread utilization without an increase in throughput. The difference is not significant, but it is there. On the other hand, the threaded mode yields much more stable performance, both in terms of query response time and across many other counters. If I had to choose, I'd choose the threaded kernel mode over the process mode based on its performance characteristics (probably hosted on Linux, but I have to check this aspect more thoroughly before I comment on it with any degree of reliability). If you consider that some process mode clients have suffered a ~15% drop in throughput in comparison to the threaded mode clients, the 35% improvement that other process mode clients received is somewhat compromised.
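One way to put the averages quoted earlier (240 loops at 100% thread utilization versus 260 loops at 90% engine utilization) on a common scale is to divide the work done by the utilization it cost. This is a rough sketch of that arithmetic only; "loops per utilization point" is an illustrative metric I am assuming here, not a counter that ASE reports.

```python
def loops_per_utilization_point(avg_loops, utilization_pct):
    """Average client loops completed per percentage point of utilization."""
    if utilization_pct <= 0:
        raise ValueError("utilization must be positive")
    return avg_loops / utilization_pct


# Averages taken from the RHEL test described in the post.
threaded = loops_per_utilization_point(240, 100)
process = loops_per_utilization_point(260, 90)

# How much more work per utilization point the process mode delivered.
relative_gap = (process - threaded) / threaded

print(f"threaded: {threaded:.2f} loops per utilization point")
print(f"process:  {process:.2f} loops per utilization point")
print(f"relative gap: {relative_gap:.1%}")
```

This normalization says nothing about stability: it compares averages only, so it should be read together with the per-client spread.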

My last tests before I leave this topic will focus on whether the throughput rises when more threads are brought online for the strangled ASE, thus tipping the balance towards the threaded mode. I will check this on both the RHEL and SPARC hosts to have cleaner data and report on it here.

ATM.
