
So, the threaded or the process kernel?  Which one would you select for your next migration?

There is a clear recommendation in the Sybase documentation to use the threaded kernel mode.  This recommendation, though, is precisely the opposite of what Sybase ASE DBAs would naturally have selected.  I have spoken to several customers that are considering a move to 15.7.  Most feel more comfortable moving to 15.7 running in process kernel mode.  Little wonder:  we are used to ASE running this way.  For those who struggled with the new optimizer when upgrading from 12.x to 15.x, facing yet another revolution in the ASE code looks formidable.

And yet, setting worries aside, perhaps there are good reasons we are being pushed into the new kernel?  I am not talking about the improved way the threaded kernel mode treats IO requests (which is publicized by the official documentation) and the steadier query performance.  I am talking about things less straightforward.  Is not the process mode a sort of “compatibility mode” for the 15.7 kernel – similar to the “compatibility mode” available for the 15.x optimizer?  The official “compatibility mode” was never really recommended for use.  It was an option to choose only if there was absolutely no way to let the new optimizer do its work (incidentally, it is rather curious why this mode should have been thought of at all:  if the new optimizer makes more intelligent decisions, why preserve something less optimal in the same code?).  What about the process mode kernel?  Is this option really an alternative to the “default” threaded kernel mode (and if it is, say for certain HW platforms, why is it not clearly stated as such)?

I have set up a test to check.  In fact, this is not a vain intellectual foible.  I am facing a real migration project in the very near future which will involve moving to 15.7, and which will also involve making a tested decision as to which kernel mode to choose.  If I heed the recommendations – I should choose the threaded mode.  If I heed the DBA vox populi – I would choose the process mode (the threaded mode code is quite new, with less than 5 years of real customer experience – how many real issues have been found and fixed, if at all?).

I have the following settings for the test.  ASE 15.7 ESD#2 (I’ll move on to #3 & #4 later on), running in a Solaris x64 VM environment (not very optimal, but since I compare rather than analyze, it is not as bad).  Really small ASEs:  ~2GB RAM, no separate cache for tempdb, relatively small procedure cache and statement cache, 3 engines or threads out of 4 VM cores available (2 physical chips).  Something that may be set up and tested by anyone.  I am generating load using very simple Java code run from the same host.  The client code selects a couple of rows from syscomments – each time dynamically generating a unique select statement.  I will deliberately submit these requests as prepared statements.  “Deliberately” because the Java community loves prepared statements and will continue to use them indiscriminately, whether this is a waste of resources or not.  “Deliberately,” also, because I know this code has a strong potential to destabilize ASE 15.x.  What I want to check is whether the 15.7 version is more stable for this type of “bad” client code than its previous 15.x variations.  I also want to check which kernel mode performs better – if there is a difference at all.  At last, I want to check which of the ASE/JDBC configuration parameters should be used or avoided for this type of code.
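The client side can be sketched roughly as below – a minimal, hypothetical reconstruction of the load generator (the class name and the exact select text are my assumptions, not the original code).  The real client additionally opens a jConnect connection and submits every generated statement through java.sql.PreparedStatement, so with DYNAMIC_PREPARE=true each unique statement gets compiled into an LWP on the ASE side:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the load generator used in this test
// (names and the exact select are assumptions, not the original code).
public class LwpLoadGenerator {

    // Build the i-th select against syscomments; the changing literal makes
    // every statement text unique, which defeats statement reuse and is
    // exactly what stresses the statement and procedure caches.
    public static String uniqueSelect(int i) {
        return "select id, colid, text from syscomments where colid >= " + i;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            String sql = uniqueSelect(i);
            seen.add(sql);
            // In the real client:
            // conn.prepareStatement(sql).executeQuery();
            // DYNAMIC_PREPARE then decides whether this becomes an LWP.
        }
        System.out.println("unique statements generated: " + seen.size());
    }
}
```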

Below are the configuration settings (and times) which are behind the graphs that I cite throughout this paper.

For the threaded mode kernel:

“14:56” S157T_2LWP_DYN1_ST0_STR0_PLA0_ST0M
“14:58” S157T_2LWP_DYN1_ST0_STR1_PLA0_ST0M
“15:01” S157T_2LWP_DYN1_ST0_STR1_PLA1_ST0M
“15:03” S157T_2LWP_DYN0_ST0_STR0_PLA0_ST0M
“15:06” S157T_2LWP_DYN0_ST0_STR1_PLA0_ST0M
“15:08” S157T_2LWP_DYN0_ST0_STR1_PLA1_ST0M
“15:12” S157T_2LWP_DYN1_ST1_STR0_PLA0_ST20M
“15:14” S157T_2LWP_DYN1_ST1_STR1_PLA0_ST20M
“15:17” S157T_2LWP_DYN1_ST1_STR1_PLA1_ST20M
“15:19” S157T_2LWP_DYN0_ST1_STR0_PLA0_ST20M
“15:21” S157T_2LWP_DYN0_ST1_STR1_PLA0_ST20M
“15:24” S157T_2LWP_DYN0_ST1_STR1_PLA1_ST20M

For the process mode:

“12:41” S157P_2LWP_DYN1_ST0_STR0_PLA0_ST0M
“12:44” S157P_2LWP_DYN1_ST0_STR1_PLA0_ST0M
“12:47” S157P_2LWP_DYN1_ST0_STR1_PLA1_ST0M
“12:51” S157P_2LWP_DYN0_ST0_STR0_PLA0_ST0M
“12:54” S157P_2LWP_DYN0_ST0_STR1_PLA0_ST0M
“12:56” S157P_2LWP_DYN0_ST0_STR1_PLA1_ST0M
“13:02” S157P_2LWP_DYN1_ST1_STR0_PLA0_ST20M
“13:05” S157P_2LWP_DYN1_ST1_STR1_PLA0_ST20M
“13:07” S157P_2LWP_DYN1_ST1_STR1_PLA1_ST20M
“13:10” S157P_2LWP_DYN0_ST1_STR0_PLA0_ST20M
“13:13” S157P_2LWP_DYN0_ST1_STR1_PLA0_ST20M
“13:15” S157P_2LWP_DYN0_ST1_STR1_PLA1_ST20M

A bit cryptic, but what the table above specifies is that we have 2 LWP-generating processes, running with the JDBC setting DYNAMIC_PREPARE set to true or false {DYN1/DYN0}, with the statement cache either turned on or off at the connection level {ST1/ST0}, with the streamlined SQL option set on or off {STR1/STR0}, with the plan sharing option turned on or off {PLA1/PLA0}, and with the statement cache turned off (0 MB) or configured at 20MB {ST0M/ST20M}.  S157P is Sybase 15.7 process mode.  S157T is Sybase 15.7 threaded mode.
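For readers who prefer code to naming conventions, here is a small throwaway helper (mine, written just for this post – not part of the test harness) that decodes the run labels above back into the settings they encode:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Decode a run label such as "S157T_2LWP_DYN1_ST0_STR1_PLA0_ST20M"
// back into the settings it encodes (helper written for illustration only).
public class RunLabel {

    public static Map<String, String> decode(String label) {
        String[] p = label.split("_");
        Map<String, String> m = new LinkedHashMap<>();
        m.put("kernel", p[0].endsWith("T") ? "threaded" : "process");
        m.put("DYNAMIC_PREPARE", p[2].equals("DYN1") ? "true" : "false");
        m.put("statement cache (session)", p[3].equals("ST1") ? "on" : "off");
        m.put("streamlined dynamic SQL", p[4].equals("STR1") ? "on" : "off");
        m.put("plan sharing", p[5].equals("PLA1") ? "on" : "off");
        // "ST0M" -> "0 MB", "ST20M" -> "20 MB"
        m.put("statement cache size", p[6].substring(2, p[6].length() - 1) + " MB");
        return m;
    }

    public static void main(String[] args) {
        System.out.println(decode("S157T_2LWP_DYN1_ST0_STR1_PLA0_ST0M"));
    }
}
```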

Again, these are preliminary tests.  I will be performing similar tests on a large SPARC server tomorrow, so more data will become available with time, and more insights.  Insights may change.  For now I post only the things discovered in this simplistic, home-grown “lab”.

So here are the numbers:  we are running an endless loop creating unique select statements from the Java client.  We always submit them as prepared statements, but we change settings:  first we run with the statement cache turned off and the “functionality group” parameters turned off.  We turn each of these on and off, then do the same with the statements submitted with DYNAMIC_PREPARE set to false.  Then we run the same tests again, but with the statement cache configured at 20MB.
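On the client side, the only per-run JDBC toggle is DYNAMIC_PREPARE; a sketch of how it would be set on a jConnect connection follows (DYNAMIC_PREPARE is a real jConnect driver property, but the helper, URL and credentials here are placeholders of mine):

```java
import java.util.Properties;

// Sketch of the per-run client settings (helper and credentials are
// placeholders; DYNAMIC_PREPARE is the actual jConnect property name).
public class RunSettings {

    public static Properties clientProps(boolean dynamicPrepare) {
        Properties props = new Properties();
        props.put("user", "sa");        // placeholder credentials
        props.put("password", "");
        // true  -> prepared statements are sent as dynamic SQL (compiled into LWPs)
        // false -> prepared statements are sent as language commands
        props.put("DYNAMIC_PREPARE", String.valueOf(dynamicPrepare));
        return props;
    }

    public static void main(String[] args) {
        Properties p = clientProps(true);
        System.out.println("DYNAMIC_PREPARE=" + p.getProperty("DYNAMIC_PREPARE"));
        // In the real test the connection would then be opened with, e.g.:
        //   DriverManager.getConnection("jdbc:sybase:Tds:host:5000", p);
        // and the session-level statement cache toggled per run with:
        //   stmt.execute("set statement_cache on");   // or off
    }
}
```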

CPU load:

Process mode:

CPULoad157P

Threaded mode:

CPULoad157T

I don’t really know what the throughput is here, but I know that the same code generates a slightly greater load on ASE running the threaded kernel (either because it has a greater throughput or because it has a greater weight).  The average for the process mode stands at 28% CPU, while in the threaded mode it gets to 32%.  The same is true for peaks:  35% vs. 42%.  What was the throughput?  We will compare TPS and procedure requests per second (actually, both TPS and PPS are the result of compiling the prepared statements we send to the ASE into LWPs – we have no DML statements in the code and we do not execute procedures either – so this is ASE’s way of handling our code).

TPS in process mode:

TPS157P

TPS in threaded mode:

TPS157T

TPS is more or less the same.  What about procedure requests per second?

Process:

ProcRequests157P

Threaded:

ProcRequests157T

Actually, the process mode has generated 200 more procedure requests per second.  This is the speed at which the code is executed (the gap in the graph is where both the statement cache and the DYNAMIC_PREPARE setting are turned off).

In general, the behavior is the same, but there are certain unexpected differences.  From right to left:  we start with the statement cache turned off (and set statement_cache off in the code).  We get ~700 SPS (SP executions per second).  We turn the streamlined option on – the throughput doubles to ~1400 SPS; plan sharing seems to add a bit more.  We turn DYNAMIC_PREPARE off – no LWPs.  Then we do the same with the statement cache turned on in ASE (and set statement_cache on in the code).  The behavior is similar, except that when we set DYNAMIC_PREPARE to false – we again generate LWPs at a high rate (incidentally, CPU fares worst in both cases when the prepared statements generated by the code meet the DYNAMIC_PREPARE = false JDBC setting).  And with the statement cache set to zero, the CPU load is slightly lower for this type of code.

The statement cache data (statements cached per second from sysmon & the number of statements in the cache from monStatementCache).

Process mode:

STMT_CACHED_157P

NumSTMT157P

Threaded mode:

STMT_CACHED_157T

NumSTMT157T

More or less similar, with some anomalies.

So far, it seems there is no great difference.  Moreover, the process mode fares a bit better.

However, this is what we see when we monitor the procedure cache usage with sp_monitorconfig.

Process mode:

ProcMonitor157P

Threaded mode:

ProcMonitor157T

Oops.  There seems to be a problem here.  With ASE running in the process kernel mode, the procedure cache gets exhausted as soon as we run prepared statements and turn DYNAMIC_PREPARE off for our JDBC client.  The only way to reclaim space in the procedure cache in the process kernel mode is… to bounce the server.  Urgh.  Pretty bad.  The procedure cache in the process kernel mode is very sensitive and seems to be wasted on something.

There is also a difference with compiled plan treatment in the two modes:

Process mode (query plans spinlock contention):

SpinQPLAN157P

Threaded kernel:

SpinQPLAN157T

This spinlock contention arises only when the statement cache is configured and the statements are submitted with DYNAMIC_PREPARE set to true – in the threaded mode.  In process mode it appears in different settings and climbs all the way up to 20%.

Process mode – statements recompiled due to plans being flushed out of the cache:

NumSTMTRecomp157P

Threaded mode:

NumSTMTRecomp157T

For some reason, statements are recompiled almost twice as frequently in the process kernel mode.

To throw in a ray of optimism here:  procedure manager spinlock contention.  This was the nasty spinlock that wrought real havoc in previous 15.x versions (and brought at least one upgrade attempt down).  This one is now treated pretty well in 15.7.

Process mode:

SpinPMGR157P

Threaded mode:

SpinPMGR157T

But this, too, seems to be treated a bit better in the threaded kernel mode than in the process kernel mode.

So, what have we learned from this simple test, if anything?  The process kernel mode seems to give a steadier throughput for this kind of code:  lower CPU load, a higher proc/sec ratio.  At the same time, in process kernel mode the procedure cache functions very badly:  sp_monitorconfig either does not report correct data, or it reports the procedure cache as exhausted if we run prepared statements with JDBC set to DYNAMIC_PREPARE = FALSE.  Moreover, there is no way to clean up the mysteriously wasted procedure cache.  Even when all activity in the server is halted and the procedure cache is cleaned up using the available dbcc commands, it remains 99% full.  I did not test the same issues with later ESDs, but I have seen that on the SPARC platform, too, process mode seems to yield a steadier throughput, but often at the price of running into issues that the threaded kernel does not have (I hit stack traces when onlining engines under heavy stress, and the procedure cache reported zero utilization in sp_monitorconfig; in threaded kernel mode resource memory was dynamically adjusted by ASE at startup, while in process kernel mode ASE failed to online engines if the kernel resource memory was missing, &c).

So process or threaded kernel, which one would you choose?

Tomorrow I will be running similar tests on 15.7 ESD#3 running on SPARC.  I will test to see if similar issues arise there.

To be continued….

ATM.


6 Comments


  1. Andrew Melkonyan Post author

    Errata:

    1. procedure cache gets exhausted as soon as we run prepared statements and turn the DYNAMIC_PREPARE off for our JDBC client should read procedure cache gets exhausted as soon as we run prepared statements and turn the DYNAMIC_PREPARE off for own JDBC client

    2. This spinlock contention arises only when the statement cache is configured and the statements are submitted with DYNAMIC_PREPARED set to true in the threaded mode.  In process mode it climbs all the way up to 20% should read This spinlock contention arises only when the statement cache is configured and the statements are submitted with DYNAMIC_PREPARED set to true – in the threaded mode.  In process mode it appears in different settings and climbs all the way up to 20%

    1. Jeroen Rijnbergen

      I’d say use threaded mode unless you’ve got very good reasons not to use it.

      It’s the way forward, process mode is available for backward compatibility only.

      Sybooks admin guide, config option kernel mode:

      SAP assumes you use threaded mode in your production server, and includes process mode for backward compatibility. Process mode may not support features for SAP ASE 15.7 and later.

      1. Former Member

        Thanks Jeroen,

        Well, what I have been trying in my environment is to go from a 40-engine configuration (process mode) to a (60+4) engine configuration in threaded mode. This host has 40 dual-core CPU cores = 80 engines.

        So I do expect much performance improvement if the scalability issues are solved here. Also, as per the Sybase docs, the claimed performance of threaded mode has made my expectations a little bit higher.

        Though in the actual tests (a few optimization tests are still running), I am not seeing its performance improved at all – even a bit lower performance than in process mode.

        1. Jeroen Rijnbergen

          I don’t know your environment, but I suggest first comparing by only switching from process mode to threaded mode, without increasing the number of threads that much.

          The bottleneck doesn’t have to be CPU load. It might be many different things: bad query plans, missing or wrong stats, high network load, I/O contention, spinlock contention, etc.

          Just increasing the number of engines might have a negative effect, both in process and in threaded mode.

