System debugging and analysis techniques for ASE on Unix/Linux – Part 4
Blog #4 – System call tracing
A. When to use these tools
While ASE does most of its query processing work without using many system calls there are times when the interaction with the operating system which is done through those system calls may be an issue. The tools I’ll discuss in this blog allow us to take a look at those calls and trace what is going on. Some examples of when this tracing might be useful are:
1) When you see repeated errors in the ASE log that are related to system functions
2) When you suspect that there may be a performance issue caused by slow handling of system calls by the OS.
3) When ASE is having problems booting up and the cause is not clear from the errorlog.
B. What to look for
There are two primary output formats that we will be looking at. One is the full output, where the system call, the parameters to the system call, and the returned value from the OS are shown. The second is a summary of the system calls that have been made, time spent in them, and number of errors. What you want to look at will depend upon the problem you are trying to solve; but here are the general formats for the two output types.
Full output example:
32664 open(“/lib64/libpam.so.0”, O_RDONLY) = 3
Here the first column (32264) is the dataserver process id. The system call is open(), and the parameters are the name of the file (/lib64/limpam.so.0) and the type of open being done (read only). The return value comes after the equal sign (3) and in the case of open is the file descriptor number assigned by the OS.
An example of a failing system call:
32667 stat(“/work/ase157/ASE-15_0/certificates/dputz157b.txt”, 0x1495b6c20) = -1 ENOENT (No such file or directory)
Here the pid , system call, and parameters are in the same place, but we see that the OS returned a value of -1 (which is commonly used for errors) and the error type was ENOENT – no such file or directory. If this server had intended to have a SSL certificate this would be a problem; in this case SSL has never been set up and so we would expect no certificate to be found.
Summary of system calls example:
# truss -c -p 1234 (where 1234 is a dataserver process id)
signals ————
SIGALRM 189
SIGUSR2 7
total: 196
syscall seconds calls errors
write .197 1559
open .073 230
close .002 12
brk 4.143 86656
fstat .254 6823
kill .040 623
ioctl .247 4159
fcntl .025 687
lwp_unpark .000 7
getmsg .101 2230 209
putmsg .075 241
poll .116 2125 2
sigprocmask 2.043 64476
setcontext .000 7
sigtimedwait .043 189
kaio .006 22
kaio .017 533
——– —— —-
sys totals: 7.389 170579 211
usr time: 1.216
elapsed: 18.860
The first column is the name of the system call, the second is the number of seconds spent in that call (that is, the time taken to process the call by the OS), the third is the number of calls, and the last column is the count of errors returned to that call.
Note in this example the high percentage (4.143 out of 7.389 seconds) used for the brk() call. This was actually due to a bug in ASE 12.5.4 caused by a change made, and corrected by CR 480690.
C. Specific tools
The name of the program used to generate system call traces varies by platform. On Solaris and AIX it is called “truss”, on Linux it is “strace” and on HPUX it may be named either “tusc” or “truss”, depending on what version of HPUX you are running. The syntax varies slightly for each one, but in general the options are:
- Attach to a running program (typically the “-p” option)
- Generate the system call summary report (typically the “-c” option)
- Trace any child processes
- Print a timestamp on each line
- Only select certain specified system calls to trace
You should check the documentation for your system to get complete details on the options available.
D. Examples
1. ASE won’t boot, and the following messages are seen in the errorlog at startup time:
…
00:0000:00000:00000:2014/06/30 14:12:26.83 kernel os_find_good_keyfile: Could not open file /work/dputz/ase157/ASE-15_0/dputz157b.krg: File exists
00:0000:00000:00000:2014/06/30 14:12:26.83 kernel kbcreate: couldn’t get shmid for kernel region.
00:0000:00000:00000:2014/06/30 14:12:26.83 kernel kistartup: could not create shared memory
This doesn’t make a lot of sense on the surface, as it seems to be saying that it can’t open a file because it exists. So, let’s start ASE using strace (since this is on a Linux system) using the following command:
strace –f –o /tmp/trace.out RUN_servername
In this case we want the full output and we’ll use the “-o” option to send the output to a file. This is usually a very good idea, as the trace programs can generate a *lot* of output. We’ll also use the “-f” option to follow all forked processes. If that was not included all that we would get would be the output from the RUN_servername shell script, which would not be what we need for this issue.
After the bootup fails again, examining the /tmp/trace.out output shows the following two lines:
…
16567 open(“/work/ase157/ASE-15_0/dputz157b.krg”, O_RDWR) = -1 EACCES (Permission denied)
16567 open(“/work/ase157/ASE-15_0/dputz157b.krg”, O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EEXIST (File exists)
…
A good hint here is that very often the “interesting” lines in a trace will be near the end of the output file. In this case, we knew from the errorlog that an open() call was failing, so we would look for the open() calls near the end of the file.
There are two open calls that returned an error here – the first one when ASE tried to open the file /work/ase157/ASE-15_0/dputz157b.krg for read/write (O_RDWR) and the second open when it tried to create a new file by that name (O_CREAT). (If you want more information on the parameters for any particular call the Unix “man” utility can be of great help. In this case we could run “man open” and find out what those parameters stand for.) So, we see in the first open() call a return of EACCES – permission denied. Why would ASE not have permissions to open that file, which it normally creates every time it boots? We can run an “ls –l” on it to see what the current owner and permissions are. The output looks like:
bash-4.1$ ls -l dputz157b.krg
-rw——-. 1 root root 57 Jun 30 14:11 dputz157b.krg
This shows us what the problem is. Someone has started ASE while they were running as the root user (not a very good idea, usually). So, the file is now owned by root and the permissions do not allow any other users to read the file, or create a new file with the same name. That is why the two open() system calls failed, and why the boot of ASE was unsuccessful. The remedy in this case would be to have a root user remove that file, which will allow ASE to boot normally.
2. ASE is performing slowly, and general tuning makes no difference
Here we ran “truss –c” to get a look at the overall system call distribution and timing.
truss -c -p 12347
^Csignals ————
SIGALRM 1538
SIGUSR2 6
total: 1544
syscall seconds calls errors
close .000 1
kill .175 8193
ulimit .000 3
lwp_park .008 155
lwp_unpark .005 155
setcontext .000 6
sigtimedwait .095 1538
lwp_sigmask .000 6
kaio .007 73
kaio .017 780 613
pollsys 1.811 85665 6
getrlimit64 .000 3
pwrite64 .039 155
send .000 1
——– —— —-
sys totals: 2.159 96734 619
usr time: 1.891
elapsed: 153.890
The syscall column shows the system call names, if you are unsure about what any of them do you can again use the Unix “man” command to find out.
We see here that there looks to be a large number of “pollsys” calls. These are a result of ASE doing poll() calls to check for completed async I/O. But the number of actual I/O calls (kaio) is not high. This tells us that ASE is spending a lot of time polling for disk I/O to complete. We could at this point perhaps use iostat (see my second blog on this topic for details) or ASE queries on MDA tables such as monDeviceIO or monIOQueue to determine if perhaps there is a slow or overloaded disk device.
Another good thing to look at in the “-c” output is the errors column. We see that there were a high percentage of kaio calls that returned an error. This output does not detail what those errors might have been, but if we ran another truss using the full output options we would see lines like:
kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) Err#22 EINVAL
Solaris doesn’t have a man page for kaio(), but it does have one for aiowait(); which tells us that EINVAL is returned when aiowait() is called and there are no outstanding async I/Os. So, this error would appear to be normal.
3. Messages in the ASE errorlog related to system calls
In this example, the errorlog was being filled up with messages like:
00:00000:00000:2013/10/19 12:28:38.76 kernel ncheck: select, Bad file descriptor
strace was attached to the running server with the following command:
strace –f –o /tmp/trace.out –p 30375
After a few seconds, strace was stopped (using ctrl-c) and the resulting output file was analyzed. It showed many repeated messages such as:
30375 select(20, [16 17 18 19], NULL, NULL, {0, 0}) = -1 EBADF (Bad file descriptor)
This matches the error from the ASE errorlog. However, the additional information from the strace output shows the file descriptor numbers (16 17 18 and 19) being passed to the select() call. It was noted that there were only two open connections to ASE at the time this trace was done. This clearly showed that the problem was an ASE routine that was not properly clearing disconnected file descriptors. This information was passed to SAP Product Support, and resulted in a fix from engineering (this was CR 699429).
F. Conclusion
Using the Unix system call tracing utilities can aid in identifying issues related to ASE’s use of system calls. These issues may be able to be corrected on the system, or they may require a fix to ASE. Using these tools can help provide us with a better view of what is going on and let us find and fix issues; often without needing to contact SAP Product Support.