System debugging and analysis techniques for ASE on Unix/Linux
I will be posting a series of blogs, about one per week, that outline how to use diagnostic tools other than the ones internal to ASE. The intent is to provide additional resources and expertise for troubleshooting various kinds of errors and support issues.
BLOG #1 – General introduction
A. Using tools outside of the ones provided by SAP ASE
Monitoring tools are available from within ASE as well as at the system level to help diagnose problems. These blogs and associated scripts will primarily cover the external, system-level tools for Unix and Linux platforms.
Which tool to use depends on the type of problem being encountered. For each tool, the types of problems it may be useful for will be covered. The three main categories of problems considered are:
- High Resource Usage
- Errors
- Resource Saturation
B. General comments on the problem categories
1. High Resource Usage
General Rule: Slow performance is always caused by some resource limit creating a bottleneck. The resource may be CPU cycles, disk, memory, network, or a combination of these. The bottleneck can come from overuse of the resource (such as high CPU usage due to a poor query plan) or from an OS or system issue limiting access to it (such as low available system memory causing paging). The first step is to determine which resource is the bottleneck, and then to determine why.
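As a rough illustration of that first step, the sketch below takes one sample of system counters and suggests which resource to look at first. The field names follow the column headers of Linux vmstat (r, si, so, id, wa); the function name and every threshold are assumptions chosen for illustration, not tuning advice, and network usage is not covered because vmstat does not report it.

```python
# Illustrative sketch only. Field names mirror Linux vmstat columns;
# the thresholds are assumed values and should be adjusted per system.

def likely_bottlenecks(sample, ncpus):
    """Return the resources worth investigating first for one sample."""
    suspects = []
    # More runnable processes than CPUs, with almost no idle time,
    # points at CPU pressure.
    if sample["r"] > ncpus and sample["id"] < 10:
        suspects.append("cpu")
    # Any swap-in or swap-out activity points at memory pressure (paging).
    if sample["si"] > 0 or sample["so"] > 0:
        suspects.append("memory")
    # A large share of time spent waiting on I/O points at disk pressure.
    if sample["wa"] > 20:
        suspects.append("disk")
    return suspects


# Hypothetical sample from a 4-CPU host (numbers are made up).
sample = {"r": 12, "si": 0, "so": 0, "id": 3, "wa": 0}
print(likely_bottlenecks(sample, ncpus=4))  # prints ['cpu']
```

A single sample only says where to look; the "why" (for example a poor query plan driving CPU usage) still has to come from further investigation.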
2. Errors
Errors, as indicated by error messages, generally occur for one of two reasons: either the program has a bug and is mishandling some condition, or the operating system or hardware is raising the error. The first step is to determine which of the two is more likely. This document covers only the second case, but the tools listed here may also help determine which of the two areas to focus on.
3. Resource Saturation
When a resource reaches its saturation point, some sort of queuing typically takes place. For example, when a disk drive becomes saturated, the wait queue for that device starts to grow. Note that saturation may occur in short bursts: the disk may be only 70% busy overall but still have periods where the wait queue is large. Resource saturation can also show up as hangs, where the queue for a resource is so large that little or no real work can be accomplished.
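The point about short bursts can be shown with a small worked example. The sketch below summarizes per-interval disk samples, each a pair of busy percentage and average wait-queue length. The sample values are made up, and the function name and the queue limit of 2.0 are assumptions for illustration only.

```python
from statistics import mean

# Illustrative sketch only. Each sample is (busy_percent, avg_queue_length)
# for one interval; queue_limit is an assumed cut-off for "saturated".

def summarize_disk(samples, queue_limit=2.0):
    """Compare overall utilization with per-interval queue behaviour."""
    busy = [b for b, _ in samples]
    queues = [q for _, q in samples]
    return {
        "mean_busy_pct": mean(busy),
        "peak_queue": max(queues),
        "saturated_intervals": sum(1 for q in queues if q > queue_limit),
    }


# Made-up samples: mostly moderate load with two short bursts.
samples = [(55, 0.4), (60, 0.6), (100, 9.5), (62, 0.5), (100, 12.0), (43, 0.3)]
summary = summarize_disk(samples)
print(summary["mean_busy_pct"])        # prints 70
print(summary["saturated_intervals"])  # prints 2
```

The average alone reports a disk that is 70% busy, while two of the six intervals had a long wait queue. This is why it helps to look at individual intervals rather than only at averages over a long period.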
C. Categories of tools
There are six types of tools that will be covered, each in a separate blog. For each there will be scripts showing possible uses and a discussion of how to interpret the tool's output. Note, though, that good interpretation takes practice; do not expect the provided scripts to give all the answers. While nearly all systems have some version of these tools, the tools themselves and the output available may vary greatly from platform to platform (or even between operating system levels on the same platform). The six types are:
- System performance monitoring such as vmstat/sar/iostat
- Stack tracing such as pstack/procstack
- System call tracing such as strace/truss
- System log analysis such as errpt/syslogs
- Memory usage such as pmap/procmap
- Program profiling such as oprofile/trace/dtrace