Analyzing performance problems on a production system
h2. Analyzing performance problems on a production system or how to profile without a profiler
Unfortunately, performance problems sometimes do not show up until your system is in production, even if you do your best to avoid this situation. When it happens, your options are often limited. You cannot just install a profiler on the server, because that usually slows the system down so much that it becomes unusable. The situation gets even worse when the problem occurs only infrequently.
So what can you do today (I can promise you that the SAP VM will improve the situation pretty soon) to figure out which code causes your problems?
You can use thread dumps.
h3. Automatically getting thread dumps
As we learned in the blog "The amazing new heap dump feature in JDK 1.4.2_12", there is a way to trigger thread dumps from the MMC. There is another way if you want to automate this and collect more thread dumps. On Windows you can use the sapntkill command to send the QUIT signal to your jlaunch process.
“sapntkill -QUIT
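To collect a whole series of dumps without sitting at the console, a small script can run the signal command at a fixed interval. The following Python sketch is only an illustration: `trigger_dumps` and its parameters are made-up names, and the command line is passed in by the caller, because the exact arguments depend on your installation (I assume here that sapntkill takes the process ID of jlaunch after -QUIT).

```python
import subprocess
import time
from typing import Callable, List, Sequence


def trigger_dumps(command: Sequence[str], count: int, interval_seconds: float,
                  sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Run `command` `count` times, pausing `interval_seconds` between runs.

    `command` is the full command line that requests one thread dump
    (hypothetical example: ["sapntkill", "-QUIT", "<pid of jlaunch>"]).
    Returns the exit code of every run so that failures stay visible.
    """
    exit_codes = []
    for i in range(count):
        # check=False: a failed run is recorded instead of raising an exception
        result = subprocess.run(list(command), check=False)
        exit_codes.append(result.returncode)
        if i < count - 1:
            sleep(interval_seconds)  # no pause needed after the last dump
    return exit_codes
```

For example, `trigger_dumps(["sapntkill", "-QUIT", jlaunch_pid], count=10, interval_seconds=30)` would request ten dumps, 30 seconds apart, where `jlaunch_pid` is a placeholder for the process ID (as a string) of your server process. The more dumps you collect, the more meaningful the counts in the analysis become.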
h3. Analyzing thread dumps
Essentially, thread dumps let you build a simple, not very accurate, but still surprisingly helpful profile of your application.
A simple way to analyze all the data you got from the thread dumps is to use standard Unix tools like "grep" and "sort" to get a condensed overview of what is happening on your application server.
If you say "but I'm on Windows": OK, that's your fault ;-). No, just go to http://www.cygwin.com/ and install the Cygwin tools. For the following examples you only need "grep", "uniq", and "sort".
Try the following command:
grep "^[[:space:]]*at " std_server0.out | sort -k2 -r | uniq -c | sort -k1 -n -r | more
This will output something like this:
3048 at java.lang.Object.wait(Native Method)
2526 at java.lang.Object.wait(Object.java:429)
1255 at java.lang.Thread.run(Thread.java:534)
975 at com.sapportals.wcm.util.events.EventSenderThread.run(EventSenderThread.java:75)
975 at com.sapportals.wcm.util.events.EventQueue.dequeue(EventQueue.java:68)
533 at com.sap.engine.lib.util.WaitQueue.dequeue(WaitQueue.java:238)
530 at java.security.AccessController.doPrivileged(Native Method)
516 at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
493 at EDU.oswego.cs.dl.util.concurrent.SynchronousChannel.take(SynchronousChannel.java:209)
493 at EDU.oswego.cs.dl.util.concurrent.PooledExecutor.getTask(PooledExecutor.java:707)
493 at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:727)
485 at com.sap.engine.core.thread.impl5.SingleThread.run(SingleThread.java:127)
410 at java.lang.Thread.sleep(Native Method)
405 at com.sapportals.portal.pcd.gl.PcdProxyContext.basicContextLookup(PcdProxyContext.java:1101)
What you see are the source code lines that appear most often in your thread dumps. You can interpret the number in the first column as an indicator of the elapsed time spent at the source code line shown in the second column.
Of course, some of these entries are not interesting because the code is just waiting for something. But with some experience you can quickly figure out where the problem is. In this case, the last line indicates that a lot of time is spent in the PCD. You can then go back to your std_server file and check from where this code was called.
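Checking the callers by hand gets tedious when the file contains hundreds of dumps. Here is a rough Python sketch of how the lookup could be automated. It assumes the usual stack trace layout, where the caller is the "at" line directly below the callee and "- locked" or "- waiting on" annotations may sit in between. The name `count_callers` and its parameters are my own invention for this example.

```python
import re
from collections import Counter
from typing import Iterable

# A stack frame line: optional indentation, "at", then the frame itself.
FRAME_LINE = re.compile(r"^\s*at\s+(\S.*?)\s*$")


def count_callers(lines: Iterable[str], target: str, depth: int = 1) -> Counter:
    """Count the chains of up to `depth` frames directly below each frame containing `target`."""
    callers = Counter()
    stack = []  # frames of the stack trace currently being read, top frame first

    def flush() -> None:
        # Called at the end of each stack trace: record who called the target.
        for index, frame in enumerate(stack):
            if target in frame:
                chain = tuple(stack[index + 1:index + 1 + depth])
                if chain:
                    callers[chain] += 1
        stack.clear()

    for line in lines:
        match = FRAME_LINE.match(line)
        if match:
            stack.append(match.group(1))
        elif line.strip().startswith("- "):
            continue  # lock annotation inside a stack trace, not a frame
        else:
            flush()  # thread header or blank line ends the current trace
    flush()
    return callers
```

With the output above you could call `count_callers(open("std_server0.out"), "PcdProxyContext.basicContextLookup", depth=3)` and print `most_common(10)` of the result to see which call paths lead into the PCD most often.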
Instead of using Unix tools you can also write more sophisticated scripts in Perl (if you want no one else to be able to read them 😉 ), in Python or Ruby, or in any other programming language.
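As a starting point for such a script, here is a minimal Python sketch that produces roughly the same ranking as the grep/sort/uniq pipeline. The function names `count_frames` and `top_frames` are hypothetical, and I assume that every stack frame line starts with optional whitespace followed by "at", as in the output shown earlier.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A stack frame line: optional indentation, "at", then the frame itself.
FRAME_LINE = re.compile(r"^\s*at\s+(\S.*?)\s*$")


def count_frames(lines: Iterable[str]) -> Counter:
    """Count how often each stack frame appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = FRAME_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_frames(path: str, limit: int = 15) -> List[Tuple[str, int]]:
    """Return the `limit` most frequent frames found in the file at `path`."""
    # errors="replace": server logs may contain bytes that are not valid UTF-8
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_frames(handle).most_common(limit)
```

Calling `top_frames("std_server0.out")` returns (frame, count) pairs sorted by count. From there it is a small step to filter out the uninteresting waiting frames or to group the counts by package.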
By the way, another nice tool to visualize thread dumps is Samurai.
In the next blog I will show you what other types of performance problems can be analyzed by using thread dumps.
Happy thread dumping,
Markus
Hi Markus,
Thanks for the feedback 🙂
You need to look for std_server0.out (or dev_server0) in C:\usr\sap\J2E\JC00\work (I am guessing from your path above).
Regards,
Markus