Additional Blogs by SAP
cancel
Showing results for 
Search instead for 
Did you mean: 
rudolf_meier
Explorer
0 Kudos

This is a new Weblog, so let me start with a few words of introduction on myself and how this log came about.

I joined the SAP performance group, part of Performance, Data Management & Scalability, about a year ago. To begin, I received an exciting (and sometimes frightening) initiation on performance issues, both for the ABAP and JAVA worlds. In particular, I got introduced to benchmarking situations, where typical multi-user applications are squeezed up to the very limit of the test servers. The principal behavior of service systems under load is supposed to be well understood and mathematically described by Markovian queueing theory. Nevertheless, some curves and figures for benchmarks with JAVA applications looked peculiar enough to take a closer look. In a little study we tried to shed light on details of the behavior. Here I want to share some of the findings on effects introduced by JAVA garbage collection. 

 

Expectation

 

Before we dive into JAVA details, let’s have a quick look at application behavior expected from queuing theory. Here and in the following, I will measure system performance by the average interaction response time, which is exactly what an end user would care about.

The plot shows relative response time over the system load. The solid line is the theory prediction for a so-called M/M/4 queue (a typical queuing theory acronym: it indicates that 4 service centers work on satisfying requests where both the request inter-arrival-times as well as the request service times follow an exponential Markovian or memory-less distribution. At http://www2.uwindsor.ca/~hlynka/queue.html, you’ll find a good collection of queueing theory resources on the web).

 

 

>We can immediately compare this prediction to the ABAP world. The symbols represent measurements for the FI benchmark (http://www.sap.com/benchmark😞 we see good agreement with theory. So far so good. But now let us turn to an example for a JAVA application.

>JAVA situation

>I used an example application with specific properties. I will give details on the relevant parameters (raw service time, user think time, garbage collection duration and frequency) later on. For now let’s just say that in this example the parameters are set to clearly show the behavior which popped up in the benchmark runs mentioned above.

>Already at moderate system load, the average response time is much higher than expected. In particular, we see a factor of two increase at about 55% system load. Compared to the 10% expected from theory, this cries out for an explanation. Now we have arrived at the starting point for our study: We will see that the effect is predominantly produced by JAVA garbage collection.

>

>Garbage collection (GC)

>Memory management in JAVA relies on garbage collection http://www-106.ibm.com/developerworks/java/library/j-jtp10283/ Simplified, a garbage collector run stops all application threads on the same virtual machine (VM) to remove discarded objects from heap memory.

>(To keep the discussion clear, we assume use of only one VM in the system and a simple ‘stop the world’ garbage collector. For the example measurements, we used a SUN VM http://java.sun.com/docs/hotspot with default garbage collection, so we had a generational garbage collector with minor and full GC runs. The overall heap size was 1 GB, at 57% system load we spent 2% of the run time in full garbage collection, and 4% in minor GC. The average full GC duration was 5.4 seconds.)

>It is obvious that the thread stop by a GC run has an impact on both the system utilization and the application response times. To say it for our example: during GC, only one out of the 4 processors of the test machine was used, and all active requests had to wait. Let’s see how big this stop time effect is for the response time. (Information on the impact of GC stop times on system utilization – and much more - is available here: http://java.sun.com/docs/hotspot/gc1.4.2 )

>GC Stop times

>The impact of the stop times can be calculated analytically, if we assume that single service requests are evenly distributed over time.We need only few parameters for application and system: the raw request service time, the user think time, the GC duration, and the GC frequency. I’ll leave out the calculation details and just plot the result for our example (raw request service time of 0.25 seconds and average user think time of 9.75 seconds).

>

>Indeed we see a response time increase due to the stop times. But it is rather weak and far from explaining the increase in the JAVA example.

>We have to look for more. And, actually, there is more to GC than just the stop times. For example, there is a dynamic effect which I’ll call ‘request bunching’.

>GC induced request bunching

>For the calculation of the stop time effect, I used the innocent assumption that ‘single service requests are evenly distributed over time’. One could reformulate this into ‘at any time, we find the same number of requests ready to be serviced’. Sounds boring enough, but can be drastically changed by GC, as illustrated in the following sketch. We start with a constant number of pending requests per time, corresponding to a certain moderate system load. A GC run comes along. During GC, all requests wait, and new requests are simply added to the waiting list. So, when GC is finished, we end up with a much larger number of pending request – which corresponds to a much higher system load.

>Finally, this request bump is worked down. How we get back to ‘normal’? Now, that depends on application details. It is even possible to never get back but end up in eternal load oscillations – nice topic for future logs.

>

>In addition to the pure GC stop time we now have a high load phase and a recovery period which both affect response times. To get numbers, I have modeled the JAVA example in a Monte Carlo simulation. The result is shown on the last plot. The simulation result (corrected by a measured service time increase) is in good agreement with the data. So indeed the garbage collector seems to be responsible for this behavior of our example application!

>What’s next?

>We have seen that JAVA garbage collection can have significant impact on response times. We think we understand that this is caused by the combined effect of stop times and request bunching. But so far that is just an interesting observation. It is time to look at consequences.

- style='font:7.0pt "Times New Roman"'>          >What does this mean for Java development?

- style='font:7.0pt "Times New Roman"'>          >What about optimizing the virtual machine, heap size, garbage collection?

- style='font:7.0pt "Times New Roman"'>          >Nice example. But what is the effect for my application? Is this relevant at all?

- style='font:7.0pt "Times New Roman"'>          >What about other serializations/contentions?

>With your help, I’ll address these questions in the future. Certainly this is not the first time anyone stumbled over these issues, so I’m looking forward to learning about your experience!

>Also, there will be more on the analytic calculations, the Monte Carlo simulation, and other topics which got short-changed. And I promise to keep it briefer next time!

10 Comments