Finding Memory Leaks with SAP Memory Analyzer
There is a common understanding that a single snapshot of the Java heap is not enough for finding a memory leak. The usual approach is to search for a monotonic increase in the number of objects of some class, either by online profiling/monitoring or by comparing a series of snapshots taken over time. However, such live monitoring is not always possible, and it is especially difficult to perform on production systems because of the performance cost of running a profiler, and because some leaks show themselves only rarely, when certain conditions appear.
In this blog I will try to give some guidelines on how to find unwanted memory accumulation without having to sleep beside the servers in the office – with the help of the recently released SAP Memory Analyzer tool, and a couple of tricks.
First, make sure that you will get sufficient data for the troubleshooting even if the problem occurs while you are not on the system. For this purpose, configure the JVM to produce a heap dump when an OutOfMemoryError occurs (see the description here).
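For a HotSpot-based JVM (the SAP JVM supports these options as well), this configuration could look as follows; the dump directory and jar name are placeholders for illustration:

```shell
# Write a heap dump automatically when an OutOfMemoryError is thrown.
# -XX:HeapDumpPath is optional; without it the dump lands in the
# working directory of the process.
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/heapdumps \
     -jar myapp.jar
```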
The second step of the preparation is to make the memory leak more visible and easily detectable. To achieve this, use the following trick: configure the maximum size of the Java heap to be much higher (say, twice as high) than the heap used when the application is running correctly (e.g. set it to twice as much as what is usually left after a full GC). Even if you don't know how much memory the application really needs, increasing the heap is not a bad idea (it may turn out that there is no leak and simply more heap is required). I don't want to go into a discussion of whether running Java applications with very big heaps is a good approach in general – simply use the tip for the duration of the troubleshooting.
What do you gain by this change? If the VM throws an OutOfMemoryError with this configuration, it will produce a heap dump in which the objects related to the leak take up about half of the total heap size, i.e. it should be relatively easy to detect the leak later.
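As a purely hypothetical example of the sizing trick: if the application normally stabilizes around 1 GB after a full GC, the troubleshooting configuration could look like this:

```shell
# Hypothetical numbers: the application normally holds ~1 GB after a
# full GC, so allow twice that much. A leak can then grow until it
# fills roughly half of the dump, which makes it stand out clearly.
java -Xmx2g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -jar myapp.jar
```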
Analysis – Case 1
Now imagine that after these configuration changes are activated, you come to the office in the morning and find that the error has reoccurred and there is a nice big heap dump in the file system. What's next? Well, believe it or not, what follows is the easier part.
First, open the heap dump with the SAP Memory Analyzer tool. You may have to wait a bit for the initial parsing if the heap dump is very big, but subsequent reopening will be instant (see some performance metrics here).
Then let's find out who has eaten up the memory. Go to the Dominator tree view.
There you will find the object graph transformed into a special kind of tree – one showing the dependencies between objects, not simply the references between them. I won't go into details about the theory behind this tree, but I'll simply list some of its key properties:
– At the top of this tree (i.e. what you see immediately after opening it) you will find the biggest objects in the heap
– All descendants of an object in the dominator tree are retained by it (meaning that they will be garbage collected if the object is garbage collected). The biggest objects are the ones which retain the most heap
In most cases where there is a leak, you will notice it immediately by looking at the size of the biggest object. To get closer to the real accumulation point, expand the tree under the biggest object until you see a significant drop between the retained sizes of the parent and its children (usually this will be some kind of collection or an array). Well, it's that easy. You found it! If you are interested, you can also analyze the content by further exploring the dominator tree.
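To make the pattern concrete, here is a minimal, hypothetical Java sketch of the kind of leak that typically shows up this way: a static map that is populated on every request and never cleaned up, so its backing structure ends up as the accumulation point dominating everything added to it. The class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of a leak that appears as one big object in the
// dominator tree: the static map retains every session ever created.
public class SessionRegistry {
    private static final Map<String, byte[]> SESSIONS = new HashMap<>();

    // Called on every request; entries are added but never removed, so
    // the HashMap (more precisely, its entry array) becomes the
    // accumulation point where the retained size drops sharply.
    public static void register(String sessionId) {
        SESSIONS.put(sessionId, new byte[1024]); // per-session state
    }

    public static int size() {
        return SESSIONS.size();
    }
}
```

In a heap dump of such an application, the `HashMap` would sit near the top of the dominator tree, and expanding it shows the drop in retained size between the map and its many small entries described above.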
The next thing to do is to see the real reference chain from the GC roots. Simply call “Paths from the GC Roots” from the context menu on the accumulation point object.
In the “Paths from the GC Roots” view one can see the references with the names of the fields.
Analysis – Case 2
It would be nice if every problem could be found so easily. Sometimes, however, the first look at the dominator tree is not enough. But one more click should make the second look sufficient. One click, but where? On the “Group by class” button in the toolbar. Here is some explanation. Previously we configured a heap big enough for the leak to grow, and we have a dominator tree covering the full object graph, which also includes the leak. So why don't we see it? In the example we just looked at, all the small leaking objects were dominated by one single object whose retained size was huge. But sometimes it may happen that the leaking objects themselves are at the top of the dominator tree. Even though they are many in number, each of them is small in size and is therefore not displayed among the biggest objects.
However, if we manage to find the whole group of leaking objects and see their aggregated size, the leak will be as easy to notice as in the previous example. This is exactly what grouping the objects by their class achieves.
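As a hypothetical sketch of this second case: objects that are each referenced from more than one independent registry. Since no single owner dominates such an object, each instance appears near the top of the dominator tree on its own, individually tiny, and only the per-class aggregation makes the leak visible. All names below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "Case 2" shape: every Handler is referenced from two
// independent registries, so neither list nor map dominates it in the
// dominator tree. Each instance shows up at the top level, small by
// itself; "Group by class" reveals their large aggregated size.
public class Handler {
    private static final List<Handler> ACTIVE = new ArrayList<>();
    private static final Map<String, Handler> BY_NAME = new HashMap<>();

    private final byte[] state = new byte[512]; // small per-instance payload

    public Handler(String name) {
        // Registered in two places and never unregistered: the classic
        // "forgotten listener" shape of a leak.
        ACTIVE.add(this);
        BY_NAME.put(name, this);
    }

    public static int activeCount() {
        return ACTIVE.size();
    }
}
```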
So, did you find the memory eater now? I hope the answer is “Yes”. If not, please let me have the heap dump you are looking at and I'll try to extend and complete the description.
Hi,<br/><br/>this guide is exactly what I was searching for.<br/>Unfortunately I can't get a heap dump file.<br/>I have posted a question on that:<br/><Can't find Java Heap Dump hprof file><br/><br/>Do you have an idea?<br/><br/>Regards<br/>Ingo
Thanks for the nice words about the guide.
As for the problem with getting a heap dump – it was resolved (one can look at the forum in case of similar problems).
I would be happy to get some feedback if the explanation in the blog helped you with your real heap dumps.
I'm working on NetWeaver 7.1 CE. I downloaded the plugins for Memory Analyzer and copied them to my plugins folder. I can view the perspective and views for Memory Analyzer. I took a heap dump from the server and tried to open that file in Memory Analyzer, but I'm not able to open it. It says:
" An internal error occurred during: "parsing D:\HPROF\java_pid26424_4.hprof".
What can the issue be in this case? I tried to take a heap dump 2-3 times, but the same thing happened.
If I understand properly, you are trying to add the Memory Analyzer to your existing IDE (be it plain Eclipse or NWDS). Is this so?
Have you used the update site at Eclipse?
http://www.eclipse.org/mat/downloads.php has a description of the installation of both the standalone Memory Analyzer and the plugins.
What can help is to send me the log file ( \workspace\.metadata\.log ). My e-mail is krum dot tsvetkov at sap dot com.
Finally I could resolve the problem. I was copying the hprof file from the remote application server to my local machine in ASCII mode. I did the FTP in binary mode and it worked. Thanks.
One problem which happens quite often and is not necessarily a memory leak is that a certain operation (request) needs too much memory to complete, e.g. during the processing of the request an attempt is made to load a whole DB table into memory, or to load the whole content of a huge file, etc. I hope this explanation can help you to continue your troubleshooting.
Really useful. I have a small question:
How can we conclude that there is a memory leak merely by looking at the size of an object (in the dominator tree)? It might be a live object holding a huge amount of data, and it might release it completely once it dies...
Lovely blog on overcoming the out of memory issue. Once we identify the suspect, how do we proceed? I have identified the suspect in the dump using your technique, but I'm not sure how to go on from here.
I would really appreciate your inputs on this.
Once you identify where the biggest objects are (the suspects), you should usually have a look at the paths from the GC roots to the accumulation point. The path is the chain of references, and (depending on the dump format) you should see all instances and the names of the fields through which they reference each other.
In order to decide whether these paths are expected or not, you do need some specific knowledge about the components.
The other thing is to look at what kind of objects have been accumulated under the suspect. This also sometimes helps to get a better understanding of the root cause.
I know this is not a very precise step-by-step description, but the really detailed analysis differs a bit from case to case. I hope this helps anyway.
I really appreciated your reply on the issue. I have captured the screens as suggested by you; can I send them to you to get your opinion? Thanks for your valuable time.
I work in support, and we use the SAP MAT mainly in cases of suspected memory leaks and OoM on customer systems. Mostly, we generate the leak suspect report and hand the zip file over to development support, because eliminating the root cause requires in-depth knowledge of the source code. For this purpose the tool has been designed really well.
For analyzing an OoM or a suspected memory leak, I feel it would be very useful to have statistics on how long the objects have been in memory at the point in time when the heap dump was created. Memory leak suspects would typically be very old, and OoM suspects would typically be very young. It seems there is no way to implement a timestamp analysis at present.