A few days ago David Griffiths posted a blog in which he expressed his feeling that we copied “his FindRoots Dominator Tree approach” into the SAP Memory Analyzer. That’s not the case. It took us a while to identify dominators as helpful graph-theoretical approach and we learned that the hard way by ourselves. And if I may say so, we are a bit proud of this idea, as well as he most likely is. ðŸ˜‰

Dominators have great properties and are for me a very big step forward in static memory analysis. But please allow me to give you some background on how we found out abut them.

We had put much thought in how to efficiently parse huge heap dumps with tens or hundreds of millions of objects, but we realized that we need more helpful numbers than the plain size of the objects. Strings e.g. are always 24 bytes large (in the HotSpot JVM for a 32 bit box), but you don’t know how much memory they really “need”, because the real data is stored in the char[] that’s referenced by the String object. So we looked into an algorithm which can give us that information. First we computed that size with GC simulations. For that we cut off all references to the object and then simulated a GC and checked what’s no longer reachable. But computing for all objects this GC or retained size has a time complexity of O(mn) with m being the number of object references (which will be traversed in the GC simulation) and n being the number of objects (for which you want to compute the retained size). So we looked out for any graph-theoretical approach and stumbled over the dominators. An object x dominates another object y if all paths from the roots R to y run through x. If you read this carefully you will see that this is exactly what we were looking for. The retained size for an object y is the sum of the plain object sizes of the dominated objects X. Now we looked for an algorithm. Dominators are used in compilers for code flow analysis and there the number of vertices and edges is pretty small compared to heap dumps, especially ultra-large ones. Early algorithms like that by Purdom and Moore offered a time complexity of O(mn). More sophisticated algorithms run with O(m log n). Cooper, Harvey and Kennedy reported in their paper “A Simple, Fast Dominance Algorithm” that their algorithm with a time complexity still of O(mn) is faster in practice because a smaller constant algorithmic factor gets more important than the time complexity alone for small flow-graphs. Georgiadis, Werneck, Tarjan, Triantafyllis and August have made a very good comparison in their paper “Finding Dominators in Practice”. However, for heap dumps with millions or billions of edges (compared to maybe thousands in flow-graphs) we needed a time complexity close to O(m). We found this in the paper “A Fast Algorithm for Finding Dominators in a Flowgraph” by Lengauer and Tarjan. Their algorithm has a time complexity of O(ma(m, n)) where a(m, n) is a functional inverse of the Ackermann’s function. Now we competed internally a bit and tried to implement that algorithm so that it runs as fast as possible and uses as less memory as possible.

That was the brief history on why we went into the direction of dominators. It was because we wanted to replace the GC simulation by something much faster to get to the retained size for single objects.

Fortunately, the dominators have not just made the retained size computation faster, but instant. Once the dominators are computed during initial parsing of the heap dump we know for each and every object how big in terms of memory it really is. This instant retained size per single object was what the dominators gave us first.

It was only then that we started to realize how powerful dominators are. Based on the graph theory behind them we started to visualize the dominators in a tree and were happy to see that the tree shows the biggest distinct objects. This was the second great advantage we took out of the dominators and the feature David Griffiths referred to in his blog. However, we haven’t copied this idea – we had it ourselves.

We continued to analyze heap dumps and have seen quite some of them where the dominator tree wasn’t helping by itself. But at this time we have already understood that aggregation patterns may be uncovered by grouping. So we started to introduce grouping features into the dominator tree. We grouped by classes to spot the big chunks of memory and this helped again. Often many small objects of the same class form a big chunk of memory – the small objects by themselves could not be detected but the group of them could. We continued and saw that class loaders are another great chance for grouping. That was the third advantage we took out of the dominators.

Aggregation pattern were key and we often wanted to find out for a single object or a set of objects who keeps them alive. Following the inbound references doesn’t help, but following the dominators does. We introduced an exclude filter to filter out uninteresting information, e.g. if you check the dominator of a char[] or all char[]s you will often find java.lang.String but this information is of no help to the user of the tool. If the user has specified the exclude pattern java.* we go up the dominators until we reach a class from a package not matching the exclude pattern. Thereby we ignore also collections which are often seen or whatever other data structures until we reach a package of importance. That was the fourth feature based on dominators.

Dominators helped us to solve the problem of finding the retained size for a single object, but for sets of objects we still had to run a GC simulation. In fact, this is what we still offer as precise retained size. But we found out that we can use the dominators even for object sets to compute a lower bound for the retained size for a set of objects absurdly fast – the bigger a heap dump the better. Instead of simulating a GC with O(m) where m is the number of object references we run with O(n’) with n’ is the number of objects in the set. This allows even for small object sets to compute what we call minimum retained size. The algorithm does no longer depend on the heap dump size, but only on the number of objects in the set – this opens up whole new possibilities as now retained size became almost as cheap as the plain object size, but the retained size is so much more helpful. That was the fifth feature we implemented based on the dominators.

The subject of static memory analysis (or post-mortem memory analysis) is a very exciting one. We are constantly trying to better understand memory aggregation patterns and how to uncover them. Algorithmic challenges as the one for the “real” size of objects make our lives, the lives of computer science people so interesting and creative.

Having said that and with all due respect to David Griffiths, good ideas can jump out of the heads of many people. The SAP Memory Analyzer is a true development by its team which has not copied from any other tool. Please go and try it out, it offers a performance and feature richness which will hopefully help to move the whole topic of memory analysis one step forward.

David GriffithsWhen I saw that dominator tree concept I thought it sounded interesting in the OOM context for obvious reasons (ie you’re only interested in what is keeping things alive, ie what is the dominator). When I tried it out in FindRoots it suddenly collapsed the huge complex trees and gave a much clearer view.

You’ve obviously taken the idea of dominators a lot further than me anyway, I must download your tool when I have time and have a play with it.

BTW it’s an interesting field. The thing that gave me the most trouble was calculating the so-called transitive closure. I think I have a solution I’m happy enough with now in terms of speed and space, but it’s really hard to do it efficiently. Interesting to see that you have side-stepped the problem via the dominator approach. My FindRoots work started life as an attempt to find the “roots” from a graph of just the Java heap, ie without the knowledge of the actual GC roots. So the idea was to find the objects with the biggest reachability and treat them as the roots. Obviously you can’t use the dominator approach because you have a chicken and egg situation of not knowing where to start!

Vedran LerencPost authorHowever, Niklaus Wirth once said, “Software gets faster slower than hardware gets faster”. I liked that.

About the 2 + 2 = 5. Well, the only thing we say with this feature is that 2 + 2 >= 4. If you get this lower bound a thousand times faster it might be more helpful than a precise answer. That may not be accurate, but pragmatic. And you don’t care much if your lower bound already rings the alarm bell. But yes, if the lower bound is under your radar you miss it, but in reality we haven’t had this problem and you may chose the retained size you like (precise or just the lower bound). Would be interesting to do some more research and learn why the lower bound matches the precise retained size so well.

In general I believe we should try to better understand the problem domain. What I would really like to see is run a heap dump through some kind of correlation engine. I wonder if this could turn out good findings – it could better explain how data is hold.

Anyway, thanks for your reply.