Analyzing Java Collections Usage with Memory Analyzer
In my previous post, “Memory for nothing”, I described how memory “wasted” on unused collections can be identified using the SAP Memory Analyzer tool. This can often help to reduce the memory footprint quickly and easily. This time I will describe some further features of the tool which can be used to gain deeper insight into the collections used by a component.
Finding the empty collections as described in the previous post gives the answer to one very concrete question – “Do I keep in my objects a lot of collections with size 0?”. Of course, another interesting and more general question is “What are the sizes of the collections which my objects are keeping?”. Well, let’s add some more: “What is the fill ratio of the collections held by my objects?”, or “Are the hash functions of my objects really working fine? Do I get a lot of collisions when I put them into HashMaps?”. These are all valid questions, whose answers could enable developers to make better decisions about how a specific task should be implemented. Unfortunately it is very difficult to answer some of them at design and development time. But these questions can be answered later, at test time or at runtime, and in this blog I will describe how to get the answers based on heap dumps and with the help of the Memory Analyzer tool.
The “Collections” Group in the Query Browser
With the latest release 1.1.1 of the Memory Analyzer (download here) a Query Browser was introduced, where one can select and execute queries that inspect different aspects of the heap usage. One group of queries there – the “Collections” group – is dedicated to the analysis of the standard Java collections and arrays. Let’s see how to get the answers to the questions stated previously with the help of the queries in this group.
I will demonstrate two approaches to perform the analysis:
- Execute a query on all objects of a certain type in the heap dump (e.g. all HashMaps), and then try to find out where the groups in the result belong. Example: group all HashMaps in the heap dump by their size, and then inspect which objects are keeping the HashMaps with size 1. This approach works well if you are interested in the objects of the whole heap.
- The second approach is to first narrow down the group of objects to inspect, and then use the commands from the “Collections” group. Example: find the retained set of all classes matching the pattern “my.own.component.*”. Find the HashMap row in the retained set histogram, and execute a query to show the size distribution. This approach is very helpful if you are trying to stay focused on a specific component within the whole application (or on a specific application within a server).
What are the sizes of the collections, which my objects are keeping?
Let’s start with analyzing the sizes of collections. What is it good for? Well, most of the time collections are used to store data which is produced at runtime. But decisions such as which data structures to use or what initial capacity to specify are taken at development time, when the answers to these questions are often not definite. Looking at the sizes in a heap dump from a realistic scenario (be it a realistic test or a productive system) is a good way to check whether the assumptions made earlier are correct. One may realize, for example, that the initial capacity is too small (so the collections are often resized at runtime) or too big (so too much memory is pre-allocated unnecessarily).
Let’s look at an example. Suppose that I am a developer concerned about the memory footprint of my component, for which I have used the packages com.sap.memory.demo.*. As part of this activity I would like to perform some analysis on the collections used by my code.
The first thing I will do is calculate and display the retained set of all instances of classes in the package com.sap.memory.demo.*. By doing so, I can focus only on the instances relevant for me, and not get distracted by millions of other objects in the heap dump:
Then, I can filter the retained set histogram to “.*HashMap.*” and execute the “Collections Grouped By Size” query from the context menu.
The result is a table, where the first column is the size of the collections, then follow the number of objects (i.e. the number of HashMaps with this size), the shallow size and the approximated retained size of the objects in this group:
Once I see an interesting group (in this example the group with size 0) I can use the “Immediate dominators” feature to figure out who is keeping the objects in this group alive. As I am only working on the retained set of my own component, I will see some classes from this component:
This is how I figured out that, of the 38,000 HashMaps kept alive by my classes, 25,000 are empty. About 18,000 of the zero-sized HashMaps are kept by my InefficientDataStructure class, and the rest by two other classes of mine.
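To make the idea of grouping by size more tangible, here is a minimal sketch of the same kind of grouping done on live objects in plain Java. It is only an illustration: the class `SizeHistogramSketch` and its method `groupBySize` are hypothetical names of mine, and the Memory Analyzer of course works on a heap dump rather than on live maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not Memory Analyzer code.
final class SizeHistogramSketch {

    // Groups the given maps by size() and counts how many maps fall into each group.
    // The TreeMap keeps the groups sorted by size, like the first column of the result table.
    static TreeMap<Integer, Integer> groupBySize(List<? extends Map<?, ?>> maps) {
        TreeMap<Integer, Integer> countsBySize = new TreeMap<>();
        for (Map<?, ?> map : maps) {
            countsBySize.merge(map.size(), 1, Integer::sum);
        }
        return countsBySize;
    }

    public static void main(String[] args) {
        List<Map<String, String>> maps = new ArrayList<>();
        maps.add(new HashMap<String, String>()); // empty map
        maps.add(new HashMap<String, String>()); // another empty map
        Map<String, String> one = new HashMap<>();
        one.put("key", "value");
        maps.add(one);

        for (Map.Entry<Integer, Integer> group : groupBySize(maps).entrySet()) {
            System.out.println("size " + group.getKey() + ": " + group.getValue() + " map(s)");
        }
    }
}
```

A large count in the group with size 0 is exactly the kind of signal I was looking for in the example above.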
What is the fill ratio of the collections held by my objects?
The “Collection Fill Ratio” query is to some extent similar to the query we just looked at. The difference is that it works only on collections which pre-allocate some space for storing their elements (e.g. ArrayList, HashMap, etc.), and it shows how much of the pre-allocated capacity is used. The fill ratio is a number between 0 and 1, calculated by the formula “size / capacity”.
When I execute the “Collection Fill Ratio” on the HashMap instances retained by “com.sap.memory.demo.*” the result is:
Thus I can see that of about 38,000 HashMap instances more than 25,000 are empty, and about 12,000 are less than 20% full.
For a better understanding one can combine both queries. For example, one can first call “Collection Fill Ratio” and then “Collections Grouped By Size” on any of the groups.
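The “size / capacity” formula can be sketched in a few lines of Java. This is an illustration under assumptions: the class and method names are hypothetical, the size and capacity are passed in as plain numbers (the capacity of a HashMap is not available through its public API), and the five ranges of 20% each are my assumption, not a statement about how the tool groups its results.

```java
// Hypothetical illustration only; this is not Memory Analyzer code.
final class FillRatioSketch {

    // Fill ratio as defined in the text: size / capacity, a value between 0 and 1.
    // This sketch assumes 0 <= size <= capacity and capacity > 0.
    static double fillRatio(int size, int capacity) {
        if (capacity <= 0 || size < 0 || size > capacity) {
            throw new IllegalArgumentException("expected capacity > 0 and 0 <= size <= capacity");
        }
        return (double) size / capacity;
    }

    // Assumed grouping: five ranges of 20% each, numbered 0 to 4.
    // A completely full collection (ratio 1.0) is put into the last range.
    static int rangeIndex(double ratio) {
        return Math.min((int) (ratio * 5), 4);
    }

    public static void main(String[] args) {
        // Example pairs of {size, capacity}; 16 is the default capacity of a HashMap.
        int[][] sizeAndCapacity = { {0, 16}, {3, 16}, {12, 16} };
        for (int[] pair : sizeAndCapacity) {
            double ratio = fillRatio(pair[0], pair[1]);
            System.out.println("size=" + pair[0] + " capacity=" + pair[1]
                    + " ratio=" + ratio + " range=" + rangeIndex(ratio));
        }
    }
}
```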
Are the hash functions of my objects really working fine?
Let’s move on to the next topic – collisions in HashMaps and Hashtables. A badly implemented hash function can significantly degrade the performance of lookups in hash tables. In the worst case, where all elements return the same hash code, a lookup is comparable to traversing a LinkedList.
Whether the hash codes produced by some objects lead to a lot of collisions is another question which is often difficult to verify at development time, but it is also another example where the answer can be found very easily by analyzing heap dumps. Therefore, despite the fact that this problem is more related to performance than to memory, we have provided a query in the Memory Analyzer which answers it – the “Map Collision Ratio”. The query inspects the individual instances of HashMap (or Hashtable) and groups them by the collision ratio of each instance. The collision ratio is the fraction of colliding entries among all entries inserted in the hash table.
Let’s again have a look at an example. This time I’ll have a different role – a performance expert, analyzing code written by other colleagues. As such I will not focus on a specific component, but take all Hashtable instances in the heap dump and look at them:
When I execute the query I get a table where the first column shows me the collision ratio of the group, the second is the number of objects in this group, etc.
Here I can immediately see that there is 1 instance where the collision ratio is between 80 and 100%! Then I can explore this single instance, and see that all keys in it are colliding.
I can also see the concrete key objects. This information is enough for me to take the problem to the colleagues who developed the InefficientDataStructure.Key class.
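For readers who want to see what such a problematic key can look like, here is a minimal sketch. Everything in it is an assumption made for illustration: `BadKey` is a hypothetical class (not the InefficientDataStructure.Key from my example), the bucket is chosen with a simple modulo rather than the real HashMap or Hashtable internals, and an entry is counted as colliding when it lands in a bucket that is already occupied. The tool may compute its ratio differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Memory Analyzer code.
final class CollisionRatioSketch {

    // A key with a deliberately bad hash function: every instance returns the same hash code.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // legal, but all keys end up in the same bucket
        }
    }

    // Assumed definition: (number of keys - number of used buckets) / number of keys.
    // Simplification: bucket = hashCode modulo capacity; keys are assumed to be distinct.
    static double collisionRatio(List<?> keys, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        if (keys.isEmpty()) {
            return 0.0;
        }
        Set<Integer> usedBuckets = new HashSet<>();
        for (Object key : keys) {
            usedBuckets.add(Math.floorMod(key.hashCode(), capacity));
        }
        return (double) (keys.size() - usedBuckets.size()) / keys.size();
    }

    public static void main(String[] args) {
        List<Object> badKeys = new ArrayList<>();
        List<Object> goodKeys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            badKeys.add(new BadKey("key" + i));
            goodKeys.add(Integer.valueOf(i)); // Integer.hashCode() is the int value itself
        }
        System.out.println("bad keys:  " + collisionRatio(badKeys, 16));
        System.out.println("good keys: " + collisionRatio(goodKeys, 16));
    }
}
```

With a constant hash code every key after the first one collides, which is the worst case described above.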
An easy way to look at the content of HashMaps
Last, but not least – the “Hash Entries” query provides a very convenient way to explore the content of Map structures. This is an action I often had to perform while doing analysis.
Looking at the map entries is not always straightforward if you do it by following the references. First, keys and values are listed one below the other, and second, there may be colliding entries in between, which makes reading even harder. The image below shows such an attempt to examine a java.util.Properties object:
It isn’t very clear, is it?
By executing the “Hash Entries” query on one or several objects you can get the content extracted as a table.
This looks much better to me. Of course, it is possible to continue the analysis for any of the displayed keys or values by using the context menus.
There are two more queries in the “Collections” group which provide functionality similar to the ones above (and which I therefore won’t describe in detail), but which work on arrays:
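As a rough analogy in plain Java, flattening a java.util.Properties object into key/value rows could look like the sketch below. The class `HashEntriesSketch`, the method `toRows` and the row layout are hypothetical choices of mine; the query itself reads the entries from the heap dump, not from a live object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Memory Analyzer code.
final class HashEntriesSketch {

    // Returns one "key | value" row per entry, sorted by key for easier reading.
    static List<String> toRows(Properties properties) {
        List<String> rows = new ArrayList<>();
        for (String key : new TreeSet<>(properties.stringPropertyNames())) {
            rows.add(key + " | " + properties.getProperty(key));
        }
        return rows;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("user.language", "en");
        properties.setProperty("file.encoding", "UTF-8");
        for (String row : toRows(properties)) {
            System.out.println(row);
        }
    }
}
```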
- “Arrays Grouped By Size” – works on both primitive and Object arrays
- “Array Fill Ratio” – does not work on primitive arrays. Provides the ratio between non-null elements in the arrays and the length of the arrays
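The ratio used by “Array Fill Ratio” is simple enough to sketch in Java. The names below are hypothetical, and treating a zero-length array as having a ratio of 0 is my assumption, made only to avoid a division by zero.

```java
// Hypothetical illustration only; this is not Memory Analyzer code.
final class ArrayFillRatioSketch {

    // Ratio between the non-null elements of an object array and its length.
    // Assumption of this sketch: a zero-length array has a fill ratio of 0.
    static double fillRatio(Object[] array) {
        if (array.length == 0) {
            return 0.0;
        }
        int nonNull = 0;
        for (Object element : array) {
            if (element != null) {
                nonNull++;
            }
        }
        return (double) nonNull / array.length;
    }

    public static void main(String[] args) {
        Object[] halfFull = { "a", null, "b", null };
        System.out.println("fill ratio: " + fillRatio(halfFull));
    }
}
```

Primitive arrays have no null elements, which fits with the query not being offered for them.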
In this blog I’ve presented the features built into Memory Analyzer which facilitate the analysis of different aspects of the collections and arrays found in a heap dump. Some of these aspects are more relevant to the memory footprint, and some more to performance. They have proved helpful in my daily work, and I have already received very good feedback from some of my colleagues who used the features.
But I am sure that out there in the community there are many more good ideas about such little-helper-queries which give simple answers to what are sometimes rather complicated questions. Therefore, I would be very happy to get some comments and suggestions from you.