Reducing the memory consumption of your Java application (Part I)
One important factor in making your application more scalable is minimizing its memory consumption. From what I see during my daily work, one of the most heavily used classes is java.lang.String. Looking at how memory is distributed in a typical scenario on the SAP Web AS, I usually see around 30-40% spent on Strings alone. Although the String class looks simple on the surface, its implementation is rather complex, which makes it easy to make mistakes.
Let's have a look at the JDK 1.4.2 source for String:
public final class String { // 8 bytes overhead for the object
    private char value[]; // 16 bytes minimum + 4 for the reference to the array
    private int offset;   // 4 bytes
    private int count;    // 4 bytes
    private int hash = 0; // 4 bytes
    // ...
}
I added the overhead information for the 32-bit Intel Sun JDK. As you can see, an object typically has 8 bytes of overhead and an int costs 4 bytes. A char[] has an overhead of 4 bytes for storing its size plus 8 bytes like every object, and needs to be aligned to an 8-byte boundary. Therefore it needs at least 16 bytes.
This means an empty String needs 40 bytes: 24 for the String object itself (8 + 4 + 3*4) plus 16 for the char[]. So a String object is not that lightweight.
The char[] might be shared between two or more instances of String. This reduces the overhead for a String that shares an existing char[] to 24 bytes. The char[] is only shared when certain functions are called, which can lead to surprising results. Consider the following (simplified, but real-world) code fragment:
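The arithmetic behind the 40 bytes can be written down as a small sketch. The class name and constants below are hypothetical; the numbers are simply the figures quoted in the text for a 32-bit Sun VM, not values measured from a running JVM.

```java
// Rough size model for a JDK 1.4.2 String on a 32-bit Sun VM.
// All constants are assumptions taken from the figures in the text.
class StringFootprint {
    static final int OBJECT_HEADER = 8; // overhead of every object
    static final int REFERENCE = 4;     // reference to the char[]
    static final int INT_FIELD = 4;     // offset, count and hash
    static final int ARRAY_LENGTH = 4;  // length slot of an array
    static final int ALIGNMENT = 8;     // sizes are rounded up to 8 bytes

    // Rounds a size up to the next multiple of ALIGNMENT.
    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Size of the String object itself, without its char[].
    static int stringObjectBytes() {
        return align(OBJECT_HEADER + REFERENCE + 3 * INT_FIELD);
    }

    // Size of a char[] with the given number of slots (2 bytes per char).
    static int charArrayBytes(int slots) {
        return align(OBJECT_HEADER + ARRAY_LENGTH + 2 * slots);
    }

    public static void main(String[] args) {
        System.out.println(stringObjectBytes());                     // 24
        System.out.println(stringObjectBytes() + charArrayBytes(0)); // 40
    }
}
```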
StringBuffer buffer = new StringBuffer(256);
buffer.append("bla");
buffer.append("bla");
String result = buffer.toString();
return result;
Pretty basic (and useless ;)) stuff, you may think. Using a StringBuffer (or StringBuilder on JDK 1.5) is the recommended way to concatenate Strings. + is evil, because it creates lots of temporary objects.
But the surprise comes when you count how much memory the returned String will take. The answer is:
24+2*256+12=548 Bytes
Since the original buffer is not used anymore, this is a huge waste of memory. If you look at the JDK, you will find that buffer.toString() does not create a new char[], but very efficiently reuses the char[] from the StringBuffer.
Unfortunately the StringBuffer was much too big for the resulting String.
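To see how much the buffer size matters, the formula above can be turned into a small helper. This is only a sketch: the class and method names are made up, and like the formula in the text it ignores alignment padding.

```java
// Bytes kept alive by a JDK 1.4.2 String that holds on to a char[] with the
// given number of slots, following the formula in the text:
// 24 (String object) + 2 per char slot + 12 (array header).
class BufferWaste {
    static int retainedBytes(int slots) {
        return 24 + 2 * slots + 12;
    }

    public static void main(String[] args) {
        System.out.println(retainedBytes(256)); // 548
        System.out.println(retainedBytes(6));   // 48
    }
}
```

Under these assumptions the six-character result costs 48 bytes with an exactly sized buffer and 548 bytes with the 256-slot buffer.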
So please keep the following rule under your pillow :
Try to size your buffer correctly according to the size of the result
In this case you would just use StringBuffer(6).
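When the parts are not literals, the capacity can be computed from their lengths. The helper below is a hypothetical sketch of that idea, not code from the original application.

```java
// Concatenates two strings using a buffer sized exactly for the result.
class SizedConcat {
    static String concat(String a, String b) {
        // Capacity equals the final length, so no spare slots are allocated.
        StringBuffer buffer = new StringBuffer(a.length() + b.length());
        buffer.append(a);
        buffer.append(b);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("bla", "bla")); // blabla
    }
}
```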
Interestingly enough, the following code would also not waste any memory, although it's slightly less efficient:
StringBuffer buffer = new StringBuffer(256);
buffer.append("bla");
buffer.append("bla");
String result = new String(buffer);
return new String(result);
The key here is that the constructor new String(result) will automatically create a new char[] if the existing one is too large. This way the JDK tries to ensure that Strings don't get bigger all the time.
Regards,
Markus
The same holds for all collection classes: using the constructor that takes an explicit size can be more useful than using the default constructor, because the default costs something in both of the following cases:
1) if the resulting collection is going to be small, unnecessary array space is allocated.
2) if the resulting collection is going to be very big, unnecessary array copy operations will be required.
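As an illustration of the collection case, here is a minimal sketch using ArrayList. The helper name is hypothetical, and the generics syntax assumes JDK 5 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Builds a list whose final size is known up front.
class PresizedList {
    static List<Integer> squares(int n) {
        // The backing array is allocated once with room for n elements,
        // so it is neither oversized nor copied while the list grows.
        List<Integer> result = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(4)); // [0, 1, 4, 9]
    }
}
```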
A few years back there were even some theories floating around about synchronizing on the buffer to minimize the number of lock acquisitions. For example:
synchronized (stringBuffer) {
    stringBuffer.append("bla");
    stringBuffer.append("bla");
}
But consciously using this in real life might happen only during job interviews :)-
I agree that many developers don't minimize the use of Strings. IMHO this is a big mistake.
Strings in Java are much more complicated and more bulky than a char[] in a language like C.
As a matter of fact if 40% of your memory is occupied by Strings, it's pretty clear that you should start our use of Strings, because that is likely to get the biggest benefit.
If you would like to have more information, feel free to drop me an e-mail.
Regards,
Markus
I meant
"it's pretty clear that you should start to take a look at your use of Strings"
A real caveat, though, lies in the automatic sizing of a buffer or builder: the capacity roughly doubles each time it runs out of space, so you can end up with an object nearly twice the size needed (in JDK 6). Explicitly calling ensureCapacity avoids this.
I cannot really believe that your JDK really reuses the char[] of the buffer when creating the String. Since a String is required to be immutable, it has to make a copy unless the buffer implements copy-on-write to ensure that. If this was the implementation in JDK 5, I wonder why they dropped it.
Yes, you are correct. The behaviour of several String functions changed in JDK 5. But those who are working on NW04(s) are still on JDK 1.4.2. That's why I said "JDK 1.4.2" at the beginning.
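A minimal sketch of the ensureCapacity approach might look like this. The helper name is hypothetical; StringBuilder.ensureCapacity and capacity are standard JDK 5+ methods, and the exact capacity you get without reserving depends on the JDK version.

```java
// Appends `count` copies of `part`, reserving the full length up front so the
// builder does not have to grow (and over-allocate) along the way.
class CapacityDemo {
    static StringBuilder repeat(String part, int count) {
        StringBuilder sb = new StringBuilder();
        sb.ensureCapacity(part.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(part);
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder reserved = repeat("bla", 100);
        StringBuilder grown = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            grown.append("bla");
        }
        // Compare how much capacity each builder ended up with for 300 chars.
        System.out.println(reserved.capacity());
        System.out.println(grown.capacity());
    }
}
```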
I agree that sizing is also important because of the automatic doubling (works the same way in all JDK versions), which also generates temporary objects.
And Yes the StringBuffer shares the char[]. That is the reason that constructing Strings with StringBuffer is so efficient. StringBuffer.toString() does *not* have to copy the string.
StringBuffer has a flag that indicates whether the char[] is shared. As soon as you call something that changes the StringBuffer the char[] will be copied.
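The idea can be sketched with a tiny hypothetical class. This is not the JDK's StringBuffer source, just an illustration of a shared flag with copy-on-write, with simplified growth rules.

```java
// Hypothetical illustration of the "shared" flag; not the JDK implementation.
class SharingBuffer {
    private char[] value;
    private int count;
    private boolean shared; // true once the array has been handed out

    SharingBuffer(int capacity) {
        value = new char[capacity];
    }

    // Appends text, copying the array first if it is shared or too small.
    SharingBuffer append(String text) {
        int needed = count + text.length();
        if (shared || needed > value.length) {
            char[] copy = new char[Math.max(needed, value.length)];
            System.arraycopy(value, 0, copy, 0, count);
            value = copy;
            shared = false;
        }
        text.getChars(0, text.length(), value, count);
        count = needed;
        return this;
    }

    // Hands out the internal array without copying and marks it as shared.
    char[] share() {
        shared = true;
        return value;
    }

    int length() {
        return count;
    }
}
```

Handing out the array is cheap, and the copy is only paid for if the buffer is modified afterwards, so the array that was handed out never changes.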
Regards,
Markus
And what about
StringBuffer buffer = new StringBuffer(256);
buffer.append("bla");
buffer.append("bla");
return buffer.toString();
That's what I usually do.
The overhead of an object is as I said 8 bytes on a 32 bit SUN VM.
Regarding your code example. Yes you will end up (on JDK 1.4) with a small String referencing the large (256 entries) char[].
Regards,
Markus