One important factor in making your application more scalable is reducing its memory consumption. From what I see in my daily work, one of the most heavily used classes is java.lang.String. Looking at how memory is distributed in a typical scenario on the SAP Web AS, I usually see around 30-40% spent on Strings alone. Although the String class looks simple on the surface, its implementation is rather complex, which makes it easy to make mistakes.

Let's have a look at the JDK 1.4.2 source for String:

public final class String {      // 8 bytes overhead for the object
    private char value[];        // 16 bytes minimum + 4 for the reference to the array
    private int offset;          // 4 bytes
    private int count;           // 4 bytes
    private int hash = 0;        // 4 bytes
    // ... remaining members omitted
}

I added the overhead information for the 32-bit Intel Sun JDK. As you can see, an object typically has 8 bytes of overhead and an int costs 4 bytes. A char[] carries the 8 bytes of overhead that every object has plus 4 bytes for storing its length, and it needs to be aligned to an 8-byte boundary. Therefore it needs at least 16 bytes.

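This means an empty String needs 40 bytes: 24 for the String object itself (8 bytes of overhead, 4 for the array reference, and 3 * 4 for the int fields) plus 16 for the empty char[]. So a String object is not that lightweight.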
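The arithmetic above can be sketched as a small estimator. The constants are the figures quoted in this post for the 32-bit Sun JDK 1.4.2, not values measured on a current JVM, and the class and method names are made up for illustration.

```java
// Rough footprint estimator using the figures from this post
// (32-bit Sun JDK 1.4.2). All names here are hypothetical.
final class StringFootprint {
    static final int OBJECT_HEADER = 8;      // per-object overhead
    static final int REFERENCE = 4;          // reference to the char[]
    static final int INT = 4;                // offset, count, hash
    static final int ARRAY_LENGTH_FIELD = 4; // length stored in every array

    // Round up to the next multiple of 8 bytes.
    static int alignTo8(int bytes) {
        return (bytes + 7) & ~7;
    }

    // Size of a char[] with the given number of slots.
    static int charArrayBytes(int capacity) {
        return alignTo8(OBJECT_HEADER + ARRAY_LENGTH_FIELD + 2 * capacity);
    }

    // Size of the String object itself, without its char[].
    static int stringObjectBytes() {
        return alignTo8(OBJECT_HEADER + REFERENCE + 3 * INT);
    }

    // String object plus a char[] that it does not share with anyone.
    static int totalBytes(int capacity) {
        return stringObjectBytes() + charArrayBytes(capacity);
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(0)); // 40 for the empty String
    }
}
```

Under these assumptions stringObjectBytes() gives the 24 bytes of a String that shares an existing char[], and totalBytes(0) gives the 40 bytes of an empty String.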

The char[] might be shared between two or more instances of String. This reduces the overhead for a String that shares an existing char[] to 24 bytes. The char[] is only shared when certain functions are called, which can lead to surprising results. Consider the following (simplified, but real-world) code fragment:

StringBuffer buffer = new StringBuffer(256);
buffer.append("bla");
buffer.append("bla");
String result = buffer.toString();
return result;

Pretty basic (and useless ;)) stuff, you may think. Using a StringBuffer (or StringBuilder on JDK 1.5) is the recommended way to concatenate Strings. + is evil, because it creates lots of temporary objects.

But the surprise comes when you count how much memory the returned String will take. The answer is:

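24 + 2*256 + 12 = 548 bytes (24 for the String object, 2 bytes for each of the 256 chars, and 12 for the array header; padding the char[] to an 8-byte boundary makes it 552)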

Since the original buffer is not used anymore, this is a huge waste of memory. If you look at the JDK source, you will find that buffer.toString() does not create a new char[], but very efficiently reuses the char[] from the StringBuffer.

Unfortunately the StringBuffer was much too big for the resulting String.

So please keep the following rule under your pillow:

Try to size your buffer correctly according to the size of the result.

In this case you would just use new StringBuffer(6).
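As a minimal sketch of this rule, assuming the parts to concatenate are known up front, the buffer can be sized from the sum of their lengths. The class and method names are hypothetical.

```java
// Sizes the buffer from the lengths of the parts before appending them.
// Hypothetical helper for illustration only.
final class RightSizedBuffer {
    // Fills a buffer whose capacity equals the final length.
    static StringBuffer fill(String[] parts) {
        int total = 0;
        for (int i = 0; i < parts.length; i++) {
            total += parts[i].length();
        }
        StringBuffer buffer = new StringBuffer(total);
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer;
    }

    static String concat(String[] parts) {
        return fill(parts).toString();
    }

    public static void main(String[] args) {
        StringBuffer buffer = fill(new String[] { "bla", "bla" });
        System.out.println(buffer.capacity() + " slots for " + buffer.length() + " chars");
    }
}
```

The extra pass over the parts costs a little time, but the buffer never has to grow and no slot is left unused.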

Interestingly enough, the following code would also not waste any memory, although it is slightly less efficient:

StringBuffer buffer = new StringBuffer(256);
buffer.append("bla");
buffer.append("bla");
String result = new String(buffer);
return new String(result);

The key here is that the constructor new String(result) will automatically create a new char[] if the existing one is too large. This way the JDK tries to ensure that Strings don't get bigger all the time.

Regards,
Markus

8 Comments

  1. Anonymous
    A nicely written blog with details that many developers usually neglect at first, only to come back to them later as performance issues.

    The same holds for all collection classes: using the constructor that takes an explicit size can be more useful than using the default constructor, because with the default,

    1) if the resultant collection is going to be small, then unnecessary array space is allocated.

    2) if the resultant collection is going to be very big, then unnecessary array copy operation will be required.
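    A minimal sketch of the two cases, assuming the number of elements is known before the collection is filled. It uses generics, so it needs Java 5 or later, and the class and method names are hypothetical.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Pre-sizes collections from a known element count.
    // Hypothetical helper for illustration only.
    final class PresizedCollections {
        // The backing array is allocated once with exactly the needed size.
        static List<String> copyNames(String[] names) {
            List<String> result = new ArrayList<String>(names.length);
            for (int i = 0; i < names.length; i++) {
                result.add(names[i]);
            }
            return result;
        }

        // HashMap resizes once its size exceeds capacity * load factor
        // (0.75 by default), so the requested capacity leaves headroom.
        static Map<String, Integer> nameLengths(String[] names) {
            int capacity = (int) (names.length / 0.75f) + 1;
            Map<String, Integer> result = new HashMap<String, Integer>(capacity);
            for (int i = 0; i < names.length; i++) {
                result.put(names[i], Integer.valueOf(names[i].length()));
            }
            return result;
        }
    }
    ```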

    1. Anonymous
      I seriously doubt that any application developer really goes to the extent of minimizing where and how strings are used.

      There were even some theories floating around a few years back about synchronizing on the buffer to minimize the number of locks. For example:

      synchronized (stringBuffer) {
        stringBuffer.append("bla");
        stringBuffer.append("bla");
      }

      But consciously using this in real life might happen only during job interviews :-)

      1. Markus Kohler Post author
        Sure,
        I agree that many developers don't minimize the use of Strings. IMHO this is a big mistake.
        Strings in Java are much more complicated and more bulky than a char[] in a language like C.

        As a matter of fact, if 40% of your memory is occupied by Strings, it's pretty clear that you should start by optimizing your use of Strings, because that is likely to give the biggest benefit.

        If you would like to have more information, feel free to drop me an e-mail.

        Regards,
        Markus

  2. Carsten Saager
    With JDK 6, String(Builder|Buffer).toString returns new String(value, 0, count), so the String is optimally sized. JDK 6 Strings always have optimal size. There is still some dead code in b103 in String(String) to copy the internal array if it has excess bytes; perhaps it has an effect when cloning a static final compiled with an earlier version of the JDK (this is what gets used in the second part). I didn't check whether this optimization had already been backported to JDK 5.

    A real caveat lies, though, in the automatic sizing of a Buffer/Builder: the buffer doubles its size each time it grows, so you can end up with an object that has nearly twice the size needed (in JDK 6). Explicitly calling ensureCapacity avoids this.
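    The doubling can be observed with capacity(). This is a minimal sketch using StringBuilder (Java 5 or later); the exact growth step is an implementation detail of the JDK in use, so no concrete numbers are claimed here, and the class and method names are hypothetical.

    ```java
    // Compares the capacity of a builder that grew on its own with one
    // whose capacity was requested up front. Hypothetical names.
    final class BufferGrowth {
        // Starts with the default capacity and lets the builder grow.
        static int capacityAfterGrowth(int chars) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < chars; i++) {
                sb.append('x');
            }
            return sb.capacity();
        }

        // Requests the needed capacity once, before appending.
        static int capacityWithEnsure(int chars) {
            StringBuilder sb = new StringBuilder();
            sb.ensureCapacity(chars);
            for (int i = 0; i < chars; i++) {
                sb.append('x');
            }
            return sb.capacity();
        }

        public static void main(String[] args) {
            int chars = 1000;
            System.out.println("grown:   " + capacityAfterGrowth(chars));
            System.out.println("ensured: " + capacityWithEnsure(chars));
        }
    }
    ```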

    I cannot really believe that your JDK really reuses the char[] of the buffer for the String creation. As the String is required to be immutable it has to make a copy unless the buffer is implementing a copy on write to ensure that – if this was the implementation in JDK5 I wonder why they dropped this.

    1. Markus Kohler Post author
      Hi,
      Yes, you are correct. The behaviour of several String functions changed in JDK 5. But those who are working on NW04(s) are still on JDK 1.4.2. That's why I said "JDK 1.4.2" in the beginning.

      I agree that sizing is also important because of the automatic doubling (works the same way in all JDK versions), which also generates temporary objects.

      And yes, the StringBuffer shares the char[]. That is the reason that constructing Strings with StringBuffer is so efficient. StringBuffer.toString() does *not* have to copy the string.

      StringBuffer has a flag that indicates whether the char[] is shared. As soon as you call something that changes the StringBuffer the char[] will be copied.
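      The idea can be sketched with a toy class. This is not the JDK source, only an illustration of a shared flag with copy-on-write; every name in it is hypothetical.

      ```java
      // Toy buffer that hands out its char[] and copies it on the next write.
      // Not the JDK implementation; all names are hypothetical.
      final class SharedBufferSketch {
          private char[] value;
          private int count;
          private boolean shared; // true once the array was handed out

          SharedBufferSketch(int capacity) {
              value = new char[capacity];
          }

          SharedBufferSketch append(String s) {
              int needed = count + s.length();
              if (shared || needed > value.length) {
                  // Copy before writing so a holder of the old array never sees changes.
                  char[] copy = new char[Math.max(value.length, needed)];
                  System.arraycopy(value, 0, copy, 0, count);
                  value = copy;
                  shared = false;
              }
              s.getChars(0, s.length(), value, count);
              count = needed;
              return this;
          }

          // What a toString() without copying would do: mark and hand out.
          char[] share() {
              shared = true;
              return value;
          }

          boolean isShared() {
              return shared;
          }

          char[] backingArray() {
              return value;
          }
      }
      ```

      In this sketch, calling share() is cheap because nothing is copied, and the copy is only paid for if the buffer is modified afterwards.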

      Regards,
      Markus

  3. Gernot Glawe
    I think if you calculate the size of a String object, you have to add the size of an empty object, i.e. new Object().

    And what about

    StringBuffer buffer = new StringBuffer(256);
    buffer.append("bla");
    buffer.append("bla");
    return buffer.toString();

    That's what I usually do.

    1. Markus Kohler Post author
      Hi,
      The overhead of an object is, as I said, 8 bytes on a 32-bit Sun VM.

      Regarding your code example: yes, you will end up (on JDK 1.4) with a small String referencing the large (256 entries) char[].

      Regards,
      Markus
