Check also my new blog at http://kohlerm.blogspot.com/ 

This is the second part of my series about how to reduce the memory consumption of your Java application. Part I can be found here: Reducing the memory consumption of your Java application (Part I).

Some comments about my last blog. I’m speaking about JDK 1.4.2 here, because this is what Netweaver04(s) uses today. The rules for JDK 1.5 and JDK 1.6 are different; the general rule is that there is less sharing of char[]s in the newer JDKs. Finding out the exact rules for the newer JDKs is left as an exercise for the reader 😉

Another important function in String is substring(). If you call substring on an existing object this will result in a new String object being created (Strings are immutable), which shares the char[] of the existing String.

This is in general a good thing, because it allows you to save some memory. If you have for example a path of a file name like “/netweaver/is/great.txt”, then you can have one String object for the whole path and construct another String object for the filename “great.txt” that shares the existing char[].
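
The sharing described above can be illustrated with a small model. This is only a sketch: SharedString is a hypothetical class that imitates the value/offset/count layout of String in JDK 1.4.2; it is not the real java.lang.String.

```java
// Hypothetical, simplified model of the JDK 1.4.2 String layout (not the real class).
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of the first character used by this instance
    final int count;     // number of characters used by this instance

    SharedString(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Keeps a reference to the given array; nothing is copied.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Returns a new object that points into the same backing array.
    SharedString substring(int beginIndex, int endIndex) {
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        String raw = "/netweaver/is/great.txt";
        SharedString path = new SharedString(raw);
        SharedString file = path.substring(raw.lastIndexOf('/') + 1, raw.length());
        System.out.println(file);                      // prints great.txt
        System.out.println(file.value == path.value);  // prints true: one array, two objects
    }
}
```

In this model, the second object costs only a few fields, no matter how long the path is.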

But you can also easily shoot yourself in the foot:

String fileName="a picture with pretty long file name.jpg";
int length=fileName.length();
String fileType=fileName.substring(length-3,length);
return  fileType;

The intent of this “beautiful” code is to get the type of the file. The problem is that the original String referenced by “fileName” is no longer referenced after the method returns the file type, yet its char[] stays alive. The returned String still holds the char[] “a picture with pretty long file name.jpg”, which is 40 characters, versus the 3 characters you really need for “jpg”.

There are even people out there who think that this is a bug, but as I tried to explain, this is really a feature that can help you to reduce memory consumption.
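
If only the short substring is needed, one workaround, also mentioned in the comments below, is to wrap the result in new String(...). The following is a minimal sketch. It assumes that the String(String) constructor copies only the characters in use when the backing array is larger, as the comments in the JDK source suggest. The class and method names are hypothetical.

```java
class FileTypeExample {
    // Returns the last three characters as a String that does not
    // keep a reference to the char[] of the (longer) file name.
    static String fileType(String fileName) {
        int length = fileName.length();
        // substring() may share the backing array; new String(...) makes a trimmed copy.
        return new String(fileName.substring(length - 3, length));
    }

    public static void main(String[] args) {
        System.out.println(fileType("a picture with pretty long file name.jpg")); // prints jpg
    }
}
```

The extra copy costs a few bytes and a little time, so it only pays off when the original String is short-lived and the substring is kept for a long time.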

UPDATE: as Frank pointed out (see comments below), this is even more confusing than I thought in the first place. There are 2 similar constructors in String:
public String(char value[], int offset, int count)
and
String(int offset, int count, char value[])
The first one copies the characters into a new char[]. The latter, which is not public, keeps the given char[] and is the one called by substring.
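
The copying behaviour of the public constructor can be observed with the public API alone, because later changes to the source array do not show up in the String. A small sketch with hypothetical class and method names (the non-public constructor cannot be called from application code, so it is not shown):

```java
class CopyingConstructorDemo {
    // Builds a String from the first 'count' characters of 'chars'
    // using the public, copying constructor.
    static String firstChars(char[] chars, int count) {
        return new String(chars, 0, count);
    }

    public static void main(String[] args) {
        char[] chars = "great.txt".toCharArray();
        String name = firstChars(chars, 5);
        chars[0] = 'X';            // modify the source array afterwards
        System.out.println(name);  // prints great: the String has its own copy
    }
}
```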

Regards,
Markus

11 Comments

  1. Frank Nestel
    I realized I still have a copy of JDK 1.4.2_09 on my disk (the oldest one, since we’re mostly on 1.5 now) and in the source code of java.lang.String I see

    In substring(int beginIndex, int endIndex):

    return ((beginIndex == 0) && (endIndex == count)) ? this : new String(offset + beginIndex, endIndex - beginIndex, value);

    and in public String(char value[], int offset, int count)

    this.value = new char[count];
    this.count = count;
    System.arraycopy(value, offset, this.value, 0, count);

    The quoted bugparade bug is also against Java 1.4; it is from 2001, when Java was nearly half as mature as today.

    1. Markus Kohler Post author
      Hi,
      I was talking about JDK 1.4.2_12, which is almost the latest. _13 came out only recently and I haven’t tested it.

      I chose the first bug that I found in Sun’s database. I think there are several bugs about this feature (issue).

      My String constructor is :
        String(int offset, int count, char value[]) {
           this.value = value;
           this.offset = offset;
           this.count = count;
          }

      Yours must be from JDK 1.5, I guess.

      Regards,
      Markus

      1. Frank Nestel
        Hi Markus,
        as mentioned, the source I quoted is 1.4.2_09. I didn’t mention that it was the SUN JDK. I do not care about the latest 1.4 JDKs right now, but to be sure I installed SUN JDK 1.4.2_13 and it contains the same source as I quoted above. I really wonder where you got yours from. I’m almost sure your hint was true in the past, though I considered it kind of well known then. I keep forgetting about the StringBuffer thing from part I, though.
        Best,
        Frank
        1. Markus Kohler Post author
          Hi Frank,
          I think I’ve got it now. Although I was pretty sure that I was correct, because I had run test code to verify the behaviour, when I looked at the sources directly (without Eclipse) I thought for a second that you were right. “Fortunately”, when going back to Eclipse I found that everything I said is correct.

          Note that there is
          public String(char value[], int offset, int count) {

          and

          String(int offset, int count, char value[]) {

          The latter is called by substring.
          This makes all this even more confusing …

          Thanks for the hint, I wasn’t aware of this until now.

          Regards,
          Markus

          1. Frank Nestel
            Thanks Markus,

            actually I was also confused about both constructors. The funny thing is that I thought this was a well-known old recommendation which had become obsolete a long time ago. But the source for Java 5 and 6 still looks the same! So it persists.

            However, the implementation of substring is both speed and memory efficient if the original string is long-lived, for example “final static”. As another example, you may need to hold three strings at the same time: a filename, the “pure” name and the extension. Then all three share the characters.

            But if you only care about the substring and do not need the original string any more, then you should use an idiom like “new String(x.substring(..))”. Actually the source comments also suggest this particular usage of the “public String(String s)” constructor.

            Best
            Frank

  2. Valery Silaev
    Markus,

    First of all, your code is incorrect. The file extension is not the last 3 characters but rather all characters after the last dot.

    The correct code is:
    final String fileName =
      "a picture with pretty long file name.jpg";
    final int lastDotIdx = fileName.lastIndexOf('.');
    if (lastDotIdx > 0)
      return fileName.substring(lastDotIdx + 1);
    else
      return "";

    So your “optimized” code is broken. Exactly as I (and many others, starting with Knuth) said.

    Second, after a short study you confirmed yourself that this is overall not an issue: a different constructor is used and there is no hard back-reference to the large string. Again, this is exactly what I warn about: do not optimize before a problem arises. Your premature optimization is based on false assumptions.

    VS

    1. Markus Kohler Post author
      Hi,
      No, the code is not broken.
      The filename was verified beforehand to end with .gif or .jpg. I just didn’t post that code 🙂

      Anyway, the point of this example was not to show the most correct or most efficient way to get a file extension. The point was to have a simple example that shows how substring works.

      I didn’t confirm that the large char[] is not returned. In fact it is returned. You can easily check that by running the example in your IDE of choice.

      In fact I think that when even an experienced developer like you doesn’t seem to believe that it works this way (you are not the first one, by the way), that is a sure indicator that others wouldn’t even imagine that it could be implemented this way.
      Regards,
      Markus

      1. Valery Silaev
        Markus,

        Code has problems:
        .version — this is file name without extension;
        my_data — this is file name without extension;
        MyWebDynproComponent.wdcomponent — this is file name with extension “wdcomponent” (and not “ent”!!!)

        Now try these samples with your code.
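
As a sketch, the three sample names above can be run against the lastIndexOf-based version quoted earlier in this thread. The class and method names are hypothetical, and treating a leading dot as “no extension” follows the samples above.

```java
class FileExtension {
    // Returns everything after the last dot, or "" if there is no dot
    // or the only dot is the first character (as in ".version").
    static String extensionOf(String fileName) {
        int lastDotIdx = fileName.lastIndexOf('.');
        return lastDotIdx > 0 ? fileName.substring(lastDotIdx + 1) : "";
    }

    public static void main(String[] args) {
        String[] samples = { ".version", "my_data", "MyWebDynproComponent.wdcomponent" };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(samples[i] + " -> \"" + extensionOf(samples[i]) + "\"");
        }
    }
}
```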

        OK, you caught me with the reference to char[]. But:
        1. It sounds like a design decision by SUN to minimize memory consumption: several calls to substring do not create several char[] arrays; one array is shared. There is a certain ratio between the original String length and the lengths of the created sub-Strings at which this approach shows better memory consumption metrics (obviously, performance is better as well, since there is no array copying).
        2. Again you are pointing to a problem(?) but providing no solution/alternative.

        VS

      2. Valery Silaev
        Markus,

        [CITE]In fact I think that when even an experienced developer like you doesn’t seem to believe that it works this way (you are not the first one, by the way), that is a sure indicator that others wouldn’t even imagine that it could be implemented this way.[/CITE]

        AND THIS IS A “GOOD THING”!
        I can work at a higher abstraction level without having to know implementation internals. OOP is about this, isn’t it???
        Also, this has _never_ caused memory-consumption problems in my projects (and I’ve been programming in Java for approx. 10 years). Is it an issue after that?

        VS

        1. Markus Kohler Post author
          Hi,
          As I pointed out, it’s a feature, but it can lead to problems if used in the wrong way.
          If used correctly it can even save some memory.

          I said something like “some people believe it’s a bug”.

          Actually, the problem so far was that finding these kinds of problems in a production system was almost impossible until recently.

          Now that we have the new heap dump feature even in JDK 1.4 (see my blog “The amazing new heap dump feature in JDK 1.4.2_12”), you can analyze these kinds of problems.

          Regards,
          Markus

