Skip to Content
Author's profile photo Eng Swee Yeoh

dynamicSubstring – simplifying your substring logic

Introduction

A common mapping requirement that crops up every now and then is the requirement to extract a portion of data from a source field. This can be easily achieved with the standard function substring from the Text category. However, the substring function can only be configured with static start position and character count parameter values. If the input to the function is a dynamic content of variable length, it is possible to encounter the familiar “java.lang.StringIndexOutOfBoundsException: String index out of range” exception when the input is shorter than expected.

Below are some common approaches to handle such scenarios:

  • Create a custom substring UDF based on java.lang.String’s substring method with arguments as input instead of parameters.
  • Create a mapping logic to determine length of input and reconstruct it prior to input of substring function.

In this blog, I will share an alternative approach using a simple UDF named dynamicSubstring. The usage and configuration is similar to the standard substring function, however it will dynamically calculate the length of the input value so that the substring operation does not trigger the out of range exception.

Source code

The UDF is a Single Values execution type UDF, having 1 String input argument and 2 configurable parameters.


@LibraryMethod(title="dynamicSubstring", description="Get substring of input", category="FL_TextPool", type=ExecutionType.SINGLE_VALUE)
public String dynamicSubstring (
   @Argument(title="Input String")  String input,
   @Parameter(title="Start index")  int start,
   @Parameter(title="Length of output")  int length,
   Container container)  throws StreamTransformationException{
  // ----------------------------------------------------------
  // Extracts the substring of an input String  
  // ----------------------------------------------------------
  // input - input string
  // start - position index for start of string
  // length - length of substring
  // ----------------------------------------------------------
  int startPos = start;
  int endPos = start + length - 1;
  String output = "";
  if ( startPos < 0 ) {
   // (1) Start position is before start of input, return empty string
   output = "";
  } else if ( startPos >= 0 && startPos < input.length() ) {
   if ( endPos < input.length() ) {
    // (2) Start & end positions are before end of input, return the partial substring
    output = input.substring( startPos, endPos + 1 );
   } else if ( endPos >= input.length() ) {
    // (3) Start position is before start of input but end position is after end of input, return from start till end of input
    output = input.substring( startPos, input.length() );
   }
  } else if ( startPos >= input.length() ) {
   // (4) Start position is after end of input, return empty string
   output = "";
  }
  return output;
}

Testing Results

Scenario 1

Start position and end position is less than length of input. This scenario can be achieved by standard function without any exception.

/wp-content/uploads/2015/03/test1_670420.png

Scenario 2

The input value is now shorter that scenario 1 such that start position is less than input length, but the end position is more than input length. For standard function, this will cause an exception. However, for dynamicSubstring, it will just extract from the start position until the end of the input.

/wp-content/uploads/2015/03/test2a_670428.png

Scenario 3

Finally, the input value has shortened significantly such that the start position is more than the input length. Again this will cause an exception with the standard function. However, for dynamicSubstring, it will just return an empty string.

/wp-content/uploads/2015/03/test3_670429.png

Conclusion

With the usage of dynamicSubstring, we can now simplify mapping logic that involve substring operations on content that is variable in length.

Assigned Tags

      8 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Jean-Philippe Pain
      Jean-Philippe Pain

      Good tips, thanks !

      Author's profile photo Iñaki Vila
      Iñaki Vila

      Hi Eng,

      A great way to work with substrings.

      Also, in the cases that you know the start and the end positions, it's possible to do it easily:

      imports java.lang.String;

      public String subStringDinamico(String a,String b,String c,Container container){

      int inicio =0;

      int fin = 0;

      try {

           inicio=Integer.parseInt(a);

          fin =  Integer.parseInt(b);

      }

      catch(NumberFormatException e) { }

      return (c.substring(inicio,fin));

      }

      Although the exception could happen instead of your code that you avoid it.

      Regards.

      Author's profile photo Eng Swee Yeoh
      Eng Swee Yeoh
      Blog Post Author

      Hi Inaki

      Yes, my logic above are for the times when I don't care if the extracted length is less than the configured length - I just don't want to get any exception.

      By the way, for your logic above, you can simplify it further if you define the input arguments as integer directly so that you don't need to parse the string 🙂

      /wp-content/uploads/2015/03/int_670592.png

      Rgds

      Eng Swee

      Author's profile photo Praveen Gandepalli
      Praveen Gandepalli

      Hi Eng,

      Good one, This is very common requirement when we work with substring function.

      Regards,

      Praveen.

      Author's profile photo Former Member
      Former Member

      Hi Eng,

      A very common issue most of the consultants face while working with substring. Thanks for this

      Regards,

      Sriram

      Author's profile photo Raghu Vamseedhar Reddy Kadipi Reddy
      Raghu Vamseedhar Reddy Kadipi Reddy

      Eng,

      Appreciate your effort to address a common requirement.

      Just for reader benefit, this can also be achieved using below UDF.

      substring.PNG

      Trim can be used to delete pre and post spaces, after substring.

      Author's profile photo Eng Swee Yeoh
      Eng Swee Yeoh
      Blog Post Author

      Hi Raghu

      Thanks for sharing the UDF. That would be something along the lines of the second point I mentioned in the Intro section above.

      It would however require 4 functions (including trim after substring) instead of a single function. Anyway, it's always good to have options available for comparison. 😉

      Rgds

      Eng Swee

      Author's profile photo Sven Kelemen
      Sven Kelemen

      Hi, great blog. Thanks. I called that UDF safeSubstring having one input and two integer Parameters instead of Arguments, as properties. Is there a way to get properties show like in the standard substring function, like on the image below?