Skip to Content

Introduction

A common mapping requirement that crops up every now and then is the requirement to extract a portion of data from a source field. This can be easily achieved with the standard function substring from the Text category. However, the substring function can only be configured with static start position and character count parameter values. If the input to the function is a dynamic content of variable length, it is possible to encounter the familiar “java.lang.StringIndexOutOfBoundsException: String index out of range” exception when the input is shorter than expected.

Below are some common approaches to handle such scenarios:

  • Create a custom substring UDF based on java.lang.String’s substring method with arguments as input instead of parameters.
  • Create a mapping logic to determine length of input and reconstruct it prior to input of substring function.

In this blog, I will share an alternative approach using a simple UDF named dynamicSubstring. The usage and configuration is similar to the standard substring function, however it will dynamically calculate the length of the input value so that the substring operation does not trigger the out of range exception.

Source code

The UDF is a Single Values execution type UDF, having 1 String input argument and 2 configurable parameters.


@LibraryMethod(title="dynamicSubstring", description="Get substring of input", category="FL_TextPool", type=ExecutionType.SINGLE_VALUE)
public String dynamicSubstring (
   @Argument(title="Input String")  String input,
   @Parameter(title="Start index")  int start,
   @Parameter(title="Length of output")  int length,
   Container container)  throws StreamTransformationException{
  // ----------------------------------------------------------
  // Extracts the substring of an input String  
  // ----------------------------------------------------------
  // input - input string
  // start - position index for start of string
  // length - length of substring
  // ----------------------------------------------------------
  int startPos = start;
  int endPos = start + length - 1;
  String output = "";
  if ( startPos < 0 ) {
   // (1) Start position is before start of input, return empty string
   output = "";
  } else if ( startPos >= 0 && startPos < input.length() ) {
   if ( endPos < input.length() ) {
    // (2) Start & end positions are before end of input, return the partial substring
    output = input.substring( startPos, endPos + 1 );
   } else if ( endPos >= input.length() ) {
    // (3) Start position is before start of input but end position is after end of input, return from start till end of input
    output = input.substring( startPos, input.length() );
   }
  } else if ( startPos >= input.length() ) {
   // (4) Start position is after end of input, return empty string
   output = "";
  }
  return output;
}

Testing Results

Scenario 1

Start position and end position is less than length of input. This scenario can be achieved by standard function without any exception.

/wp-content/uploads/2015/03/test1_670420.png

Scenario 2

The input value is now shorter that scenario 1 such that start position is less than input length, but the end position is more than input length. For standard function, this will cause an exception. However, for dynamicSubstring, it will just extract from the start position until the end of the input.

/wp-content/uploads/2015/03/test2a_670428.png

Scenario 3

Finally, the input value has shortened significantly such that the start position is more than the input length. Again this will cause an exception with the standard function. However, for dynamicSubstring, it will just return an empty string.

/wp-content/uploads/2015/03/test3_670429.png

Conclusion

With the usage of dynamicSubstring, we can now simplify mapping logic that involve substring operations on content that is variable in length.

To report this post you need to login first.

7 Comments

You must be Logged on to comment or reply to a post.

  1. Iñaki Vila

    Hi Eng,

    A great way to work with substrings.

    Also, in the cases that you know the start and the end positions, it’s possible to do it easily:

    imports java.lang.String;

    public String subStringDinamico(String a,String b,String c,Container container){

    int inicio =0;

    int fin = 0;

    try {

         inicio=Integer.parseInt(a);

        fin =  Integer.parseInt(b);

    }

    catch(NumberFormatException e) { }

    return (c.substring(inicio,fin));

    }

    Although the exception could happen instead of your code that you avoid it.

    Regards.

    (0) 
    1. Eng Swee Yeoh Post author

      Hi Inaki

      Yes, my logic above are for the times when I don’t care if the extracted length is less than the configured length – I just don’t want to get any exception.

      By the way, for your logic above, you can simplify it further if you define the input arguments as integer directly so that you don’t need to parse the string 🙂

      /wp-content/uploads/2015/03/int_670592.png

      Rgds

      Eng Swee

      (0) 
    1. Eng Swee Yeoh Post author

      Hi Raghu

      Thanks for sharing the UDF. That would be something along the lines of the second point I mentioned in the Intro section above.

      It would however require 4 functions (including trim after substring) instead of a single function. Anyway, it’s always good to have options available for comparison. 😉

      Rgds

      Eng Swee

      (0) 

Leave a Reply