dynamicSubstring – simplifying your substring logic
Introduction
A common mapping requirement that crops up every now and then is the requirement to extract a portion of data from a source field. This can be easily achieved with the standard function substring from the Text category. However, the substring function can only be configured with static start position and character count parameter values. If the input to the function is a dynamic content of variable length, it is possible to encounter the familiar “java.lang.StringIndexOutOfBoundsException: String index out of range” exception when the input is shorter than expected.
Below are some common approaches to handle such scenarios:
- Create a custom substring UDF based on java.lang.String’s substring method with arguments as input instead of parameters.
- Create a mapping logic to determine length of input and reconstruct it prior to input of substring function.
In this blog, I will share an alternative approach using a simple UDF named dynamicSubstring. The usage and configuration is similar to the standard substring function, however it will dynamically calculate the length of the input value so that the substring operation does not trigger the out of range exception.
Source code
The UDF is a Single Values execution type UDF, having 1 String input argument and 2 configurable parameters.
@LibraryMethod(title="dynamicSubstring", description="Get substring of input", category="FL_TextPool", type=ExecutionType.SINGLE_VALUE)
public String dynamicSubstring (
@Argument(title="Input String") String input,
@Parameter(title="Start index") int start,
@Parameter(title="Length of output") int length,
Container container) throws StreamTransformationException{
// ----------------------------------------------------------
// Extracts the substring of an input String
// ----------------------------------------------------------
// input - input string
// start - position index for start of string
// length - length of substring
// ----------------------------------------------------------
int startPos = start;
int endPos = start + length - 1;
String output = "";
if ( startPos < 0 ) {
// (1) Start position is before start of input, return empty string
output = "";
} else if ( startPos >= 0 && startPos < input.length() ) {
if ( endPos < input.length() ) {
// (2) Start & end positions are before end of input, return the partial substring
output = input.substring( startPos, endPos + 1 );
} else if ( endPos >= input.length() ) {
// (3) Start position is before start of input but end position is after end of input, return from start till end of input
output = input.substring( startPos, input.length() );
}
} else if ( startPos >= input.length() ) {
// (4) Start position is after end of input, return empty string
output = "";
}
return output;
}
Testing Results
Scenario 1
Start position and end position is less than length of input. This scenario can be achieved by standard function without any exception.
Scenario 2
The input value is now shorter that scenario 1 such that start position is less than input length, but the end position is more than input length. For standard function, this will cause an exception. However, for dynamicSubstring, it will just extract from the start position until the end of the input.
Scenario 3
Finally, the input value has shortened significantly such that the start position is more than the input length. Again this will cause an exception with the standard function. However, for dynamicSubstring, it will just return an empty string.
Conclusion
With the usage of dynamicSubstring, we can now simplify mapping logic that involve substring operations on content that is variable in length.
Good tips, thanks !
Hi Eng,
A great way to work with substrings.
Also, in the cases that you know the start and the end positions, it's possible to do it easily:
imports java.lang.String;
public String subStringDinamico(String a,String b,String c,Container container){
int inicio =0;
int fin = 0;
try {
inicio=Integer.parseInt(a);
fin = Integer.parseInt(b);
}
catch(NumberFormatException e) { }
return (c.substring(inicio,fin));
}
Although the exception could happen instead of your code that you avoid it.
Regards.
Hi Inaki
Yes, my logic above are for the times when I don't care if the extracted length is less than the configured length - I just don't want to get any exception.
By the way, for your logic above, you can simplify it further if you define the input arguments as integer directly so that you don't need to parse the string 🙂
Rgds
Eng Swee
Hi Eng,
Good one, This is very common requirement when we work with substring function.
Regards,
Praveen.
Hi Eng,
A very common issue most of the consultants face while working with substring. Thanks for this
Regards,
Sriram
Eng,
Appreciate your effort to address a common requirement.
Just for reader benefit, this can also be achieved using below UDF.
Trim can be used to delete pre and post spaces, after substring.
Hi Raghu
Thanks for sharing the UDF. That would be something along the lines of the second point I mentioned in the Intro section above.
It would however require 4 functions (including trim after substring) instead of a single function. Anyway, it's always good to have options available for comparison. 😉
Rgds
Eng Swee
Hi, great blog. Thanks. I called that UDF safeSubstring having one input and two integer Parameters instead of Arguments, as properties. Is there a way to get properties show like in the standard substring function, like on the image below?