In one of my recent code reviews, I stumbled over a simple piece of code that has stuck with me ever since.
Suppose you are in an SAP NetWeaver BW transformation and want to replace all disallowed characters in a transformation rule:
METHOD _compute_XXXXX.
...
  CALL FUNCTION 'Z_REPLACE_UNALLOWED_CHARS'
    EXPORTING
      i_text = i_value_in
    IMPORTING
      e_text = r_value_out.
...
ENDMETHOD.
and the function module looks roughly like this:
FUNCTION z_replace_unallowed_chars.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     REFERENCE(I_TEXT) TYPE CHAR100 OPTIONAL
*"     REFERENCE(I_DEFAULT) TYPE C OPTIONAL
*"  EXPORTING
*"     REFERENCE(E_TEXT) TYPE CHAR100
*"----------------------------------------------------------------------
...
*----------------------------------------------------------------------*
* Determine allowed characters
*----------------------------------------------------------------------*
  IF g_char IS INITIAL.
    CALL FUNCTION 'RSKC_ALLOWED_CHAR_GET'
      IMPORTING
        e_allowed_char = w_allowed_char.
    g_char = 'X'.
  ENDIF.
*----------------------------------------------------------------------*
* Replacement
*----------------------------------------------------------------------*
  DESCRIBE FIELD i_text LENGTH len IN CHARACTER MODE.
  boole = true.
  e_text = i_text.
  lv_def = i_default.
  TRANSLATE lv_def TO UPPER CASE.
  WHILE boole = true.
    IF e_text CO w_allowed_char.
      EXIT.
    ENDIF.
    IF sy-fdpos < len.
      pos = sy-fdpos.
      e_text+pos(1) = lv_def.
    ENDIF.
    IF sy-fdpos = len.
      boole = false.
    ENDIF.
  ENDWHILE.
ENDFUNCTION.
After reading this function module and its calling routine, I thought to myself: oops, something is going wrong here, or is at least not as efficient as it should be.
What bothered me was neither the idea of replacing characters nor the call to the BW function module. My attention went to the replacement block itself. Admittedly, there is no outright mistake: it could have used more built-in functions such as strlen( ), and it could have used a different loop technique.
But why replace the characters in such a roundabout way? Apart from not running correctly in all cases, it costs a lot of performance, and that adds up in ETL processes on BW, because this function module is not called just once or twice. When you load millions of rows into your BW, it may well be called more than three times per row!
My solution: use regular expressions, which are built into ABAP:
REPLACE ALL OCCURRENCES OF REGEX l_pattern IN l_text WITH i_default.
This single statement does the same job as the entire replacement block of the listing above (lines 23..39, from DESCRIBE FIELD to ENDWHILE)!
But what is this pattern thing? Regular expressions are a very efficient way to perform text operations. In such an expression you define a syntax that the matcher checks against your input, for example e-mail addresses, date formats, naming conventions, unwanted characters and so on.
The pattern for this case looks like this:
[^ALLCHARACTERSYOUWANTTO]
Note: replace ALLCHARACTERSYOUWANTTO with the concrete characters you want to allow.
This pattern means: "every character except those within the brackets". That is exactly what the replacement loop does. The regex processor now processes the text in i_text and replaces every match of the pattern with the value in i_default.
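To make the negated character set more tangible, here is a minimal sketch as a standalone report. The report name, the variable names and the allowed characters A to D are assumptions chosen for illustration; in the real function module the allowed characters come from RSKC_ALLOWED_CHAR_GET.

```abap
REPORT z_demo_regex_replace.

* Hypothetical demo data: only the characters A, B, C and D are allowed.
DATA: l_text    TYPE string,
      l_pattern TYPE string.

START-OF-SELECTION.
  l_text    = 'AB#C?D'.
  l_pattern = '[^ABCD]'.   " every character that is NOT A, B, C or D

* Replace each character matched by the pattern with an underscore.
  REPLACE ALL OCCURRENCES OF REGEX l_pattern IN l_text WITH '_'.

  WRITE / l_text.          " AB_C_D
```

The '#' and the '?' are the only characters outside the set, so only those two are replaced.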
Now, the new function module looks like this:
FUNCTION z_replace_unallowed_chars.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     REFERENCE(I_TEXT) TYPE CHAR100 OPTIONAL
*"     REFERENCE(I_DEFAULT) TYPE C OPTIONAL
*"  EXPORTING
*"     REFERENCE(E_TEXT) TYPE CHAR100
*"----------------------------------------------------------------------
...
*----------------------------------------------------------------------*
* Determine allowed characters and build the pattern
*----------------------------------------------------------------------*
  IF g_pattern IS INITIAL.
    CALL FUNCTION 'RSKC_ALLOWED_CHAR_GET'
      IMPORTING
        e_allowed_char = l_allowed_char.
    CONCATENATE '([^' l_allowed_char '])' INTO g_pattern.
  ENDIF.
*----------------------------------------------------------------------*
* Replacement
*----------------------------------------------------------------------*
  e_text = i_text.
  REPLACE ALL OCCURRENCES OF REGEX g_pattern IN e_text WITH i_default.
ENDFUNCTION.
Take a closer look at the CONCATENATE statement, where the pattern is built, and at the REPLACE statement, where it is used.
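The build-once idea behind the pattern variable can be tried out in isolation. The following is only a sketch under assumptions: the report name, the FORM routine and the hard-coded list of allowed characters are hypothetical stand-ins for the function group's global data and for the call to RSKC_ALLOWED_CHAR_GET. The capturing parentheses are left out here, because the replacement does not refer to a submatch.

```abap
REPORT z_demo_pattern_cache.

* Hypothetical stand-in for the global pattern of the function group.
DATA: g_pattern TYPE string,
      l_text    TYPE string.

START-OF-SELECTION.
  l_text = 'ABC#12'.
  PERFORM replace_unallowed USING '_' CHANGING l_text.
  WRITE / l_text.

* Second call: the pattern is already filled and is simply reused.
  l_text = 'X?Y'.
  PERFORM replace_unallowed USING '_' CHANGING l_text.
  WRITE / l_text.

FORM replace_unallowed USING    i_default TYPE c
                       CHANGING c_text    TYPE string.
  DATA l_allowed TYPE string.

* Build the pattern only on the first call.
  IF g_pattern IS INITIAL.
*   Assumption: fixed list instead of RSKC_ALLOWED_CHAR_GET.
    l_allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'.
    CONCATENATE '[^' l_allowed ']' INTO g_pattern.
  ENDIF.

  REPLACE ALL OCCURRENCES OF REGEX g_pattern IN c_text WITH i_default.
ENDFORM.
```

Keeping the pattern in a global variable means the string concatenation happens once per session rather than once per row, which is the same reason the original code guards its call with g_char.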
After checking the new version for correctness, I ran a performance trace, which shows that the new implementation is 20% faster (tested on a demo instance, with table SBOOK and 1,300,000 rows, one field per row replaced). In the trace, replace1 is the old implementation and replace2 the new one.
If your input data is more diverse, I expect the speed-up of this implementation to be even greater.
For more details, see the ABAP documentation: http://help.sap.com/abapdocu_70/en/ABENREGULAR_EXPRESSIONS.htm or try it yourself with the report DEMO_REGEX_TOY.
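If you want a rough number for your own system without setting up a full trace, a simple stopwatch around the statement may be enough. The sketch below is not the measurement described above: the report name, the sample text, the pattern and the iteration count are assumptions, and it times only the regex variant.

```abap
REPORT z_demo_regex_timing.

DATA: l_text  TYPE string,
      l_start TYPE i,
      l_end   TYPE i,
      l_usec  TYPE i.

START-OF-SELECTION.
* GET RUN TIME returns the elapsed runtime in microseconds.
  GET RUN TIME FIELD l_start.

  DO 100000 TIMES.
    l_text = 'AB#C?D'.     " hypothetical sample value
    REPLACE ALL OCCURRENCES OF REGEX '[^ABCD]' IN l_text WITH '_'.
  ENDDO.

  GET RUN TIME FIELD l_end.
  l_usec = l_end - l_start.

  WRITE: / 'Regex variant, runtime in microseconds:', l_usec.
```

Wrapping the old loop in the same way and running both several times gives a first impression; results will vary with the system load and the input data.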
Have fun!