Quickwin using regular expressions

hendrik_brandes · 10-25-2012

In one of my recent code reviews, I stumbled upon a simple piece of code that I could not stop thinking about...

Suppose you are in an SAP NetWeaver BW transformation and want to replace all disallowed characters in a transformation rule:

METHOD _compute_XXXXX.
...
  CALL FUNCTION 'Z_REPLACE_UNALLOWED_CHARS'
    EXPORTING
      i_text = i_value_in
    IMPORTING
      e_text = r_value_out.
...
ENDMETHOD.

and the function module looks roughly like this:

FUNCTION z_replace_unallowed_chars.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     REFERENCE(I_TEXT) TYPE  CHAR100 OPTIONAL
*"     REFERENCE(I_DEFAULT) TYPE  C OPTIONAL
*"  EXPORTING
*"     REFERENCE(E_TEXT) TYPE  CHAR100
*"----------------------------------------------------------------------
...
*----------------------------------------------------------------------*
*       Determine unallowed characters
*----------------------------------------------------------------------*
  IF g_char IS INITIAL.
    CALL FUNCTION 'RSKC_ALLOWED_CHAR_GET'
      IMPORTING
        e_allowed_char = w_allowed_char.
    g_char = 'X'.
  ENDIF.
*----------------------------------------------------------------------*
*      Replacement
*----------------------------------------------------------------------*
  DESCRIBE FIELD i_text LENGTH len IN CHARACTER MODE.
  boole  = true.
  e_text = i_text.
  lv_def = i_default.
  TRANSLATE lv_def TO UPPER CASE.
  WHILE boole = true.
    IF e_text CO w_allowed_char.
      EXIT.
    ENDIF.
    IF sy-fdpos < len.
      pos = sy-fdpos.
      e_text+pos(1) = lv_def.
    ENDIF.
    IF sy-fdpos = len.
      boole = false.
    ENDIF.
  ENDWHILE.
ENDFUNCTION.

After reading this function module and its calling routine, I thought to myself: Oops, something is going wrong here, or at least it is not as efficient as it should be.

What made me stumble was neither the calling routine nor the call of the BW function module. My focus moved to the replacement block. OK, there is no real mistake; it could have used more built-in functions such as strlen( ), and it could have used a different loop technique.

But why replace the characters in such a roundabout way? Even setting aside whether it behaves correctly in all cases, it costs a lot of performance, and that adds up in ETL processes on BW, because this function module is not called just once, twice, or three times. When you load millions of rows into your BW, it may well be called more than three times per row!

My solution: use regular expressions, which are built into ABAP:

REPLACE ALL OCCURRENCES OF REGEX l_pattern IN l_text WITH i_default.

This single line of code does the same as the entire replacement block (from DESCRIBE FIELD to ENDWHILE) in the listing above!

But what is this pattern thing? Regular expressions are a very efficient way to perform text operations. Within such an expression, you define a syntax that the matcher checks against your input, for example e-mail addresses, date formats, naming conventions, unwanted characters, and so on.

The pattern for this case looks like this:

[^ALLCHARACTERSYOUWANTTO]

Note: replace ALLCHARACTERSYOUWANTTO with the concrete characters you want to allow.

This pattern means: "every character except those within the brackets". That is exactly what the replacement loop does. The regex processor now processes the text in i_text and replaces all matches of the pattern with the value in i_default.
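To make the negated character set more tangible, here is a minimal sketch as a stand-alone report. The report name, the variable names, the allowed-character list and the sample text are all assumptions chosen for illustration; they do not come from the function module discussed here.

```abap
REPORT z_demo_regex_replace.

* All names and values below are hypothetical and for illustration only.
DATA: l_text    TYPE string,
      l_pattern TYPE string,
      l_default TYPE c LENGTH 1 VALUE '_'.

* Allowed characters in this sketch: upper-case letters, digits, blank.
l_pattern = '[^ ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]'.
l_text    = 'HELLO#WORLD!'.

* Every character that is not listed inside the brackets is a match
* and is replaced by the default character.
REPLACE ALL OCCURRENCES OF REGEX l_pattern IN l_text WITH l_default.

WRITE / l_text.
```

With this input, only # and ! fall outside the set, so the text should come out as HELLO_WORLD_.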

Now, the new function module looks like this:

FUNCTION z_replace_unallowed_chars.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     REFERENCE(I_TEXT) TYPE  CHAR100 OPTIONAL
*"     REFERENCE(I_DEFAULT) TYPE  C OPTIONAL
*"  EXPORTING
*"     REFERENCE(E_TEXT) TYPE  CHAR100
*"----------------------------------------------------------------------
...
*----------------------------------------------------------------------*
*       Determine unallowed characters
*----------------------------------------------------------------------*
  IF g_pattern IS INITIAL.
    CALL FUNCTION 'RSKC_ALLOWED_CHAR_GET'
      IMPORTING
        e_allowed_char = l_allowed_char.
    CONCATENATE '([^' l_allowed_char '])' INTO g_pattern.
  ENDIF.
*----------------------------------------------------------------------*
*      Replacement
*----------------------------------------------------------------------*
  e_text = i_text.
  REPLACE ALL OCCURRENCES OF REGEX g_pattern IN e_text WITH i_default.
ENDFUNCTION.

Have a closer look at the CONCATENATE statement, where the pattern is built, and at the REPLACE statement, where it is used.
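Because the pattern is assembled from a character list at runtime, it may not always be a valid regular expression, for instance if the list contains characters that have a special meaning inside brackets. One possible variation, sketched below, compiles the pattern once into a CL_ABAP_REGEX object and catches CX_SY_REGEX. The report name, the variable names and the sample values are assumptions for illustration only, and this is not the implementation that was measured in this post.

```abap
REPORT z_demo_regex_compile.

* All names and values below are hypothetical and for illustration only.
DATA: l_allowed TYPE string,
      l_pattern TYPE string,
      l_text    TYPE string,
      lo_regex  TYPE REF TO cl_abap_regex,
      lx_regex  TYPE REF TO cx_sy_regex.

* Stand-in for the list returned by RSKC_ALLOWED_CHAR_GET.
l_allowed = ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'.

CONCATENATE '[^' l_allowed ']' INTO l_pattern.

TRY.
*   Compile the pattern once; an invalid pattern raises CX_SY_REGEX.
    CREATE OBJECT lo_regex
      EXPORTING
        pattern = l_pattern.
  CATCH cx_sy_regex INTO lx_regex.
    WRITE / 'Pattern is not a valid regular expression.'.
    RETURN.
ENDTRY.

* The REGEX addition also accepts a reference to CL_ABAP_REGEX.
l_text = 'HELLO#WORLD!'.
REPLACE ALL OCCURRENCES OF REGEX lo_regex IN l_text WITH '_'.

WRITE / l_text.
```

In a function module, the regex object could be kept in a global variable of the function group, in the same way g_pattern is kept in the listing above.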

After testing this for correctness, I ran a performance trace, which showed that the new implementation is 20% faster (tested on a demo instance with table SBOOK and 1,300,000 rows, one field per row replaced). In the trace, replace1 is the old implementation and replace2 the new one.

If your input data is more diverse, I expect this implementation to gain even more speed.

Have a closer look at the ABAP documentation: http://help.sap.com/abapdocu_70/en/ABENREGULAR_EXPRESSIONS.htm or try it yourself with the report DEMO_REGEX_TOY.

Have fun!
