Jörg Krause

Taming the RegEx monster

As a long-time ABAPer who started out in the early 90s, I don't have regular expressions in my DNA. To me, they were some sort of monster: ugly, dangerous and mysterious.

Then I learned about the benefits of REGEX search and replace in the ABAP editor. Did you ever replace ‘ +’ (one or more spaces) with ‘ ’ (a single space)? Then you know what I mean. This changed my mind to a certain extent, and I found myself googling REGEX syntax to do more sophisticated search/replace in the editor.

Recently, the ABAP language developers added a set of REGEX-armed functions such as the function matches. I must say, I rarely used them, mainly because of their poor readability. Who wants to read something like

IF  matches( val = itf_wa-tdline regex = `<DS:([^>]+)>.+` ).

when reviewing source code?

However, while refactoring an old program, I stumbled across these lines:

if lv_ilart = 'REP' or
   lv_ilart = 'UMR' or
   lv_ilart = 'MAW'.

and wondered how to avoid writing lv_ilart three times. In an SQL statement, I could write

where ilart in ('REP', 'UMR', 'MAW')

but how do I do that in an IF? Some insistent voice inside my head kept whispering “Use a REGEX”. “No!” said I, “nobody will understand it…”, but the voice said: “Why don’t you try?”.

After some googling about REGEX and writing some dummy code for testing it out, I arrived at

if matches( val = lv_ilart regex = 'REP|UMR|MAW' ).

Leaning back for a minute and staring at this line, I had to admit that it could be understandable to a reader. So, if I keep my REGEX simple, using this feature could be a good idea.
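The reason this reads like a list check is that matches( ) only succeeds when the complete value fits the pattern, not just a part of it. A minimal sketch of that behaviour, where the report name and the sample values are purely illustrative assumptions:

```abap
REPORT zdemo_ilart_regex. " hypothetical report name

DATA(lv_ilart) = `UMR`.   " illustrative sample value

" True: UMR is exactly one of the three alternatives.
IF matches( val = lv_ilart regex = `REP|UMR|MAW` ).
  WRITE / 'UMR is in the list'.
ENDIF.

" matches( ) tests the whole value, so a longer value such as REPX fails.
IF NOT matches( val = `REPX` regex = `REP|UMR|MAW` ).
  WRITE / 'REPX is not in the list'.
ENDIF.
```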

But what about the more complicated cases? REGEX is a standard in programming, and it is very powerful. How could I use this and keep the code readable?

My approach is to put a complex REGEX operation into a small method whose name explains the purpose. For example, instead of writing:

if matches( val = filename regex = '.+\_[0-9][0-9]' ).

I would prefer to write

if has_two_numbers_suffix( filename ).

method has_two_numbers_suffix.
  result = xsdbool( matches( val = in regex = '.+\_[0-9][0-9]' ) ).
endmethod.

Now, a reviewer understands what’s supposed to be going on. Even if he’s not able to decipher the REGEX, at least he knows what it should do.
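For completeness, the helper could be declared roughly as follows. This is only a sketch, not the original class: the report name, the class name lcl_filename_check, the static declaration, the parameter types and the sample file name are assumptions. Only the method name, the parameter name in and the pattern are taken from the lines above.

```abap
REPORT zdemo_suffix_check. " hypothetical report name

CLASS lcl_filename_check DEFINITION.
  PUBLIC SECTION.
    " Assumed signature: string in, boolean flag out.
    CLASS-METHODS has_two_numbers_suffix
      IMPORTING in            TYPE string
      RETURNING VALUE(result) TYPE abap_bool.
ENDCLASS.

CLASS lcl_filename_check IMPLEMENTATION.
  METHOD has_two_numbers_suffix.
    " Pattern copied unchanged: any text, an underscore, two digits.
    result = xsdbool( matches( val = in regex = `.+\_[0-9][0-9]` ) ).
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA(filename) = `invoice_07`. " illustrative sample value
  IF lcl_filename_check=>has_two_numbers_suffix( filename ).
    WRITE / 'two-digit suffix found'.
  ENDIF.
```

The calling code never shows the pattern, so the method name carries the meaning for readers who cannot decipher the REGEX.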


      Santhosh Kumar Cheekoti

      Agree on code readability. To decipher a regex, using an online regex tool makes the job a bit easier. You can save the regular expression there and paste the URL into the code, so the reviewer can decipher the regex by opening the link.

      We also need to be cautious when using such tools, because the ABAP regex library does not support everything.


      Suhas Saha

      RegEx support for ABAP has been around for quite some time now. To be honest, I have never had the need to build fancy, complex RegExes.

      Earlier, I used a combination of tools to build the RegEx I wanted, and then tested it using the DEMO_REGEX_TOY report.

      Since I started using aUnit, I add unit tests to the mix. They serve as technical documentation for the developers who will be maintaining the code.

      Jörg Krause
      Blog Post Author

      My point was readability. In my company, there are people who do not code in ABAP, but who do read short dumps and sometimes debug. None of them even knows how to spell RegEx. So if I use heavy RegExes, they won't understand the purpose of the line. The very useful testing tools mentioned here will not help in this case.

      Lars Breddemann

      Very nice blog post and an interesting read on your train of thoughts here.

      I think the approach of wrapping the test-logic (whether the filename ends with two numeric characters or not) into a small function with a telling name can really help to make the "main" program easier to understand.

      What I don't agree with so much is using regex for the examples mentioned. A simple list match can be done in many different ways and if there are just a few cases like in the example I don't see three IF branches as an issue.

      Regex, on the other hand, is rather heavyweight in its runtime behavior, easy to get wrong, and not obvious to the reader - which you mentioned in the comment.

      To me, this looks like an unfortunate trade: getting rid of having to write three IF branches and introducing a famously error-prone and confusing technology that absolutely needs to be explained in comments or documentation.
      Don't get me wrong, I don't mind regex per se and have used it successfully in projects. But in my mind, regex is always like a heavy-duty power tool that needs a lot of care and safety procedures when handling.

      my 2cts on that bit.

      Jörg Krause
      Blog Post Author

      Thank you for this valid topic! I totally agree with this.

      Mike Pokraka

      I know this is an old blog, but just spotted and fully agree with the idea of documenting complex behaviour by putting it into a descriptively-named method.

      But the main reason for writing a comment is to offer an alternative to your three-IF statement that may work in some cases, namely CS in reverse:

      IF 'REP UMR MAW' CS lv_ilart.

      This must be used with care, ideally only with a fixed-domain or table-key variable, as the above could technically also match the string "P U", for example. You can use any delimiters, commas or whatever.

      I use it in workflow - IF 'READY WAITING STARTED' CS status.

      Jörg Krause
      Blog Post Author

      Thanks for your thoughts. This was indeed my first approach to simplifying, but I disliked the ambiguity you described (it would also match 'P U'). If somebody introduced, for example, 'UM' as a valid value for ILART, the logic would break. So I decided to use the precise REGEX statement.
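      The difference between the two checks can be sketched side by side. This is an illustration only: the report name is hypothetical, and 'UM' is the invented value from the comment above, not a real ILART.

```abap
REPORT zdemo_cs_vs_regex. " hypothetical report name

DATA(lv_ilart) = `UM`.    " invented value, not in the list

" Reverse CS asks whether lv_ilart occurs anywhere in the literal.
" UM is a substring of UMR, so this condition is true.
IF 'REP UMR MAW' CS lv_ilart.
  WRITE / 'CS: treated as a list member'.
ENDIF.

" matches( ) requires the whole value to equal one alternative,
" so UM is rejected here.
IF matches( val = lv_ilart regex = `REP|UMR|MAW` ).
  WRITE / 'REGEX: treated as a list member'.
ELSE.
  WRITE / 'REGEX: not a list member'.
ENDIF.
```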