Taming the RegEx monster
As a long-time ABAPer who started out in the early 90s, I don't have regular expressions in my DNA. To me, they were some sort of monster: ugly, dangerous and mysterious.
Then I learned about the benefits of REGEX search and replace in the ABAP editor. Did you ever replace ‘ +’ with ‘ ’ to collapse runs of blanks into a single blank? Then you know what I mean. This changed my mind to a certain extent, and I found myself googling REGEX to do more sophisticated search/replace in the editor.
Recently, the ABAP language developers added a set of REGEX-armed functions such as the function matches. I must say, I rarely used them, mainly because of their poor readability. Who wants to read something like
IF matches( val = itf_wa-tdline regex = `<DS:([^>]+)>.+` ).
when reviewing source code?
However, refactoring an old program, I stumbled across the lines:
if lv_ilart = 'REP' or lv_ilart = 'UMR' or lv_ilart = 'MAW'.
and wondered how to avoid writing lv_ilart three times. In an SQL statement, I could write
where ilart in ('REP', 'UMR', 'MAW')
but how do I do that in an IF? An insistent voice inside my head kept whispering “Use a REGEX”. “No!” I said, “nobody will understand it…”, but the voice said: “Why don’t you try?”
After some googling about REGEX and writing some dummy code to test it, I arrived at
if matches( val = lv_ilart regex = 'REP|UMR|MAW' ).
Leaning back for a minute and staring at this line, I had to admit that a reader could understand it. So, as long as I keep my REGEX simple, using this feature can be a good idea.
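To try the alternation out in isolation, a small test report along these lines could be used. It is only a sketch: the report name and the hard-coded test values are made up for the demo. It relies on matches comparing the whole value against the pattern, which is why the alternation behaves like a list of allowed values.

```abap
REPORT z_regex_alternation_demo.

" Demo value only; in the real program lv_ilart comes from the
" surrounding logic.
DATA(lv_ilart) = `UMR`.

" matches( ) checks the whole value against the pattern, so the
" alternation works like a list of allowed values.
IF matches( val = lv_ilart regex = `REP|UMR|MAW` ).
  WRITE / 'UMR is in the list'.
ENDIF.

" A longer value is not accepted just because it starts with REP.
IF NOT matches( val = `REPX` regex = `REP|UMR|MAW` ).
  WRITE / 'REPX is not in the list'.
ENDIF.
```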
But what about the more complicated cases? REGEX is a standard in programming, and it is very powerful. How could I use this and keep the code readable?
My approach is: put a complex REGEX operation into a small method whose name explains the purpose. For example, instead of writing:
if matches( val = filename regex = '.+\_[0-9][0-9]' ).
I would prefer to write
if has_two_numbers_suffix( filename ).
(...)
method has_two_numbers_suffix.
  result = xsdbool( matches( val = in regex = '.+\_[0-9][0-9]' ) ).
endmethod.
Now, a reviewer understands what’s supposed to be going on. Even if he’s not able to decipher the REGEX, at least he knows what it should do.
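For completeness, here is how such a method might look with its declaration and a caller. This is a sketch under assumptions: the report name, the local class lcl_filename_check, the static method style and the parameter types are invented for the example, and the underscore is left unescaped, which should be equivalent because it is not a REGEX special character.

```abap
REPORT z_regex_method_demo.

" Hypothetical local class, used only to host the helper method.
CLASS lcl_filename_check DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS has_two_numbers_suffix
      IMPORTING in            TYPE string
      RETURNING VALUE(result) TYPE abap_bool.
ENDCLASS.

CLASS lcl_filename_check IMPLEMENTATION.
  METHOD has_two_numbers_suffix.
    " True when the value ends in an underscore followed by two digits,
    " with at least one character in front of the underscore.
    result = xsdbool( matches( val = in regex = `.+_[0-9][0-9]` ) ).
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  " The call site reads like plain language; the REGEX stays hidden
  " inside the method.
  IF lcl_filename_check=>has_two_numbers_suffix( `report_01` ).
    WRITE / 'report_01 ends in an underscore and two digits'.
  ENDIF.
```

Keeping the pattern in one named method also means it can be changed or unit-tested in a single place.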