Enhancing Regex Toy – Part 6

former_member225588 · 03-31-2023

This is the sixth and final in a series of six blogs describing how to enhance the regular expression tester known as Regex Toy, each blog describing a single enhancement to its capabilities.

Before applying the sixth patch

The preceding blog in this series described how to patch a copy of Regex Toy so that it preserves explicit line breaks in the text presented in the Matches block, but ended by describing an issue: the text in the Matches block is rendered differently depending on whether the IN TABLE checkbox is checked or unchecked. To illustrate this problem again, execute the enhanced Regex Toy and follow these steps:

Paste the following tongue twister into the Text block:

A skunk sat on a stump. The skunk thunk the stump stunk and the stump thunk the skunk stunk.

Place a check mark into the IN TABLE check box.

Select the All Occurrences button.

Specify a dot in the Regex slot of the Input block.

Press enter.

As shown in the screen shot above, you should find that Regex Toy determines every character of the Text block matches the pattern specified in the Regex slot, illustrating this using alternating green and blue background highlighting of each character in the Matches block. Notice especially that there are 2 spaces between the two sentences, each one shown with a different background color highlighting.

Now remove the check mark from the IN TABLE checkbox.

As shown in the screen shot above, you should find that Regex Toy still determines every character of the Text block matches the pattern specified in the Regex slot, but presents only a single space between the two sentences in the Matches block. Indeed, checking and unchecking the IN TABLE checkbox in rapid succession will clearly indicate the difference between the Matches block having a single space or pair of spaces separating the two sentences.

The reason for the sixth patch

Whether the Regex Toy IN TABLE checkbox is checked or unchecked determines whether content in the Matches block retains or ignores consecutive spaces present in the content of the Text block.

Applying the sixth patch

Using your favorite ABAP editor, edit the copy of ABAP repository object DEMO_REGEX_TOY containing the previous patches and apply the following single line change in method display:

Change the following line of the REPLACE ALL OCCURRENCES OF: set of chained statements, from

` `       IN TABLE result_it WITH COND string( ... ),

to

` `       IN TABLE result_it WITH '&nbsp;',

Note: If you copy the line above out of this blog and paste it into the ABAP editor, you may need to change the characters enclosing the string &nbsp; to plain apostrophes.

This patch restores the line to the way it appeared prior to the update adding the IN TABLE feature.
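To see the effect of that single replacement in isolation, here is a minimal sketch. It assumes, as the use of the &nbsp; entity suggests, that the Matches block is rendered as HTML, where a run of ordinary spaces collapses into one while each non-breaking space entity is rendered. The table name result_it comes from the patch; the report name, the sample content, and the variable html_line are made up for illustration.

```abap
REPORT zdemo_nbsp_sketch.

" Hypothetical stand-in for the result_it table used in method display.
DATA(result_it) = VALUE string_table(
  ( `on a stump.  The skunk` ) ).
" Note the two spaces after the period in the sample line above.

" Same replacement as the patched line: every space becomes an entity,
" so an HTML renderer has no run of plain spaces left to collapse.
REPLACE ALL OCCURRENCES OF ` ` IN TABLE result_it WITH '&nbsp;'.

" Show the converted line(s) in a classic list.
LOOP AT result_it INTO DATA(html_line).
  WRITE / html_line.
ENDLOOP.
```

After the replacement, the two spaces between the sentences become two consecutive &nbsp; entities, which is what lets both of them survive in the rendered output.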

After applying the sixth patch

Now activate the program and execute it using the same process described previously:

Paste the following tongue twister into the Text block:

A skunk sat on a stump. The skunk thunk the stump stunk and the stump thunk the skunk stunk.

Place a check mark into the IN TABLE check box.

Select the All Occurrences button.

Specify a dot in the Regex slot of the Input block.

Press enter.

Notice the 2 spaces between the two sentences, each one shown with a different background color highlighting. Now remove the check mark from the IN TABLE checkbox and you should find there is no difference in the rendering of the content in the Matches block – the 2 spaces between the two sentences are retained and shown with a different background color highlighting for each one.

Epilogue

With the six patches applied, it no longer matters whether the IN TABLE checkbox is checked or unchecked, which makes the checkbox an unnecessary feature. Regex Toy now behaves similarly to Regex Storm by:

Enabling trailing spaces in the regular expression to be observed

Providing a way to see spaces matching the regular expression by using alternating green and blue background highlighting to show matches

Enabling the regular expression to accommodate implicit line breaks in the searched text

Presenting matching text within the constraints of the Matches block

Preserving explicit line breaks in the text displayed in the Matches block

Retaining consecutive spaces specified in the searched text for display in the Matches block

I had mentioned in the third blog of this series that the lack of explicit online program documentation for Regex Toy did not concern me much because it seemed intuitive enough not to require any, but it became less intuitive with the introduction of the IN TABLE checkbox. I can speculate that this checkbox is so named because the programmer was attempting to present the Options in a way closely resembling the syntax for the ABAP FIND statement, which can be charted this way:

FIND [[FIRST OCCURRENCE | ALL OCCURRENCES] OF]

     [[SUBSTRING] substring | REGEX regex]

     IN

     [[SECTION [OFFSET offset] [LENGTH length] OF] data-object |

      TABLE internal-table [FROM line-1 [OFFSET offset-1]] [TO line-2 [OFFSET offset-2]]]

     [IN [CHARACTER | BYTE] MODE] 

     [[RESPECTING | IGNORING] CASE]

     ...

You can see in the FIND syntax chart above that “first or all occurrences” is the first clause, followed by the REGEX clause to describe the regular expression, followed by the clause to designate whether the content to be searched is “in a data object or internal table”, followed by a clause to indicate whether “case is to be respected or ignored”, and these clauses correlate to the sequence of options appearing from left to right in the Options block shown below.

Options block with 7.5 version:

The pair of radio buttons for FIRST OCCURRENCE and ALL OCCURRENCES as well as the pair for IGNORING CASE and RESPECTING CASE are options applicable to controlling the scope of a regular expression, but the IN TABLE checkbox is not. Indeed, this checkbox controls whether the internal processing of Regex Toy does or does not use an internal table for its regex processing, and in that regard its purpose as a processing option is knowable only by the user analyzing the source code.
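For readers who have not used both forms of the statement, the following sketch shows the two search targets the syntax chart allows: a single data object and an internal table. It is only an illustration of the FIND syntax, not a copy of what Regex Toy does internally; the report name and all variable names are hypothetical.

```abap
REPORT zdemo_find_in_table.

DATA(sentence) = `A skunk sat on a stump.  The SKUNK thunk the stump stunk.`.

" Variant 1: search a single data object.
FIND ALL OCCURRENCES OF REGEX `skunk` IN sentence
     IGNORING CASE
     RESULTS DATA(matches_in_string).

" Variant 2: search an internal table of lines (the IN TABLE clause).
DATA(text_it) = VALUE string_table( ( sentence ) ).
FIND ALL OCCURRENCES OF REGEX `skunk` IN TABLE text_it
     IGNORING CASE
     RESULTS DATA(matches_in_table).

" Both variants return a table of match results; compare the counts.
DATA(count_in_string) = lines( matches_in_string ).
DATA(count_in_table)  = lines( matches_in_table ).
WRITE: / count_in_string, count_in_table.
```

In both variants the clauses appear in the same order as the chart: the occurrence clause, the REGEX clause, the search target, and then the case option. On releases where the POSIX syntax is flagged as obsolete, the REGEX addition may produce a syntax warning.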

Accordingly, it is unfortunate that no online documentation is provided to explain this behavior to the user. Indeed, even if it had been explained, it does not seem to me to be something users would need to control: their concern in using this tool is simply whether it provides the correct answer to the regular expression they’ve provided, not how it goes about determining that answer. An argument can be made that the checkbox provides a way for the user to determine how an applicable FIND or REPLACE statement can be written in some other program, using this utility as a model for first getting it to work properly, but 1) this also requires the user to explore the source code to determine how those statements are operating, and 2) if that is an intended use of the utility, perhaps it should have been provided with online documentation explaining this.

Meanwhile, there is more you might want to know about how SAP handles regular expressions. The E-bite had stated that there are two regular expression industry standards, known as Perl and POSIX, and noted that the implementation of regular expression handling with the Regex-related ABAP statements conforms with the POSIX standard. Since then, SAP has introduced support for the Perl standard, as described in this series of blogs by julius.bettin as well as this blog by safa_bahoosh.

These blogs describe support for what are known as Perl Compatible Regular Expressions, abbreviated PCRE. The blog by Safa Golrokh Bahoosh states in the second paragraph that using the POSIX-oriented syntax with those ABAP statements supporting regular expressions is considered obsolete with release 7.55. Don’t be too alarmed by the fact that POSIX support is destined for obsolescence. Instead, regard it as yet another feature of ABAP that will remain available to the language for some time, perhaps forever, a fate similar to that of the ABAP statements FORM and ENDFORM, rendered obsolete by SAP well over a decade ago but which still thrive within the ABAP community, continuing to be preferred by many programmers over the use of their object-oriented counterparts METHOD and ENDMETHOD that SAP has encouraged programmers to use instead.

From what I was able to conclude reading those blogs, it seems the REGEX clause used with the ABAP statements FIND and REPLACE, and the regex parameter used with the string functions FIND, REPLACE, COUNT, CONTAINS, etc., are being replaced with a counterpart PCRE clause or parameter. Accordingly, I expect existing ABAP statements using regular expressions will continue to work as they always have, but if you want them to use the Perl standard you would have to change the statement explicitly, replacing the REGEX clause or parameter with its PCRE counterpart. That way, it is the programmer who controls which regular expression standard a given regex-enabled ABAP statement observes.
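As a rough sketch of what that explicit change looks like, the following compares the two clauses side by side, both in the FIND statement and in the built-in count function. It assumes a release that supports the PCRE addition (7.55 or later, per the blogs cited above); the report name, the sample sentence, and the variable names are made up for illustration.

```abap
REPORT zdemo_regex_vs_pcre.

DATA(sentence) = `The skunk thunk the stump stunk.`.

" POSIX-style syntax: the long-standing REGEX addition.
FIND ALL OCCURRENCES OF REGEX `s\w+k` IN sentence
     MATCH COUNT DATA(posix_count).

" PCRE syntax: same statement, only the clause keyword changes.
FIND ALL OCCURRENCES OF PCRE `s\w+k` IN sentence
     MATCH COUNT DATA(pcre_count).

" Built-in function counterpart: regex parameter versus pcre parameter.
DATA(fn_posix_count) = count( val = sentence regex = `s\w+k` ).
DATA(fn_pcre_count)  = count( val = sentence pcre  = `s\w+k` ).

WRITE: / posix_count, pcre_count, fn_posix_count, fn_pcre_count.
```

For a simple pattern like this one, both standards should report the same matches; the choice matters once a pattern relies on features that only one of the two standards provides.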

The E-bite mentioned the concept of greedy and lazy pattern matching, stating that SAP did not yet provide support for lazy pattern matching. The blogs by Julius Bettin and Safa Golrokh Bahoosh indicate that lazy pattern matching becomes available with SAP’s support of the PCRE flavor of regular expressions. In addition, support has been added to use regular expressions with ABAP SQL statements, CDS Views and much more.
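The difference between greedy and lazy matching is easiest to see on the tongue twister itself. The following is a minimal sketch, again assuming a release that supports the PCRE addition; the report name and variable names are hypothetical.

```abap
REPORT zdemo_pcre_lazy.

DATA(sentence) = `The skunk thunk the stump stunk.`.

" Greedy quantifier: .+ takes as much as it can, so the match runs
" from the first s to the last k in the sentence.
FIND FIRST OCCURRENCE OF PCRE `s.+k` IN sentence
     MATCH OFFSET DATA(greedy_off) MATCH LENGTH DATA(greedy_len).

" Lazy quantifier: .+? takes as little as it can, so the match stops
" at the first k that completes the pattern.
FIND FIRST OCCURRENCE OF PCRE `s.+?k` IN sentence
     MATCH OFFSET DATA(lazy_off) MATCH LENGTH DATA(lazy_len).

" Extract the matched text for display.
DATA(greedy_match) = substring( val = sentence off = greedy_off len = greedy_len ).
DATA(lazy_match)   = substring( val = sentence off = lazy_off   len = lazy_len ).

WRITE: / greedy_match, / lazy_match.
```

The greedy pattern spans several words, while the lazy pattern stops within the first word that satisfies it.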

Also, the blog by Safa Golrokh Bahoosh includes a screen shot of an updated presentation screen, which suggests that DEMO_REGEX_TOY and its older cousin DEMO_REGEX have been updated in NetWeaver release 7.55 to support the PCRE standard. The updated screen looks like this:

The screen shot above shows that the Regex field has been expanded from a single row to about the same block size as the Text block, and now the user is able to indicate which regular expression industry standard is to be used to provide the behavior, explicitly indicating that the POSIX behavior, while it still may be selected by this utility, is considered obsolete.

So, for those of you who actively use regular expressions with your programming activities, until your site migrates to a NetWeaver release of 7.55 or higher, I hope this blog series was able to help you enhance Regex Toy yourself to provide a more satisfying experience using it.