Usually, macros are not welcomed by developers, and in many teams they are banned in the programming guidelines. Obviously, there are arguments that speak against macros. But in recent years, I learned to appreciate their positive features. I would not recommend a general macro ban in an ABAP development project. In this blog, I will explain why.
No doubt: macros are not a real modularization technique. Like includes, they lack an execution context of their own. And, yes, with the exception of extremely simple macros, they are difficult to debug (I will come to that later). They simply are source code transformations at a pre-compile stage. But they have a number of advantages.
The positive points that I see with macros are:
- They help to make code more readable.
- They help hide implementation details from the code’s main intentions.
- They help avoid code repetition.
A Sample Refactoring Session
Consider the following, typical example. It is a piece from our codebase (although 10 years old, if you accept this as excuse…).
FORM alv1_fieldcat_create CHANGING ct_fieldcat TYPE slis_t_fieldcat_alv.

  DATA: ls_fieldcat TYPE slis_fieldcat_alv.

  ls_fieldcat-fieldname = 'CHECKBOX'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 0.
  APPEND ls_fieldcat TO ct_fieldcat.

  CLEAR ls_fieldcat.
  ls_fieldcat-fieldname = 'ABELN'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 1.
  ls_fieldcat-hotspot   = gc_x.
  APPEND ls_fieldcat TO ct_fieldcat.

  CLEAR ls_fieldcat.
  ls_fieldcat-fieldname = 'BEZCH'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 2.
  ls_fieldcat-hotspot   = gc_x.
  APPEND ls_fieldcat TO ct_fieldcat.

  CLEAR ls_fieldcat.
  ls_fieldcat-fieldname = 'STAT_FIELD'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 3.
  APPEND ls_fieldcat TO ct_fieldcat.

  CLEAR ls_fieldcat.
  ls_fieldcat-fieldname = 'AUFAR'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 4.
  APPEND ls_fieldcat TO ct_fieldcat.

  CLEAR ls_fieldcat.
  ls_fieldcat-fieldname = 'COUNT_POS'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 5.
  APPEND ls_fieldcat TO ct_fieldcat.

  CLEAR ls_fieldcat.
  ls_fieldcat-fieldname = 'MESSAGE'.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = 6.
  APPEND ls_fieldcat TO ct_fieldcat.

ENDFORM.                    " alv1_fieldcat_create
What is the intention of this code? A field catalog has to be filled: For each field, the column name and the position in the grid (col_pos) are specified. Two of the fields are designed as hotspot fields (hotspot = ‘X’).
The code repetition is annoying. It distracts the eye from the essential part of the code. Only the parts that vary from block to block are really essential. The rest is boilerplate. It is a waste of time to be forced to study this repetitive code line by line when the routine has to be inspected in a support or enhancement session.
An obvious improvement would be to parametrize the code and to use a modularization unit – in this case, a subroutine:
form add_field using iv_fieldname type c
                     iv_col_pos   type i
            changing ct_fieldcat  type slis_t_fieldcat_alv.
  data: ls_fieldcat type slis_fieldcat_alv.
  ls_fieldcat-fieldname = iv_fieldname.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = iv_col_pos.
  append ls_fieldcat to ct_fieldcat.
endform.                    "add_field

form set_hotspot using iv_fieldname type c
              changing ct_fieldcat  type slis_t_fieldcat_alv.
  data: ls_fieldcat type slis_fieldcat_alv.
  ls_fieldcat-hotspot = 'X'.
  modify ct_fieldcat from ls_fieldcat
         transporting hotspot
         where fieldname = iv_fieldname.
  assert sy-subrc eq 0.
endform.                    "set_hotspot
Note the assertion in routine set_hotspot which makes the program crash if a non-existing fieldname has been typed by the developer as actual parameter. This tribute to the “crash early” principle helps avoid typographical errors.
With these routines available, ALV1_FIELDCAT_CREATE could be identically rewritten as follows:
form alv2_fieldcat_create using ct_fieldcat type slis_t_fieldcat_alv.

  if list_var = 3.
    perform add_field using 'CHECKBOX' 0 changing ct_fieldcat.
  endif.

  perform add_field using 'ABELN'      1 changing ct_fieldcat.
  perform add_field using 'BEZCH'      2 changing ct_fieldcat.
  perform add_field using 'STAT_FIELD' 3 changing ct_fieldcat.
  perform add_field using 'AUFAR'      4 changing ct_fieldcat.
  perform add_field using 'COUNT_POS'  5 changing ct_fieldcat.
  perform add_field using 'MESSAGE'    6 changing ct_fieldcat.

  perform set_hotspot using 'ABELN' changing ct_fieldcat.
  perform set_hotspot using 'BEZCH' changing ct_fieldcat.

endform.                    " alv2_fieldcat_create
With this change, we effectively saved 12 lines of code in comparison to the original version. Moreover, 14 lines of the new code, the two form routines, appear to be reusable in other programs as well. They could be extracted into a general-purpose subroutine pool for filling ALV field catalogs. In comparison, not a single line of the original code was reusable.
But the real win is that the routine alv2_fieldcat_create is now much more readable. Also, opening an extra stack level for each add_field call is no performance problem, since we are calling it only for a couple of fields (and not thousands of times – and even then: in the majority of cases, it is not the call of a routine that is the performance issue, but the code contained within that routine!).
Moreover, we do not need to explicitly initialize the work area ls_fieldcat any more, since we have a little box with fresh, initialized data for each call. The need to explicitly initialize something is bad, since it could be forgotten. In fact, I consider it a code smell when I encounter a “CLEAR something” statement several times in the middle of a code block: The variable “something” is obviously reused for several purposes in the same code block. It is better to extract the code parts beginning with that “CLEAR something” statement into a module of their own (form routine, method, function module).
But I am still not happy with ALV2_FIELDCAT_CREATE. There is too much ado, too much “code noise” for such a simple task as adding some rows with field names to an internal table. The many “perform add_field” are still disturbing. I would like to employ the really useful “Colon/Comma” notation in order to write “perform add_field” only once.
Applying the “Colon/Comma” notation gives the following:
form alv3_fieldcat_create using ct_fieldcat type slis_t_fieldcat_alv.

  if list_var = 3.
    perform add_field using 'CHECKBOX' 0 changing ct_fieldcat.
  endif.

  perform add_field using :
    'ABELN'      1 changing ct_fieldcat,
    'BEZCH'      2 changing ct_fieldcat,
    'STAT_FIELD' 3 changing ct_fieldcat,
    'AUFAR'      4 changing ct_fieldcat,
    'COUNT_POS'  5 changing ct_fieldcat,
    'MESSAGE'    6 changing ct_fieldcat.

  perform set_hotspot using :
    'ABELN' changing ct_fieldcat,
    'BEZCH' changing ct_fieldcat.

endform.                    " alv3_fieldcat_create
This is slightly better, but still contains unwanted repetition. It’s clear that we are working with ct_fieldcat. We don’t want our attention led every second line to this fact. On the other hand, by the syntax rules, a CHANGING parameter has to come at the end of a PERFORM call and therefore always is part of the statement’s “tail” which can’t be abbreviated away with the Colon/Comma syntax.
This is where macros come into play. We could use these two very short macros for encapsulating the perform calls:
define _add_field.
  perform add_field using &1 &2 changing ct_fieldcat.
end-of-definition.

define _set_hotspot.
  perform set_hotspot using &1 changing ct_fieldcat.
end-of-definition.
Note that these macros require the existence of certain named variables in the context where they are called: In this case, there needs to exist a variable with the name ct_fieldcat. This implies that these macros are not universally usable, but only for a specific context.
With these macros, the form routine now shrinks to its minimum size and almost maximum expressiveness:
form alv4_fieldcat_create using ct_fieldcat type slis_t_fieldcat_alv.

  if list_var = 3.
    _add_field 'CHECKBOX' 0.
  endif.

  _add_field :
    'ABELN'      1,
    'BEZCH'      2,
    'STAT_FIELD' 3,
    'AUFAR'      4,
    'COUNT_POS'  5,
    'MESSAGE'    6.

  _set_hotspot :
    'ABELN',
    'BEZCH'.

endform.                    " alv4_fieldcat_create
If we add the code lines for the macros, the savings in lines of code are not gigantic. But that is not the important part. What really matters is that the routine for filling the field catalog is now reduced to its intended parts: What you see written in ALV4_FIELDCAT_CREATE is precisely what the developer intended. Not more. The rest is code noise, banned behind the scenes.
If the assignment of the col_pos to the field is not used elsewhere (and why should it?), I could even think of a version which auto-generates the col_pos. Since the code itself has a sequence, it is not necessary to specify this sequence a second time with the col_pos parameter.
The routine add_field could look like this:
form add_field using iv_fieldname type c
            changing ct_fieldcat  type slis_t_fieldcat_alv.
  data: ls_fieldcat type slis_fieldcat_alv.
  ls_fieldcat-fieldname = iv_fieldname.
  ls_fieldcat-tabname   = gc_tab_head.
  ls_fieldcat-col_pos   = lines( ct_fieldcat ).
  append ls_fieldcat to ct_fieldcat.
endform.                    "add_field
Correspondingly, the macro passes only one parameter:
define _add_field.
  perform add_field using &1 changing ct_fieldcat.
end-of-definition.
And finally the explicit col_pos can be thrown out of the code:
form alv5_fieldcat_create using ct_fieldcat type slis_t_fieldcat_alv.

  if list_var = 3.
    _add_field 'CHECKBOX'.
  endif.

  _add_field :
    'ABELN',
    'BEZCH',
    'STAT_FIELD',
    'AUFAR',
    'COUNT_POS',
    'MESSAGE'.

  _set_hotspot :
    'ABELN',
    'BEZCH'.

endform.                    " alv5_fieldcat_create
Summing up, we have refactored by using a combination of reusable subroutines and macros, taking advantage of both of these techniques:
- Subroutines (or methods or function modules) contribute by providing an own execution context, with auto-initialized local variables and clear parameters for import and export.
- Macros contribute by simplifying the notation of the call, improving readability of the client code (“client” in the sense of: being a caller of the subroutines).
All the three advantages I mentioned above are achieved:
- Nobody would deny that the final version alv5_fieldcat_create is more readable than the original.
- The implementation details (here the “boring” population of an internal table) have been extracted from the code.
- Instead of several almost-identical code segments, we are now reusing code, and only the variable parts are visible in the top-level code block.
Another example for illustration: move-corresponding is a great statement for mapping between different structures, if you control the DDIC definition of both source and target structure. At the edge between two different code parts, however, this “move-by-name” rule will not suffice, and an explicit mapping of structure components is required. This produces noisy code (again, a real-life example with very old, but still productive code):
gs_kundendaten-customer   = gs_kundenauftragsdaten-partn_numb.
gs_kundendaten-title_key  = gs_kundenauftragsdaten-title_key.
gs_kundendaten-title_p    = gs_kundenauftragsdaten-title.
gs_kundendaten-firstname  = gs_kundenauftragsdaten-name_2.
gs_kundendaten-lastname   = gs_kundenauftragsdaten-name.
* gs_kundendaten-secondname
gs_kundendaten-street     = gs_kundenauftragsdaten-street.
* gs_kundendaten-house_no
gs_kundendaten-postl_cod1 = gs_kundenauftragsdaten-postl_code.
gs_kundendaten-city       = gs_kundenauftragsdaten-city.
gs_kundendaten-tel1_numbr = gs_kundenauftragsdaten-telephone.
gs_kundendaten-tel2_numbr = gs_kundenauftragsdaten-telephone2.
* gs_kundendaten-tel3_numbr
gs_kundendaten-fax_number = gs_kundenauftragsdaten-fax_number.
* gs_kundendaten-e_mail
gs_kundendaten-langu_p    = gs_kundenauftragsdaten-langu.
* gs_kundendaten-langu_descr
gs_kundendaten-country    = gs_kundenauftragsdaten-country.
When I had to change another piece of the method which contained that code, I placed this little macro in the class's macro section:
define _move_from_to.
  &2-&4 = &1-&3.
end-of-definition.
With this one-line macro, that code part could be rewritten in a more readable way, with each line naming only the source component and the target component. (The original post showed the result as a screenshot, which is not reproduced here.)
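As a rough sketch of what such a rewritten mapping might look like: the two structure types below are hypothetical, cut-down stand-ins for the real ones (only a handful of components, with invented lengths), and the report name is made up. Only the macro itself and the component names are taken from the text above.

```abap
report zdemo_move_from_to.

* Hypothetical, shortened versions of the two structures
types: begin of ty_kundenauftragsdaten,
         partn_numb type c length 10,
         title      type c length 15,
         name       type c length 35,
         name_2     type c length 35,
         postl_code type c length 10,
       end of ty_kundenauftragsdaten,
       begin of ty_kundendaten,
         customer   type c length 10,
         title_p    type c length 15,
         lastname   type c length 35,
         firstname  type c length 35,
         postl_cod1 type c length 10,
       end of ty_kundendaten.

data: gs_kundenauftragsdaten type ty_kundenauftragsdaten,
      gs_kundendaten         type ty_kundendaten.

* &1 = source structure, &2 = target structure,
* &3 = source component, &4 = target component
define _move_from_to.
  &2-&4 = &1-&3.
end-of-definition.

start-of-selection.
  gs_kundenauftragsdaten-partn_numb = '4711'.
  gs_kundenauftragsdaten-name       = 'Doe'.

* The two structures are named once; each following line is one mapping
  _move_from_to gs_kundenauftragsdaten gs_kundendaten :
    partn_numb customer,
    title      title_p,
    name_2     firstname,
    name       lastname,
    postl_code postl_cod1.

  write: / gs_kundendaten-customer, gs_kundendaten-lastname.
```

Each line after the colon expands to one assignment of the form target-component = source-component, so a mistyped component name still produces a syntax error.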
As was the case in the original code, mistyped column names will be detected at compile-time – which is good. Macro-based code only looks like free text, but is equally strict concerning the syntax as any other ABAP code.
Declarative programming: A button states example
Suppose, there is a number of buttons requested for a new web application. The state of these buttons – whether they are invisible, inactive (i.e. visible but not clickable), or active, has been specified depending on the current transactional mode (create mode, change mode, display mode).
The change request might contain the following little table:
With macros, it is possible to take this table over into the coding as it is written in the spec!
For this, let us create a little dictionary: We need a data type for the fcode (the user command = the button he pressed), for the transactional mode (create, change or display), and for the button state (active, inactive, invisible).
Have in mind that such a dictionary will be used throughout the whole application, it is not defined only for this particular purpose. It is good style to define data types and structured constants for all the discrete variables appearing in the data model of the application.
When using code instead of DDIC for illustration, the types could look like this:
types: ty_fcode type c length 4,  " User command
       ty_state type x length 1,  " Button state (active, inactive, invisible)
       ty_mode  type n length 1.  " Mode (create, change, display)
To define the different values these types may have, we use structured constants. In real life, we would place these constants in a type pool or in a class section. Anyway, the declaration of the possible values of all these data will look like this:
constants:
  begin of fcode,
    actualize type ty_fcode value 'ACTU',
    save      type ty_fcode value 'SAVE',
    undo      type ty_fcode value 'UNDO',
    reject    type ty_fcode value 'CANC',
    print     type ty_fcode value 'PRNT',
  end of fcode,
  begin of mode,
    create  type ty_mode value '1',
    change  type ty_mode value '2',
    display type ty_mode value '3',
  end of mode,
  begin of state,
    active    type ty_state value 0,
    inactive  type ty_state value 1,
    invisible type ty_state value 2,
  end of state.
Let’s say, we define a method which retrieves the button state, depending on transactional mode and button code:
methods get_state
  importing iv_fcode type ty_fcode
            iv_mode  type ty_mode
  returning value(ev_state) type ty_state.
Then we can introduce the following macro:
define _set_button_state_per_mode.
  if iv_fcode eq fcode-&1.
    case iv_mode.
      when mode-create.
        ev_state = state-&2.
      when mode-change.
        ev_state = state-&3.
      when mode-display.
        ev_state = state-&4.
    endcase.
  endif.
end-of-definition.
This finally allows us to implement the method simply by copy-pasting the change request!
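To make this concrete, here is a minimal sketch of how the method body might look. The table from the change request is not available here, so the state values in the macro calls are invented purely for illustration. The local class name lcl_buttons and the report name are assumptions as well, and the state constants are written as hexadecimal character literals, the conventional notation for initial values of type x.

```abap
report zdemo_button_states.

types: ty_fcode type c length 4,
       ty_state type x length 1,
       ty_mode  type n length 1.

constants:
  begin of fcode,
    actualize type ty_fcode value 'ACTU',
    save      type ty_fcode value 'SAVE',
    undo      type ty_fcode value 'UNDO',
    reject    type ty_fcode value 'CANC',
    print     type ty_fcode value 'PRNT',
  end of fcode,
  begin of mode,
    create  type ty_mode value '1',
    change  type ty_mode value '2',
    display type ty_mode value '3',
  end of mode,
  begin of state,
    active    type ty_state value '00',
    inactive  type ty_state value '01',
    invisible type ty_state value '02',
  end of state.

* Hypothetical local class hosting the method from the text
class lcl_buttons definition.
  public section.
    class-methods get_state
      importing iv_fcode type ty_fcode
                iv_mode  type ty_mode
      returning value(ev_state) type ty_state.
endclass.

define _set_button_state_per_mode.
  if iv_fcode eq fcode-&1.
    case iv_mode.
      when mode-create.
        ev_state = state-&2.
      when mode-change.
        ev_state = state-&3.
      when mode-display.
        ev_state = state-&4.
    endcase.
  endif.
end-of-definition.

class lcl_buttons implementation.
  method get_state.
*   Invented example values: button, then state in create / change / display
    _set_button_state_per_mode :
      actualize invisible active active,
      save      active    active invisible,
      undo      inactive  active invisible,
      reject    active    active invisible,
      print     invisible active active.
  endmethod.
endclass.

start-of-selection.
  data gv_state type ty_state.
  gv_state = lcl_buttons=>get_state( iv_fcode = fcode-save
                                     iv_mode  = mode-display ).
  if gv_state eq state-invisible.
    write / 'SAVE is invisible in display mode'.
  endif.
```

One design point to keep in mind with this sketch: if no button code matches, ev_state keeps its initial value, which here coincides with state-active. A real implementation might prefer an explicit default.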
How does this code behave when the requirements change?
There will hardly ever be a new transactional mode. Therefore, the macro definition will almost never be changed. But there may be new buttons in the future, and the given state definitions may be changed in forthcoming versions of the software. Therefore, what is most likely to be changed, is the implementation of the method get_state. In this form, with the ABAP code actually hidden behind the macro, the part that is most likely to be changed, can be changed with maximum ease, readability – and safety. There are no direct values like strings (where a typo would appear at runtime, not at compile-time): the symbols in use – like active or reject – need to be defined in the appropriate structured constant – otherwise the method will not compile.
Since macros do not have an own execution context, they cannot be displayed properly in the debugger. All the statements listed in the macro will be shown and executed as only one statement in the debugger. It is not possible to single-step through the instructions of a macro.
This clearly is a problem, but only for highly complex macros, which of course should be avoided. A rule of thumb: If you find that your macro spans more than three statements, consider extracting these lines into a form routine or a private method, leaving only the call of this routine in the macro.
Symbols and Literals
Since macros operate on a pre-compile level, they don’t distinguish literals and symbols. The actual parameters can be used for building symbols (like method names or variable names) as well as for character literals – or both at the same time.
Just for illustration, think of the following macro:
define _set_field.
  ls_fieldcat-&2 = &3.
  modify ct_fieldcat from ls_fieldcat
         transporting &2
         where fieldname = '&1'.
  assert sy-subrc eq 0.
end-of-definition.
It could be applied like this:
_set_field ABELN :
  hotspot   'X',
  seltext_m text-001.
Here, &2 and &3 are replaced by a part of a symbol name and a literal, respectively. When using a subroutine or method instead, the component &2 of a structure would be accessible only by dynamic programming, using a statement like ASSIGN COMPONENT … OF STRUCTURE … TO …, which is more complicated. Also, a typographical error in the macro call would be detected immediately by the compiler (if the structure has no component of the given name).
The first macro parameter is special. It is passed without quotation marks, but is used inside the macro to build a string (… where fieldname = ‘&1’ ). This works, but is not safe. Somebody could convert your code to lower case. Then the modify statement will fail. To avoid this, one can normalize the case of the passed parameter first:
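For comparison, the dynamic alternative mentioned above could be sketched as follows. This is only an illustration: the field catalog type is a hypothetical three-component stand-in for slis_fieldcat_alv, and the routine name set_field and the report name are made up.

```abap
report zdemo_set_field_dynamic.

* Hypothetical, shortened stand-in for the ALV field catalog line type
types: begin of ty_fieldcat,
         fieldname type c length 30,
         hotspot   type c length 1,
         seltext_m type c length 20,
       end of ty_fieldcat,
       ty_t_fieldcat type standard table of ty_fieldcat with default key.

data: gt_fieldcat type ty_t_fieldcat,
      gs_fieldcat type ty_fieldcat.

start-of-selection.
  gs_fieldcat-fieldname = 'ABELN'.
  append gs_fieldcat to gt_fieldcat.

* The component name is now a character literal: a typo is only
* noticed at runtime (by the second assertion), not by the compiler
  perform set_field using 'ABELN' 'HOTSPOT' 'X'
                 changing gt_fieldcat.

form set_field using iv_fieldname type c
                     iv_component type c
                     iv_value     type any
            changing ct_fieldcat  type ty_t_fieldcat.
  field-symbols: <ls_fieldcat> type ty_fieldcat,
                 <lv_target>   type any.

  read table ct_fieldcat assigning <ls_fieldcat>
       with key fieldname = iv_fieldname.
  assert sy-subrc eq 0.

* Dynamic access to the component named in iv_component
  assign component iv_component of structure <ls_fieldcat> to <lv_target>.
  assert sy-subrc eq 0.
  <lv_target> = iv_value.
endform.
```

The subroutine works, but it needs two field symbols and two runtime checks where the macro version gets a compile-time check for free.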
define _set_field.
  ls_fieldcat-fieldname = '&1'.
  translate ls_fieldcat-fieldname to upper case.
  ls_fieldcat-&2 = &3.
  modify ct_fieldcat from ls_fieldcat
         transporting &2
         where fieldname = ls_fieldcat-fieldname.
  assert sy-subrc eq 0.
end-of-definition.
This macro would be case-safe – although it now exceeds three lines of code. Clearly, in this case it is better to pass &1 as a literal:
define _set_field.
  ls_fieldcat-&2 = &3.
  modify ct_fieldcat from ls_fieldcat
         transporting &2
         where fieldname = &1.
  assert sy-subrc eq 0.
end-of-definition.
Macros as Building Blocks for Internal DSLs
Recently, my interest in macros was renewed when I was reading Martin Fowler’s book on Domain Specific Languages (DSL).
What is a DSL? Here is a definition by Martin Fowler:
Domain-specific language (noun): a computer programming language of limited expressiveness focused on a particular domain.
- Although the definition speaks of a “programming language”, a DSL often feels more declarative than imperative. Of course, there is logic and processing behind any DSL. The text processing system TeX and CSS are examples of DSLs with a declarative taste, while SQL, regular expressions or the language of makefiles are DSLs that feel more like a programming language. Be that as it may, all DSLs have language features: They have a grammar, a dictionary and a set of allowed expressions.
- The limited expressiveness of the language is important. With Turing-complete languages like ABAP or Java, you can solve any problem that is algorithmically solvable at all. A DSL, in contrast, is designed for one particular problem, or a particular class of problems.
- Being restricted to a particular domain, a good DSL requires a syntax which makes it as readable as possible for the expert of that particular domain. Keywords and statement structure should be chosen in such a way that the code displays the intention with maximum readability, minimum code noise. In the examples above (CSS, makefiles and regular expressions), developers are the users. In a software for business applications, the code should be business-readable.
Martin Fowler describes two categories of domain-specific languages:
- The first category, the external DSLs, are defined and parsed outside of the host language, thus requiring an own parser.
- On the other hand, the internal DSLs are in fact fragments of code written in the host language (under certain rules). They are parsed, compiled and executed like any other part of the host language. There is no separate parser necessary.
Here is an example of an internal DSL in Java, using method chaining:
computer()
  .processor()
    .cores(2)
    .i386()
  .disk()
    .size(150)
  .disk()
    .size(75)
    .speed(7200)
    .sata()
  .end();
Due to the language restrictions, there still is some syntactic noise in this code: The brackets, the dots, and the computer() and end() statement are not part of the code intention. But everybody understands directly that this text is about specifying the parts of a computer.
How would one design that part in ABAP? Using macros, we could write:
processor : cores 2,
            type i386.
disk 1 : size 150.
disk 2 : size 75,
         speed 7200,
         type sata.
We will see in a minute how to implement the underlying macros.
For the moment, I just want to say that I find this version more expressive, more to-the-point than the Java example with method chaining.
Only the ‘1’ and ‘2’, the numbering of the disks, is a small technical debt (why couldn’t the system increment such a counter behind the scenes?). This is necessary because otherwise we couldn’t detect where the configuration of the first disk ends and that of the second disk begins. But we could sell this as a feature: Instead of ‘1’ and ‘2’, we could e.g. use this parameter to assign names to the different disks…
Everything in this code is under syntax control: If you typed ‘xata’ instead of ‘sata’, and xata is not a supported disk type, the compiler will send a syntax error.
Structured Constants as Dictionary
The first technique for implementing such a DSL is using structured constants. Suppose we want to specify a disk type from a given list of disk types, using a macro “disk_type”. So in the DSL we just want to type
disk_type sas.
Then an appropriate field of the semantic model, let’s say a component of a structure ls_disk, should be filled with the disk type, here ‘SAS’. Also, it should be impossible to type disk types other than those known to the system.
In order to specify the known disk types, we use a structured constant:
constants:
  begin of gc_disk_type,
    sata type string value 'SATA',
    sas  type string value 'SAS',
    scsi type string value 'SCSI',
  end of gc_disk_type.
Now, we can define the macro (a one-liner, as usual):
define disk_type.
  ls_disk->type = gc_disk_type-&1.
end-of-definition.
We proceed similarly with a macro processor_type for the different processor types (i386, i686, PPC,…).
The advantage of the structured constant is that it can be used elsewhere in the code, too, as a replacement of a string literal by a symbolic name (which is good practice anyway).
There is another useful technique in this context. I would like to call it name composition. The idea is to use one of the macro arguments to build a known symbolic name – e.g. of another macro, of a method, subroutine, function module.
In the example above, the instructions for the processor
processor : type i386, cores 2.
are using the colon / comma notation, which fits nicely – the colon has an explanatory meaning, in the sense: “The following parts specify in detail what is preceded by me.” That is exactly what is intended here.
Technically, however, this expands into two calls of a macro named ‘processor’:
processor type i386.
processor cores 2.
We will now use the first macro argument for building the name of a specific macro that has to be employed instead. This is name composition:
define processor.
  processor_&1 &2.
end-of-definition.
Now, we can add macros for all the things we want to specify for the processor (which is designed as a component of the object eo_computer here):
define processor_type.
  eo_computer->gs_processor-type = gc_processor_type-&1.
end-of-definition.

define processor_cores.
  eo_computer->gs_processor-cores = &1.
end-of-definition.
This works similarly for the disk specification. The only difference is that there are several disks. Therefore, we need the name argument (the first argument) of the macro to get the right table line before the data can be set.
If we have a local reference variable ls_disk, pointing to a disk structure, the macro “disk” is written as follows:
define disk.
  if ls_disk is not bound or ls_disk->name ne &1.
    ls_disk = eo_computer->get_disk( &1 ).
  endif.
  disk_&2 &3.
end-of-definition.
Here, a method is called first to position or generate the correct entry in the table of disks:
method get_disk.
  data: ls_disk type ty_disk.
  read table gt_disks reference into es_disk
       with table key name = iv_name.
  if sy-subrc ne 0.
    ls_disk = get_default_disk( ).
    ls_disk-name = iv_name.
    insert ls_disk into table gt_disks reference into es_disk.
  endif.
endmethod.                    "get_disk
The following form shows how the internal DSL is embedded into the ABAP code: It returns an instance eo_computer which carries the semantic model. Such a routine (or method, or function) can be called by other programs in order to get the data of the configured PC.
The DSL section is marked with special comments: The part enclosed by these comments could be edited (and changed) separate from the rest of the program, if an appropriate editor tool is used. In a forthcoming blog, I plan to explain such an internal DSL editor tool in more detail.
form get_computer changing eo_computer type ref to lcl_computer.

  data: ls_disk type ref to lcl_computer=>ty_disk.

  create object eo_computer.

* <<< BEGIN DSL
  processor : type i386,
              cores 2.
  disk 1 : size 2000,
           speed 750,
           type sata.
  disk 2 : type sas.
* <<< END DSL

endform.
The full source code of this internal DSL example can be inspected on my pastebin.
Order of the parameters
If a macro is called several times, there may be parameters whose values vary frequently, and others varying only rarely. The macro should be designed with the frequently-changing parameters at the end of its parameter list. This, again, is for being able to exploit the colon/comma mechanism.
Observe my choice of the parameter order in the _move_from_to example above. Only with the order chosen there, ‘&2-&4 = &1-&3’, can I arrange the macro’s actual parameters in such a way that the move process is nicely visible. This shows that the order of the macro arguments is important. Sometimes, a macro with the wrong parameter order makes the code worse than it was before.
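The effect of the parameter order can be illustrated with a small sketch. The second macro, _move_comp_first, is a hypothetical counter-example that performs the same assignment but takes the frequently changing parameters first. The two structures and the report name are invented for this illustration.

```abap
report zdemo_macro_param_order.

* Invented example structures
data: begin of gs_source,
        name type string,
        city type string,
      end of gs_source,
      begin of gs_target,
        lastname type string,
        town     type string,
      end of gs_target.

* Rarely changing parameters (the structures) first,
* frequently changing parameters (the components) last
define _move_from_to.
  &2-&4 = &1-&3.
end-of-definition.

* Hypothetical counter-example: same assignment, components first
define _move_comp_first.
  &4-&2 = &3-&1.
end-of-definition.

start-of-selection.
  gs_source-name = 'Doe'.
  gs_source-city = 'Basel'.

* Good order: the common head is written once, the tail varies
  _move_from_to gs_source gs_target :
    name lastname,
    city town.

* Bad order: the varying part comes first, so the colon/comma
* notation cannot factor out the structure names
  _move_comp_first name lastname gs_source gs_target.
  _move_comp_first city town     gs_source gs_target.

  write: / gs_target-lastname, gs_target-town.
```

Both variants expand to the same assignments. Only the first lets the reader see the mapping as a plain two-column list.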
For macro names, the same considerations are valid as for any other symbolic name in programming: They should be chosen in such a way that the code which uses them becomes as readable as possible. This is particularly important in the scope of a DSL.
Prefixes: In my opinion, it is useful to mark a non-DSL macro (which is only used for simplifying, shortening and clarifying ABAP code) with a special prefix. I use an underscore. The purpose of such a prefix is to distinguish the macro at first glance from an ABAP language keyword.
For macros used in DSLs, however, a prefix would be inappropriate, as the DSL idea is to separate away the “technical level”.
Macros are a code reuse technique which, like all other reuse techniques, has the three advantages of improving code readability, simplifying code, and removing code duplication.
Macros come out to be particularly useful as building blocks of internal DSLs.
Of course, there always is the possibility of abuse, in particular when the macro contains too many lines. But this is no reason to forbid them in the programming guidelines. As already the Romans said: Abusus non tollit usum.
Martin Fowler with Rebecca Parsons: Domain-Specific Languages. Addison-Wesley 2010.
Debasish Ghosh: DSL – grow your syntax on top of a clean semantic model. Blog, from 2/22/2010
Complete ABAP code for the PC configuration example: http://pastebin.com/6gqZdvQJ