Code Inspector’s Performance Checks (III)

former_member184455 · ‎01-21-2009

In the first two articles of this series we had a look at how to best read data from database tables. The Code Inspector checks whether the database access has a WHERE clause that will be able to make use of a database index, or if an access to a buffered table will implicitly bypass the SAP table buffers.

Articles of this series:

Code Inspector’s Performance Checks (I)
Code Inspector’s Performance Checks (II)
Low performance operations on internal tables (discussed in this article)
Code Inspector’s Performance Checks (IV)

Now that you have read data from the database to the application server efficiently, it is essential that you streamline the way in which you deal with the data inside your program.
One big threat to the scalability of a program is large internal tables that are accessed sequentially (see Runtimes of Reads and Loops on Internal Tables). Sequential access means that the ABAP runtime steps through the entries of the internal table one by one, either until a certain key value is found or until the end of the internal table has been reached. This implies that a sequential access can be fast, but only if the entry searched for is at the beginning of the search area. On average, half of the internal table has to be stepped through to find one entry with a sequential search.
In contrast, optimized accesses are non-sequential and use an index, a binary search, or a hashed key search. So, while sequential access scales linearly with the amount of processed data, optimized accesses scale logarithmically or even better (that is, with a constant access time independent of the amount of data).

Moreover, once you start nesting accesses to internal tables in your program, for example a statement READ TABLE itab2 inside a LOOP AT itab1 ... ENDLOOP, you can get caught out by quadratic runtime behavior. This will happen if the sizes of both internal tables depend linearly on the amount of data processed in your program, and if the access to the inner table is not optimized.
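
The nested access described above can be sketched as follows. This is only an illustration: the report name, the structure types, and all variable names are hypothetical, and the sample data is generated in the program. The first loop reads a STANDARD table sequentially for every outer row. The second loop reads a SORTED table via its table key.

```abap
REPORT zdemo_nested_read.

* Hypothetical line types, used for illustration only.
TYPES: BEGIN OF ty_order,
         order_id    TYPE i,
         customer_id TYPE i,
       END OF ty_order,
       BEGIN OF ty_customer,
         customer_id TYPE i,
         name        TYPE c LENGTH 30,
       END OF ty_customer.

DATA: lt_orders        TYPE STANDARD TABLE OF ty_order
                       WITH NON-UNIQUE KEY order_id,
      lt_customers_std TYPE STANDARD TABLE OF ty_customer
                       WITH NON-UNIQUE KEY customer_id,
      lt_customers_srt TYPE SORTED TABLE OF ty_customer
                       WITH UNIQUE KEY customer_id,
      ls_order         TYPE ty_order,
      ls_customer      TYPE ty_customer,
      lv_hits          TYPE i.

* Sample data: both tables grow with the amount of processed data.
DO 1000 TIMES.
  ls_customer-customer_id = sy-index.
  ls_customer-name        = 'Sample customer'.
  APPEND ls_customer TO lt_customers_std.
  INSERT ls_customer INTO TABLE lt_customers_srt.

  ls_order-order_id    = sy-index.
  ls_order-customer_id = sy-index.
  APPEND ls_order TO lt_orders.
ENDDO.

* Non-optimized: every READ searches the STANDARD table sequentially,
* so the total effort grows quadratically with the data volume.
LOOP AT lt_orders INTO ls_order.
  READ TABLE lt_customers_std INTO ls_customer
       WITH KEY customer_id = ls_order-customer_id.
  IF sy-subrc = 0.
    lv_hits = lv_hits + 1.
  ENDIF.
ENDLOOP.

* Optimized: the SORTED table is read via its fully specified table key.
CLEAR lv_hits.
LOOP AT lt_orders INTO ls_order.
  READ TABLE lt_customers_srt INTO ls_customer
       WITH TABLE KEY customer_id = ls_order-customer_id.
  IF sy-subrc = 0.
    lv_hits = lv_hits + 1.
  ENDIF.
ENDLOOP.

WRITE: / 'Matches found:', lv_hits.
```

With a few entries the difference is not measurable. It only becomes visible when both tables grow with the data volume.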

This article presents a Code Inspector check that detects sequential (or 'non-optimized') accesses to internal tables.

Types of internal tables and accesses

In ABAP, internal tables can be specified according to the following table categories:

STANDARD table
SORTED table
HASHED table

Additionally there are the generic table categories ANY TABLE and INDEX TABLE.

SORTED and HASHED tables have a table key that speeds up the table access if the key is fully specified. In the case of the SORTED table, the access can also be optimized if just the leading part of the key is specified in the READ or LOOP statement.
Fast accesses to STANDARD and SORTED tables are possible by using the table index, that is, the position of an entry in the table.
A fast access to a STANDARD table can also be achieved by using the option BINARY SEARCH in the READ accesses. To return correct results, the table must be sorted appropriately.
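
As a minimal sketch of the access types just described, the following report declares one table of each category and reads each of them in an optimized way. The report name, the line type ty_flight, and the sample values are assumptions made for this illustration.

```abap
REPORT zdemo_itab_categories.

* Hypothetical line type, used for illustration only.
TYPES: BEGIN OF ty_flight,
         carrid TYPE c LENGTH 3,
         connid TYPE n LENGTH 4,
         seats  TYPE i,
       END OF ty_flight.

DATA: lt_std    TYPE STANDARD TABLE OF ty_flight
                WITH NON-UNIQUE KEY carrid connid,
      lt_sorted TYPE SORTED TABLE OF ty_flight
                WITH UNIQUE KEY carrid connid,
      lt_hashed TYPE HASHED TABLE OF ty_flight
                WITH UNIQUE KEY carrid connid,
      ls_flight TYPE ty_flight.

ls_flight-carrid = 'LH'.
ls_flight-connid = '0400'.
ls_flight-seats  = 300.
APPEND ls_flight TO lt_std.
INSERT ls_flight INTO TABLE lt_sorted.
INSERT ls_flight INTO TABLE lt_hashed.

* SORTED table: fully specified table key.
READ TABLE lt_sorted INTO ls_flight
     WITH TABLE KEY carrid = 'LH' connid = '0400'.

* SORTED table: the leading part of the key is sufficient.
READ TABLE lt_sorted INTO ls_flight
     WITH KEY carrid = 'LH'.

* HASHED table: the key must be fully specified.
READ TABLE lt_hashed INTO ls_flight
     WITH TABLE KEY carrid = 'LH' connid = '0400'.

* Index access: possible for STANDARD and SORTED tables.
READ TABLE lt_std INTO ls_flight INDEX 1.

* STANDARD table with BINARY SEARCH: the table must be sorted
* by the fields used in the key specification.
SORT lt_std BY carrid connid.
READ TABLE lt_std INTO ls_flight
     WITH KEY carrid = 'LH' connid = '0400'
     BINARY SEARCH.
```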

Sequential, 'non-optimized' accesses

The following accesses to internal tables may result in a sequential search:

READ TABLE itab
- READ TABLE itab WITH KEY
  For a STANDARD table, this leads to a sequential access if the option 'BINARY SEARCH' is missing.
  For HASHED tables you get a sequential access if the table key is not fully specified, and for SORTED tables if the leading part of the key is not specified.
- The two variants READ TABLE itab WITH TABLE KEY and READ TABLE itab FROM wa result in sequential accesses for STANDARD tables.
- The variant READ TABLE itab INDEX idx is always optimized (and only possible for the index table types STANDARD and SORTED).
LOOP AT itab WHERE ...
This always leads to a sequential access for STANDARD tables.
For HASHED and SORTED tables to be able to make use of the table key, the WHERE clause must contain only 'AND' conditions, and all fields must be compared with 'EQ' or '='. Again, a SORTED table can make use of the leading part of the key, while for a key access to a HASHED table all key fields must be provided.
MODIFY and DELETE
Both statements have three basic variants:
- MODIFY/DELETE TABLE itab FROM wa
  This is always a sequential access for STANDARD tables.
- MODIFY/DELETE itab ... WHERE ...
  This variant behaves like the LOOP AT itab WHERE ... (see above)
- MODIFY/DELETE itab INDEX idx ...
  This access type is always optimized (and only possible for the index table types STANDARD and SORTED)
INSERT
The statement INSERT wa INTO itab INDEX idx scales linearly for STANDARD tables up to a certain size, because a linear index has to be maintained. At some threshold value a B*-tree is created which leads to logarithmic scaling behavior for larger tables.
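
The rules for WHERE conditions listed above can be illustrated with a SORTED table. This is a sketch under stated assumptions: the report name, the line type ty_item, and the sample data are hypothetical, and the comments restate the rules of this article rather than the behavior of any particular release.

```abap
REPORT zdemo_itab_where_access.

* Hypothetical line type, used for illustration only.
TYPES: BEGIN OF ty_item,
         doc_no  TYPE n LENGTH 10,
         item_no TYPE n LENGTH 6,
         amount  TYPE i,
       END OF ty_item.

DATA: lt_items TYPE SORTED TABLE OF ty_item
               WITH UNIQUE KEY doc_no item_no,
      ls_item  TYPE ty_item,
      lv_count TYPE i.

DO 3 TIMES.
  ls_item-doc_no  = '0000000001'.
  ls_item-item_no = sy-index.
  ls_item-amount  = sy-index * 10.
  INSERT ls_item INTO TABLE lt_items.
ENDDO.

* Can use the key: leading key field, compared with '='.
LOOP AT lt_items INTO ls_item WHERE doc_no = '0000000001'.
  lv_count = lv_count + 1.
ENDLOOP.

* Cannot use the key: the leading key field doc_no is missing.
LOOP AT lt_items INTO ls_item WHERE item_no = '000002'.
  lv_count = lv_count + 1.
ENDLOOP.

* Cannot use the key: comparison operator other than '='.
LOOP AT lt_items INTO ls_item WHERE doc_no >= '0000000001'.
  lv_count = lv_count + 1.
ENDLOOP.

* DELETE ... WHERE follows the same rules as LOOP AT ... WHERE.
DELETE lt_items WHERE doc_no = '0000000001' AND item_no = '000003'.

* Index access is always optimized for index tables.
DELETE lt_items INDEX 1.

WRITE: / 'Rows visited in the loops:', lv_count.
```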

Details of the Code Inspector check

The check 'Low Performance Operations on Internal Tables' has four attributes corresponding to the different table types that can be checked: STANDARD, SORTED, HASHED, and GENERICALLY typed tables.
If you define an IMPORTING parameter for a method as type 'ANY TABLE' (that is, generically typed), the check tool will not know which non-generic table type this will be at runtime. It thus assumes that it will be a STANDARD table and accesses to the table are checked accordingly.

Note that in the global check variant 'DEFAULT' of the Code Inspector the option to check STANDARD tables is normally de-selected.

The check itself is based on the ABAP compiler (class CL_ABAP_COMPILER) and its services. The compiler provides information like the type of an internal table that cannot be extracted easily by doing a simple source code scan. Here, the information as to whether the access to an internal table can be optimized or not is provided directly to the check framework.

The check detects non-optimized accesses to internal tables with the following statements (see above):

READ TABLE itab
LOOP AT itab WHERE ...
  (A LOOP without a WHERE clause does not raise a message.)
MODIFY / DELETE

In the 'Remarks' section below you will find some additional internal table statements that may be slow, but that will not give you a message with this check.

The possible Code Inspector messages for this check are:

Message	Default priority
Sequential Read Access for a Standard Table	Information
Possible Sequential Read Access for a Sorted Table	Warning
Possible Sequential Read Access for a Hashed Table	Warning
Possible Sequential Read Access for a Generically Typed Table	Warning
Possible sequential access during deletion from a table	Warning

The last message is only relevant with Release SAP NetWeaver 7.1 and higher. It comes into play when secondary table keys are defined for an internal table. Here, the deletion of entries in a STANDARD table via a SORTED or HASHED secondary key can lead to linear runtime behavior.

How to proceed with a Code Inspector message

The static check tool has no clue about how many entries an internal table will have at runtime. Therefore, not all check messages will have the same relevance for the performance of the program execution.
As the developer, you are also unlikely to know exactly how many entries an internal table will have in a productively used customer system. To make things easy, we propose that you only distinguish between 'small' internal tables with a maximum of 20-30 entries on the one hand, and 'large' tables on the other. All frequently executed accesses to large internal tables should be optimized, that is, use an index, use a table key, or do a BINARY SEARCH. In many cases, a SORTED table will do the job (* see comment at end).

This is how you should proceed in detail if there is a sequential access to an internal table:

As with all Code Inspector messages, first check if the code will be productively used at all. If it's only a test report there may be no need for optimization.
If the internal table that is accessed sequentially will always be small (up to 20-30 entries) at runtime, it is normally not necessary to provide an optimized access - so there is no need to define such a small table as a SORTED or HASHED table. This is also why a sequential read access to a STANDARD table only raises an 'information' message in the check tool.
If it's an access to a STANDARD table, and the table might become large at runtime:
- Think about converting the table type to a SORTED table. A HASHED table can also be appropriate, but only if your data set is very large and has the character of a mapping table (unique keys).
- A READ TABLE on a STANDARD table can be made faster with the option BINARY SEARCH. Also fast nested LOOP processing is possible with sorted STANDARD tables and BINARY SEARCH (see Runtimes of Reads and Loops on Internal Tables). But the table must be sorted and/or kept sorted in the appropriate order. Note that a SORT is an expensive operation.
- In newer releases of SAP NetWeaver, a secondary table key can be created. Keep in mind that this additional key will contribute to resource consumption and can only be justified by a corresponding number of read accesses.
If it's an access to a SORTED or HASHED table (that is, a table that might become large at runtime):
- If the WHERE clause in a LOOP statement cannot make use of the table key, because it contains OR conditions (disjunctions) or comparison operators other than 'EQ' / '=', think about re-defining the fields in the WHERE clause.
- If the table key cannot be used for a HASHED table because not all fields were available, think about switching to a SORTED table. These can make use of left-justified parts of the table key to optimize the access.
- If there are fields missing so that the full (or left-justified part of the) table key cannot be provided, think about the fields in the table key and their order. Maybe another order or other fields would better support your access. Please be aware that changing the fields in the key of a SORTED or HASHED table to improve one access can have negative impacts on many other accesses - and even lead to syntax errors. You need a 'holistic' approach that takes all accesses to the internal table into account.
- In newer releases of SAP NetWeaver, a secondary table key can be created. Keep in mind that this additional key will contribute to resource consumption and can only be justified by a corresponding number of read accesses.
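
Two of the options for STANDARD tables listed above can be sketched as follows: sorting once and reading with BINARY SEARCH, and defining a secondary table key. The report name, the line type, and the key name by_material are hypothetical. The second part compiles only in the newer releases that support secondary table keys.

```abap
REPORT zdemo_std_table_options.

* Hypothetical line type, used for illustration only.
TYPES: BEGIN OF ty_item,
         doc_no   TYPE n LENGTH 10,
         material TYPE c LENGTH 18,
         quantity TYPE i,
       END OF ty_item.

* Option 1: plain STANDARD table, sorted once and read with BINARY SEARCH.
DATA: lt_items TYPE STANDARD TABLE OF ty_item
               WITH NON-UNIQUE KEY doc_no,
      ls_item  TYPE ty_item,
      lv_count TYPE i.

* Option 2: STANDARD table with an additional sorted secondary key
* (assumes a release that supports secondary table keys).
DATA lt_items_sec TYPE STANDARD TABLE OF ty_item
                  WITH NON-UNIQUE KEY doc_no
                  WITH NON-UNIQUE SORTED KEY by_material
                       COMPONENTS material.

DO 100 TIMES.
  ls_item-doc_no = sy-index.
  IF sy-index MOD 2 = 0.
    ls_item-material = 'M-01'.
  ELSE.
    ls_item-material = 'M-02'.
  ENDIF.
  ls_item-quantity = sy-index.
  APPEND ls_item TO lt_items.
  APPEND ls_item TO lt_items_sec.
ENDDO.

* Option 1: the SORT is expensive, so do it once, not before each READ.
SORT lt_items BY doc_no.
READ TABLE lt_items INTO ls_item
     WITH KEY doc_no = '0000000042'
     BINARY SEARCH.

* Option 2: read and loop via the secondary key.
READ TABLE lt_items_sec INTO ls_item
     WITH KEY by_material COMPONENTS material = 'M-01'.

LOOP AT lt_items_sec INTO ls_item USING KEY by_material
     WHERE material = 'M-01'.
  lv_count = lv_count + 1.
ENDLOOP.

WRITE: / 'Items with material M-01:', lv_count.
```

If entries are added to lt_items after the SORT, the sort order has to be restored before the next READ ... BINARY SEARCH, otherwise the result may be wrong.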

Remarks

There are further potentially slow operations on internal tables that are not reported by the check, but that can be observed in performance measurements:

LOOP AT ... without WHERE clause
If you do not apply a WHERE clause, it is expected that all table entries must be read. This may be slow, of course, for large tables. And sometimes there are CHECK statements or the like inside the LOOP ... ENDLOOP which show that in fact not all entries were needed for the processing.
LOOP AT ... with dynamic WHERE clause.
For dynamic accesses, the check tool cannot decide whether they will be optimized at runtime or not. Therefore no message is issued.
READ TABLE with dynamic key components
READ TABLE itab INTO wa WITH [TABLE] KEY (comp1) = ... (comp2) = ...
The check tool will issue a message for the WITH KEY variant of this statement, though it may be optimized at runtime.
On the other hand, the WITH TABLE KEY variant will not lead to a message for SORTED and HASHED tables. If, at runtime, the dynamically defined key differs from the table key, you will get a short dump.
SORT itab
Sorting STANDARD tables (for example, to provide the correct sort order for READ TABLE ... BINARY SEARCH) scales worse than linearly with the table size, so it is a rather expensive operation. To justify the cost of one SORT operation, at least 30 READ TABLE ... BINARY SEARCH accesses have to follow.
Table index build up
If a STANDARD table is filled with INSERT wa INTO itab INDEX idx, a new type of index is built up internally when the number of entries reaches a certain threshold.
Also, the new secondary table keys are, if they are non-unique, only built up on their first use.
This 'lazy' or 'on-demand' behavior can lead to unexpected runtime delays.
Sometimes one finds the statements MODIFY/DELETE TABLE itab FROM wa inside a LOOP ... ENDLOOP over the same table. If itab is a SORTED or HASHED table, the MODIFY/DELETE gives no check message, since the access can be optimized; for a STANDARD table it's just an 'information' message. But the MODIFY/DELETE statements start a new internal search for the entry given by the work area wa, though often it's already known - it is the current entry processed in the LOOP.
Therefore, for index table types (STANDARD and SORTED tables), the statement MODIFY/DELETE itab INDEX idx ... should be used inside a LOOP, where idx is the current LOOP index. The addition INDEX idx can be omitted, since it is added implicitly in a LOOP.
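
A minimal sketch of this last remark, with a hypothetical report name, line type, and sample data: the first loop lets the runtime search for the current row again, while the second and third loops address it directly by its index.

```abap
REPORT zdemo_modify_in_loop.

* Hypothetical line type, used for illustration only.
TYPES: BEGIN OF ty_item,
         item_no TYPE i,
         amount  TYPE i,
       END OF ty_item.

DATA: lt_items TYPE STANDARD TABLE OF ty_item
               WITH NON-UNIQUE KEY item_no,
      ls_item  TYPE ty_item.

DO 5 TIMES.
  ls_item-item_no = sy-index.
  ls_item-amount  = 100.
  APPEND ls_item TO lt_items.
ENDDO.

* Avoid: MODIFY TABLE searches for the row via the table key,
* although the LOOP is already positioned on it.
LOOP AT lt_items INTO ls_item.
  ls_item-amount = ls_item-amount + 1.
  MODIFY TABLE lt_items FROM ls_item.
ENDLOOP.

* Prefer: index access to the current row. Inside the LOOP,
* the addition INDEX sy-tabix could also be omitted.
LOOP AT lt_items INTO ls_item.
  ls_item-amount = ls_item-amount + 1.
  MODIFY lt_items FROM ls_item INDEX sy-tabix.
ENDLOOP.

* The same applies to DELETE: remove the current row by its index.
LOOP AT lt_items INTO ls_item WHERE amount > 101.
  DELETE lt_items INDEX sy-tabix.
ENDLOOP.
```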

(*) We do not discuss here cases where you need alternative access paths to large internal tables. The situation for such a scenario will improve in the future with the new secondary table keys, but is rather complex for older releases.