Analyzing Problems Using ABAP Short Dumps: Part I...

Former Member · ‎11-18-2009

The first part of this weblog did not quite manage to open a short dump as of Release NW04s 7.0 EHP1 for display. Instead it reviewed ways to extract contextual information from the short dump lists and elsewhere.

In this second part of the weblog we, in the words of W. C. Fields, grab the bull by the tail and face the issue. In a short dump, you want to answer these primary questions:

What exactly happened?

Where did it happen?

How can the problem be corrected?

We will look at the diagnostic information and aids that the short dump offers for answering these questions.

From the Top: The Context of the Error

You have finally made it into one of the ABAP short dumps. You'll see a display that looks quite similar to this one.

Maybe the Short text, What happened, Error Analysis, and Source Code Extract will be enough to let you diagnose and correct the problem. That's often the case when a dump was caused by a relatively stupid programming error. But let's take it from the top and see what diagnostic help the short dump offers, just in case.

The Short Dump Heading

On a bold red background at the top of every dump you will find the short dump ID and the date and time at which the dump occurred. Together with the Application Server and the WP Index from the dump list, you have all the information that you need to look for relevant messages in the ABAP System Log or Developer's Trace (see Part I of this weblog).

If an exception occurred and the runtime error was catchable, the exception that caused the dump is also shown.

Together with the program name (ZSP_COMPLETE_FQDNS) from What happened?, you also already have enough information to search for a relevant OSS note. The combination of dump ID or exception name and program name should find the right note, if it exists. You'll also find a more extensive list of search terms for the OSS in the How to correct the error section in the dump.

The System Environment: Context Information and Where Did That RFC Actually Come From?

You probably skip over the context information presented under System Environment. But there are some worthwhile nuggets of information in there.

If you plan to search for OSS notes and messages, then you will need the system release and SP levels, kernel patch level, and other facts on the 'scene of the crime' to see whether notes or messages fit your problem. If you plan to open an OSS message for SAP, then you can simply save and attach the entire short dump. (From the dump display in ST22, choose System -> List -> Save -> Local file.) That should help Support to respond quickly to the problem.

If you are analyzing the problem yourself, then here are three important bits of information:

At the bottom of the System environment list, you'll find a compact overview of the memory usage of the program at the time that it dumped. If you see that the program has allocated heap memory, then check the Information on where terminated section to see whether the program was started in a background job. If the program was not running as a background job, then you might want to take a look at the memory consumption of the program in the Debugger with the Memory Analyzer or with the ABAP Runtime Analysis (transaction SAT). A dialog program - one running interactively in a dialog process - gets heap memory - private, process-local memory - only if the memory resources of the Web AS have been exhausted. (Just to confuse things, background jobs manage memory differently, and get heap memory before getting ABAP Extended Memory.) If you see a dialog program with heap memory, then something is wrong with the program. Or the memory resources configured in the Web AS are inadequate. Or possibly other processes are memory hogs and have forced this process into heap memory.

Check the User and Transaction section to see if the dump occurred while processing a dynpro screen. The User and Transaction list specifies the screen and 'Screen Line' at which the dump occurred. Screen Line is actually the line in the flow logic of the dynpro at which the faulty module was called. You can also get this information out of the Source Code Extract, but here, you won't have to piece together the information on which module in which dynpro in which program failed.

If you are dealing with an RFC problem in the RFC server, then the Server-Side Connection Information tells you where the RFC call came from. You can then find the short dump on the caller side, which may help you to understand the server-side dump.

And of course, the opposite is true. From a dump on the client side of an RFC interaction, you can find out where the call went.

What Happened Exactly: Short Text, What Happened, Error Analysis

The key question is: what happened exactly? You need to understand the problem in detail to be able to correct it. For this understanding, the Short text, What happened, and Error analysis sections are invaluable.

The Short text states what happened in a single line. In our MOVE_TO_LIT_NOTALLOWED_NODATA dump (above), the short text is this:

Interesting - so I'm trying to overwrite a constant in my program?

The What happened section of this dump adds the name of the program in which the error occurred.

In the screen shot above, the identification of the faulty program is quite simple, since I was too lazy to write a faulty method that perhaps resided in a separate include. But you may see more complicated explanations of the location of the error like this:

The termination occurred in the ABAP program "SAPLSVIM" in "VIM_BC_LANGU_ADD". The main program was SAPLS_IMG_TOOL_5. The termination occurred in line 330 of the source code of program LSVIMF59. In this case, SAPLSVIM is the current program, in which the dump occurred, and the SAPL prefix indicates that we are actually talking about the function group SVIM. VIM_BC_LANGU_ADD is the processing block (function module, method, form routine) in which the dump occurred. LSVIMF59 is the name of the include in which VIM_BC_LANGU_ADD is located.

Or, if the crash occurred in a class, you might see something like this: The current ABAP program "CL_IM_SPROXY_BADI_CTS=========CP" had to be terminated because ..., where CL_IM_SPROXY_BADI_CTS=========CP is the name of the class pool in which the dump occurred. You can enter the full name in SE80 or SE24 to display the class, but more commonly, you would simply enter CL_IM_SPROXY_BADI_CTS...

In many short dumps with few possible causes, the What happened section describes the error that occurred quite exactly. The MOVE_TO_LIT_NOTALLOWED_NODATA dump, however, can arise out of many different circumstances. It's not possible to say which transgression in the code produced the short dump, so detailed explanations are forced from What happened into the next very useful section, the Error Analysis.

Short dump texts are written by SAP kernel developers. The Error analysis sections often provide really detailed information about the possible causes of a short dump, which in turn reflects the detailed knowledge of the kernel that these developers have. There is often a lot of text, but the time taken to read it through will be rewarded.

For example, we learned from the short text that my program dumped because it tried to overwrite a constant. I don't see any constant in the bad code below. I just wanted to complete the fully-qualified domain names of a list of hosts. Do you see the error in the code?

If you don't see where I try to overwrite a read-only field, then see the seventh point in the discussion in Error analysis, the one that begins "Accesses using field symbols..." Experience has shown that a lot of people just skip over the explanations in What happened and Error analysis. This may end up costing them more time than it saves.

Where Did It Happen: Source Code Extract

The SAP Short Dump developers were right to put Source Code Extract in initial caps, because, if you are lucky, this is a really nice, helpful section of the dump. You're shown exactly where the program was aborted. A few people don't know that from here, you can jump right into source code in the ABAP Editor with a double-click. In the dump that we have been following, it would be possible in the editor to branch to the definition of the internal table LT_CSMNSDIC, where you might notice that the CDNSNAME field has been declared as part of the key of the sorted internal table...
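The pattern behind this dump can be sketched in a few lines. This is a minimal reconstruction, not the original ZSP_COMPLETE_FQDNS program: the report name, the row type TY_HOST, the DESCR field, and the table name LT_HOSTS are all assumptions made for illustration. Only the CDNSNAME key field and the '.sap.corp' suffix come from the dump discussed here. As far as I can tell, the syntax check accepts code like this and the problem only shows up at runtime.

```abap
REPORT zsp_fqdn_dump_sketch.

" Hypothetical row type; only the field name CDNSNAME is from the article.
TYPES: BEGIN OF ty_host,
         cdnsname TYPE c LENGTH 60,
         descr    TYPE c LENGTH 40,
       END OF ty_host.

" CDNSNAME is the key of the sorted table, so it is read-only in place.
DATA: lt_hosts TYPE SORTED TABLE OF ty_host WITH UNIQUE KEY cdnsname,
      ls_host  TYPE ty_host.

FIELD-SYMBOLS <ls_host> TYPE ty_host.

START-OF-SELECTION.
  ls_host-cdnsname = 'host01'.
  INSERT ls_host INTO TABLE lt_hosts.

  LOOP AT lt_hosts ASSIGNING <ls_host>.
    " The field symbol points at the table row itself, so this statement
    " tries to overwrite a key field of the sorted table and dumps.
    CONCATENATE <ls_host>-cdnsname '.sap.corp' INTO <ls_host>-cdnsname.
  ENDLOOP.
```

Nothing in the loop looks like a constant, which is why the short text is puzzling until you have read the Error analysis section.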

If you can reproduce the problem, then you can set a breakpoint right from the short dump in order to stop just before the short dump occurs. You can then use all of the tools of the new ABAP debugger to investigate the cause of the dump.

If the code line shown by the pointer doesn't seem to make any sense in the context of the dump, then take a look at the previous line of code. Occasionally, the instruction counter may still advance even after a dump has been triggered, so that the >>>>>> pointer points at the line following the bad line of code.

Where Did It Happen: Active Calls/Events

In program failures that involve infrastructure like Web Dynpro, or calls between components, or in which an uncaught exception has been passed up through the callers, the Active Calls/Events section may help you to understand the components involved in the crash. This call stack is a useful supplement to the point of failure marked in the Source Code Extract, because in the stack you can see how you got to the point of failure.

You read the Active Calls/Events list from the bottom up. It shows all of the report events, dynpro modules, functions, methods and form routines through which the path of execution has come. You can jump into the ABAP Editor at any level in the call stack. This means that you can set breakpoints all along the way to the dump if you think that a problem at a higher level resulted in the dump at the end of the stack.

There are two things to remember about the ABAP call stack:

It's a call stack and not a complete history of calls. If the flow of execution returns from the last callee in the stack, that return from the callee is not shown in the stack. If the short dump occurs in the caller, then you might wonder why the stack shows a different program as the end point of execution than the What happened section.

If ABAP dumped because of an incompatible call to a function module or method (CALL_FUNCTION_CONFLICT_GEN_TYP, CALL_FUNCTION_CONFLICT_LENG, CALL_METHOD_CONFLICT_TYPE, ...), then the called function or method will not appear as the last level in the call stack. The call itself failed, so the callee is not shown in the stack.

Where Did It Happen: The Hard Way

Usually, the Source Code Extract shows where your error occurred. But if you are unlucky, you may have to determine this vital piece of information the hard way. As a not so tragic example, if a short dump occurs in a macro, then the source code pointer will be set to the macro call, not to the statement in the macro that caused the problem.

An error in the kernel may leave no information in the Source Code Extract at all.

In cases like these, how can you find out where the short dump occurred?

Let's start with the no-source-code-its-a-macro case. The Source Code Extract does show where the misbehaving macro was called. Since you can jump into the ABAP Editor and then forward-navigate into the macro with a couple of clicks, you can first see if a good look at the macro code might reveal the problem.

If you still can't see where in the source code the problem occurred, then the ABAP Control Blocks (CONT) section may help you to localize the problem. The CONT table shows the CCBs - Control-Control-Blocks - which represent the ABAP statements to be executed in the processing blocks of an ABAP program. The short dump contains an extract of the CONT table showing the CCBs that lead up to the dump and the next few statements that were to be processed. Read the list of CCBs from the top down.

Low-level as it is, the CONT does not care whether statements are in a macro or not - and it shows the short dump pointer that you know from the Source Code Extract. Unfortunately, a double-click on the CCB at the dump pointer still takes you only to the point in the source code at which the bad macro was called. But the halfway intelligible CCB names may be enough to show you at which line of code in the macro the problem occurred.

First of all, if the macro is not too long, then clicking on the CCBs to jump into the ABAP Editor shows you where the macro started. Then, with a little jumping back and forth between the CONT table and the ABAP Editor, you can start to equate the CCBs and the statements in the faulty code.

In our case, the SQLS and PAR1 CCBs turn out to reference an SQL SELECT well before the macro call. CCB 68, BRAF, represents the start of an IF control structure in which the macro is called. The COND and PAR1 CCBs depict the macro statement that actually failed: CONCATENATE &1 '.sap.corp' INTO &1.

In the case I-have-only-a-kernel-dump (SYSTEM_CORE_DUMPED, ABAP_ASSERT, etc.), the Source Code Extract section will really be empty. In this case, the dump section Active Calls in SAP Kernel provides clues as to the location of the error. But since no customer or ABAP application developer should have to read a kernel stack, we mention this only for the record. If you have a short dump that originated in the kernel and it is not simply because somebody pulled the plug on the ABAP AS, then all you need to do is provide the short dump with the Active Calls in SAP Kernel section to SAP Support.

Other situations with no where-it-happened location: Should you not have any luck in finding out exactly where the program went down the tubes, then a useful tip is to try to reproduce the problem in transaction SAT, the ABAP Runtime Analysis. In SAT, you can trace the execution of an ABAP program at the level of ABAP processing blocks. Run your program to its dump (provided that this does not take too long - a non-aggregated SAT trace can get large quickly). Then check the SAT trace. It may help you find out pretty exactly where to look for the problem, even if the dump occurred in a macro.

Also, you can use ST05, the Performance Analysis, to switch on (in a controlled fashion - for your user, for example) a detailed trace of program activity. Be aware that the trace will also include the writing of the short dump. The dump processing starts where you find activity on DB table SNAP, so search for the problem area before that point.

See help.sap.com for help with using SAT and ST05.

The Third Major Question: What's the Solution?

Naturally, the discussion that you will find in the How to correct the error section of a short dump tends to be a bit generic. Developers are constantly finding new and inventive ways to repeat old errors, like the MOVE_TO_LIT_NOTALLOWED_NODATA error that we have been examining. It's therefore not possible for How to correct the error to describe exactly what you should do to fix a dumping program.

Even so, the combination of the discussion in How to correct the error and taking a good look at the faulty code often leads to success in correcting the problem. In the case of the MOVE_TO_LIT_NOTALLOWED_NODATA dump that we have been examining, the dump astutely remarks that 'The field to be overwritten is a parameter or a field symbol.' If you were not aware that the sort keys of a sorted table may not be overwritten in a field symbol, then the tip that a field symbol may be involved might help you get onto the right analytical track.
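One way to get around the read-only sort key is to change a copy of each row and build a new table from the copies. The sketch below is only one possible correction, not necessarily the fix the original program received. The report name, the row type TY_HOST, the DESCR field, and the table names LT_HOSTS and LT_FQDN are hypothetical; only the CDNSNAME key field and the '.sap.corp' suffix are taken from the dump.

```abap
REPORT zsp_fqdn_fix_sketch.

" Hypothetical row type; only the field name CDNSNAME is from the article.
TYPES: BEGIN OF ty_host,
         cdnsname TYPE c LENGTH 60,
         descr    TYPE c LENGTH 40,
       END OF ty_host.

DATA: lt_hosts TYPE SORTED TABLE OF ty_host WITH UNIQUE KEY cdnsname,
      lt_fqdn  LIKE lt_hosts,
      ls_host  TYPE ty_host.

START-OF-SELECTION.
  ls_host-cdnsname = 'host01'.
  INSERT ls_host INTO TABLE lt_hosts.

  LOOP AT lt_hosts INTO ls_host.
    " The work area is a copy of the row, so its key field may be changed.
    CONCATENATE ls_host-cdnsname '.sap.corp' INTO ls_host-cdnsname.
    " INSERT ... INTO TABLE keeps the new table sorted by the changed key.
    INSERT ls_host INTO TABLE lt_fqdn.
  ENDLOOP.

  lt_hosts = lt_fqdn.
```

Copying the rows costs some memory and time. If that matters for a large table, declaring the table without CDNSNAME in the key is another option worth weighing.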

In the end, however, understanding and correcting the cause of a short dump rests on your shoulders. You will have to extract as much information from the short dump as possible, and use this information to illuminate what went wrong in the code.

Gathering More Information

A short dump addresses more or less directly the journalistic questions of what went wrong where and what to do. Should these questions be addressed 'less' rather than 'more' in a dump, then it is good to know that a dump also includes a lot of additional supporting information that can help you in your analysis.

System Variables

As an ABAP program executes, it is accompanied by an entire swarm of system variables, like Jupiter with its cloud of little moons. Some of these variables are well-known, like SY-SUBRC, the return code set by many ABAP instructions or SY-TABIX, the counter in LOOP AT and READ TABLE internal table instructions.

When a short dump occurs, ABAP preserves the state of the system variables at the time of the crash. You can see the contents of these variables in the Contents of system fields section. Here are some of the system variables that are most likely to be useful:

SY-SUBRC usually shows the last return code setting before the program crashed. A non-zero SY-SUBRC from a method or function preceding an instruction that dumped may illuminate for you what went wrong.

SY-TABIX. In a short dump raised from within a LOOP AT table or after a READ TABLE instruction, SY-TABIX tells you what record from the internal table was being processed when the program failed.

SY-INDEX provides the same iteration-count information for DO and WHILE loops.

SY-LINNO (number of lines in an ABAP list) and SY-COLNO (number of columns in an ABAP list) show how much memory a large ABAP list consumes, if you are having memory problems with a large list.

SY-MSGID and SY-MSGNO, if set, let you look up the last message issued by the failed program in transaction SE91. SY-MSGV1 - 4 show any message variables that previously were set (not necessarily for use in the most recent message).

SY-DATUM and SY-UZEIT may show a more accurate and earlier time stamp for the initial program abort than the date and time associated with the short dump itself. If you are sifting through the System Log or Developer Traces (see Part I of this weblog), then the few seconds' difference that you may see can be important in establishing the chronology of events in a failure.
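To make the first two of these fields concrete, here is a small sketch of how SY-SUBRC and SY-TABIX get their values during normal processing. These are the values that a dump would preserve if the program failed at that point. The report name and the table LT_NAMES are invented for the example.

```abap
REPORT zsp_sysfields_sketch.

DATA: lt_names TYPE STANDARD TABLE OF string,
      lv_name  TYPE string.

START-OF-SELECTION.
  APPEND 'host01' TO lt_names.
  APPEND 'host02' TO lt_names.

  " There is no third row, so READ TABLE sets SY-SUBRC to a non-zero value.
  READ TABLE lt_names INTO lv_name INDEX 3.
  IF sy-subrc <> 0.
    WRITE / 'No third row; SY-SUBRC is non-zero.'.
  ENDIF.

  " Inside LOOP AT, SY-TABIX holds the index of the row being processed.
  LOOP AT lt_names INTO lv_name.
    WRITE: / sy-tabix, lv_name.
  ENDLOOP.
```

A program that ignores the non-zero SY-SUBRC and carries on with a stale LV_NAME is the kind of case in which the preserved return code can point you at the real cause.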

Program Variables

For the Chosen variables section, the short dump infrastructure takes a quick run through the collapsing program context grabbing any program and infrastructure variables it finds that are currently in scope. The situation is a bit like the belated shopper running through a grocery just at closing time - there's no guarantee that the shopper will bring home everything that he or she was supposed to buy. Even though the dump infrastructure may not capture everything, much more often than not you will find the variables and values that you want to see.

Since SAP_BASIS Release 6.20, the short dump infrastructure has captured a separate set of Chosen variables for each level in the Active Calls/Events ABAP call stack.

If you are analyzing a data-related problem, then a careful look at the Chosen variables may clarify the problem. In one recent example, an OSS message reported a short dump because ABAP could not convert the character value 229812 to an integer (dump ID CONVT_NO_NUMBER). Since this is one of ABAP's easiest tricks, the dump is at first glance pretty mystifying. A quick look at the character field in Chosen variables showed, however, that the character field held not ‘229812' but rather ‘229812##믆䀾##蠤䋒##p###'. The fact that the field was either not correctly initialized or was filled with non-character data explains the conversion failure, at the very least.
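A conversion failure of this kind can be reproduced on a small scale. The sketch below assumes a character field with stray non-numeric characters after the digits. The report name, the field names, and the value are made up, and the '#' characters merely stand in for the non-character data seen in the dump. Catching the exception class CX_SY_CONVERSION_NO_NUMBER prevents the dump. Without the TRY block, I would expect the same assignment to end in CONVT_NO_NUMBER.

```abap
REPORT zsp_convt_sketch.

DATA: lv_text TYPE c LENGTH 12 VALUE '229812##',
      lv_int  TYPE i.

START-OF-SELECTION.
  TRY.
      " The trailing characters make the field content non-numeric.
      lv_int = lv_text.
    CATCH cx_sy_conversion_no_number.
      WRITE: / 'Not a number:', lv_text.
  ENDTRY.
```

Catching the exception only helps if the program can do something sensible with the bad value. Finding out how the field came to hold garbage is still the real task.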

Chosen variables shows the size (here, one record with a length of 3440 bytes) of an internal table, as well as useful information such as the type of organization of the table (here, a sorted table). The table display can be useful in analyzing the popular dump of type TSV_TNEW_PAGE_ALLOC_FAILED (no more memory available for an internal table), since you can see how much memory has been allocated to hold the rows of each internal table. (The amount of storage allocated for the rows may not, however, be the amount of storage used by the rows of the table. If, for example, a table holds only data references to objects, then storage for references may not be all the memory actually consumed by the table and its contents. The references are relatively short. The objects may occupy much larger amounts of memory.)

In an upcoming release, the table display will contain at least the start of the contents of each of the first five records of each internal table that is captured.

Finally, object references that have not been initialized (a favorite cause of OBJECTS_OBJREF_NOT_ASSIGNED_NO, and others...) are easy to pick out in Chosen variables. Just use Ctrl+F to search for ':initial}'.
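In the code itself, an initial reference can be guarded with IS BOUND before it is used. The following is a minimal sketch with a hypothetical local class LCL_HOST and method PING, neither of which has anything to do with the programs discussed in this weblog.

```abap
REPORT zsp_objref_sketch.

CLASS lcl_host DEFINITION.
  PUBLIC SECTION.
    METHODS ping.
ENDCLASS.

CLASS lcl_host IMPLEMENTATION.
  METHOD ping.
    WRITE / 'ping'.
  ENDMETHOD.
ENDCLASS.

DATA lo_host TYPE REF TO lcl_host.

START-OF-SELECTION.
  " lo_host was never created with CREATE OBJECT, so it is still initial.
  IF lo_host IS BOUND.
    lo_host->ping( ).
  ELSE.
    WRITE / 'Reference is initial; calling ping would dump.'.
  ENDIF.
```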

Note that a random mouse click in the Chosen variables display switches the display from the relatively attractive formatted view to an unformatted view. Don't be alarmed. Just press F3 (Back) to return to the formatted display.

An Ounce of Prevention...

Is worth a pound of cure, as the old saying goes.

Don't forget that ABAP offers logging and checkpoints that can be activated when needed (see help.sap.com). With these, you can turn on switchable logging, breakpoints, and assertions to help you with diagnosis and trouble-shooting, should something go wrong in your program after it has reached your users.
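As a rough illustration, the three switchable statements look like this. The sketch assumes that a checkpoint group named ZSP_FQDN_CHK has been created in transaction SAAB. That name, the report name, the subkey, and the variable are all hypothetical. While the group is inactive, the statements are skipped.

```abap
REPORT zsp_checkpoint_sketch.

DATA lv_host TYPE string VALUE 'host01'.

START-OF-SELECTION.
  " ZSP_FQDN_CHK is a hypothetical checkpoint group maintained in SAAB.
  " The assertion is only evaluated while the group is activated.
  ASSERT ID zsp_fqdn_chk SUBKEY 'HOSTNAME' FIELDS lv_host
         CONDITION lv_host IS NOT INITIAL.

  " Writes a log entry with the current value when logging is switched on.
  LOG-POINT ID zsp_fqdn_chk SUBKEY 'HOSTNAME' FIELDS lv_host.

  " Stops in the debugger only if breakpoints are activated for the group.
  BREAK-POINT ID zsp_fqdn_chk.
```

Because activation is controlled through the checkpoint group, you can leave these statements in production code and switch them on for a user or a limited time when a problem needs investigating.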

And don't forget the suite of tools that the ABAP Workbench offers to help you find errors before your users do, starting with tools for static checking like the Code Inspector (Transaction SCI), continuing with the ABAP Unit Test facility, with which you can even go so far as to practice test-driven development. The best ABAP short dump is the one that you never have to analyze.
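For a taste of ABAP Unit, here is a minimal local test class. It is a sketch under assumptions: the report, class, and method names are invented, and the pseudo comments and CL_AUNIT_ASSERT reflect the Release 7.0 style as I understand it. Later releases offer the RISK LEVEL and DURATION additions and the class CL_ABAP_UNIT_ASSERT instead.

```abap
REPORT zsp_aunit_sketch.

CLASS ltc_fqdn DEFINITION FOR TESTING. "#AU Risk_Level Harmless
                                       "#AU Duration Short
  PRIVATE SECTION.
    METHODS completes_name FOR TESTING.
ENDCLASS.

CLASS ltc_fqdn IMPLEMENTATION.
  METHOD completes_name.
    DATA: lv_name TYPE string VALUE 'host01',
          lv_exp  TYPE string VALUE 'host01.sap.corp'.

    " The statement under test: append the domain suffix to the host name.
    CONCATENATE lv_name '.sap.corp' INTO lv_name.

    cl_aunit_assert=>assert_equals( exp = lv_exp
                                    act = lv_name ).
  ENDMETHOD.
ENDCLASS.
```

In a real program the test would call the production routine that builds the name rather than repeating the CONCATENATE inline.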

- This weblog is based in part on Boris Gebhardt's Advanced ABAP Workshop: ABAP Analysis Tools. You can find more information on ABAP Test and Analysis Tools at help.sap.com and also in ABAP: Advanced Tools and Techniques, Volume 2, SAP Press 2009, ISBN 978-3-8362-1151-2.