Usually when I blog on SCN I write about a specific development problem and the solution I found for it. In contrast, this blog is about a more abstract topic, namely how to debug code efficiently. While it is quite easy to debug SAP code (the source code of the Business Suite is available after all, at least for the applications written in ABAP), debugging a certain problem efficiently is sometimes quite complex. As a result I’ve seen even seasoned developers get lost in the debugger, pondering over an issue for hours or days without getting close to a solution. In my opinion there are different reasons for this. One of them is that some special approaches or practices are necessary in order to find the root cause of complex bugs using debugging.
In this blog I try to describe the approaches that, in my experience, are successful. However, I’d also be interested in which approaches you use and what your experiences are. Therefore I’m looking forward to some interesting comments.
Setting the scene
First I’d like to define what I would classify as complex bugs. In my opinion there are basically two categories of bugs: the simple ones and the complex ones 😉. Simple bugs are all the bugs that you would be able to find and fix with a single debugger run or even by simply looking at the code snippet. For example, copy and paste errors or missing checks of boundary conditions fall into this category. By simply executing the code once in the debugger every developer is usually able to immediately spot and correct these bugs.
The complex ones are the ones that occur in the interaction of complex frameworks or APIs. In the SAP context these frameworks or APIs are usually very sparsely documented (if documentation is available at all). Furthermore, in most cases the actual behaviour of the system is influenced not only by the program code but also by several customizing tables. In this context identifying the root cause of a bug can become quite complex. Everyone who has ever tried to debug e.g. the transaction BP and the underlying function modules (which I believe were the inspiration for the Geek & Poke comic below) or, even better, a contract replication from ERP to CRM knows what I’m talking about. The approaches I will be discussing in the remainder of this blog are the ones I use to debug in those complex scenarios.
Know your tools
As said in the introduction, I want to focus on the general approach to debugging in this blog. Nevertheless, an important prerequisite for successful debugging is knowing the available tools. In order to get to know the tools you need to do two things. First, it’s important to keep up to date with new features. In the context of ABAP development SCN is a great resource to do so. For example, Olga Dolinskaja wrote several excellent blogs regarding new features in the ABAP debugger (cf. New ABAP Debugger Tips and Tricks, News in ABAP Debugger Breakpoints and Watchpoints, Statement Debugging or News in ABAP External Debugging – Request-based Debugging of HTTP and RFC requests). Also Stephen Pfeiffer’s blog on ABAP Debugger Scripting: Basics or Jerry Wang’s blog Six kinds of debugging tips to find the source code where the message is raised are great resources to learn more about the different features of the tools. Besides the debugger, tools like checkpoint groups (Checkgroups – ABAP Development – SCN Wiki) or the ABAP Test Cockpit (Getting Started with the ABAP Test Cockpit for Developers by Christopher Kaestner) can also be very useful for identifying the root cause of problems. However, reading about new features and tools is not enough. In my opinion it is important to take some time once in a while to play with the new features you discovered. Only if you have tried a feature in a toy scenario and understood what it is able to do and what not will you be able to use it to track down a complex bug in a productive scenario.
Besides the development tools there are other important tools you should be able to use. Recently I adopted the habit of answering colleagues who ask me whether I know what the cause of a certain bug could be with a question of my own: have they already performed a search on SCN and in the SAP support portal? In a lot of cases the answer is no. However, in my opinion searching for hints on SCN and in the SAP support portal should be the first step whenever you encounter a complex bug. Although SAP software is highly customizable and probably no two installations are the same, those searches usually result in valuable information. Even if you don’t find the complete solution you will at least get information about the areas in which the cause of the bug might be. And last, but not least, an internet search usually turns up some interesting links as well.
Thinking about the problem…
The starting point for each debugging session is usually an error ticket. Most likely the ticket was created by a tester or a user who encountered unexpected behaviour. Alternatively the unexpected behaviour could also be encountered by the developer during developer testing (be it automated or manual). In the first case the next step is normally to reproduce the error in the QA system. Once a developer is able to reproduce the error it is usually quite easy to identify the code that causes an error message or an exception (using the tools described in the previous section). If no error message or exception but rather an unexpected result is produced, identifying the starting point for debugging can already become quite challenging.
In both cases I recently adopted the habit of not starting up the debugger immediately. Instead I start by reasoning about the problem. In general I start this process off by asking myself the following questions:
- What business process triggers the error?
The first question for me is always which business process triggers a certain error. Without a detailed understanding of the business process that causes an error, and of its context, identifying the root cause might be impossible.
- What does the error message tell me?
In the case of a dump this is pretty easy. The details of the dump clearly show what happened and where it happened. However, in the case of an error message the first step should always be to check if a long text with detailed explanations is available. Most error messages don’t have a detailed description available. But if a detailed description is available it is usually quite helpful.
Even the error messages without detailed descriptions can be very helpful. For example, error messages following the pattern “…<some key value> not available.” or “….<some key value> is not valid.” usually point to missing customizing. In contrast to that, a message like “The standard address of a business partner can not be deleted” points to some problem in the process flow. Once one gets used to reading the error messages according to these kinds of patterns they are quite useful for narrowing down the root cause of an error.
- Which system causes the error?
Even if it seems to be a trivial question, it is in my opinion quite an important one. Basically all software systems in use today are connected to other software systems. So in order to identify the root cause of an error it is important to understand which system (or which process in which system) is responsible for triggering the error. While this might be easy to answer in most cases, there are also some cases where answering this question is far from trivial. For example, consider an SAP Fiori application that is built using OData services from different back-end systems.
- In which layer does the error occur?
Once the system causing an error is identified, it is important to understand in which layer of the software the error occurs. Usually each layer has different responsibilities (e.g. provide the UI, perform validation checks or access the database). For example, in an SAP CRM application the error could occur in the BSP component building the UI, the BOL layer, the GenIL layer or the underlying APIs. Understanding in which layer an error occurs helps to take shortcuts while debugging. If the error occurs in the database access layer it’s probably a good idea not to perform detailed debugging on the UI layer.
Usually I try to get a good initial answer to these questions. In my opinion it is important to come up with sensible assumptions for the answers. If the first answers obtained by reasoning about the error are not correct, the iterative process described below will help to identify and correct them.
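The idea of reading error messages by their pattern can be illustrated with a small sketch. It is written in Python purely for illustration, and the regular expressions are my own assumptions derived from the two message shapes mentioned above, not a complete or official catalogue of SAP messages.

```python
import re

# Hypothetical patterns, derived only from the example messages in the text.
CUSTOMIZING_PATTERNS = [
    re.compile(r"\bnot available\.?$", re.IGNORECASE),
    re.compile(r"\bis not valid\.?$", re.IGNORECASE),
]
PROCESS_FLOW_PATTERNS = [
    re.compile(r"\bcan ?not be (deleted|changed|created)\b", re.IGNORECASE),
]


def classify_message(text: str) -> str:
    """Return a first guess where to look: 'customizing', 'process flow' or 'unknown'."""
    stripped = text.strip()
    if any(p.search(stripped) for p in CUSTOMIZING_PATTERNS):
        return "customizing"
    if any(p.search(stripped) for p in PROCESS_FLOW_PATTERNS):
        return "process flow"
    return "unknown"


# Example messages; the first one is invented for illustration.
guess_1 = classify_message("Country key XY not available.")
guess_2 = classify_message(
    "The standard address of a business partner can not be deleted"
)
```

The result is only a starting assumption about where to look first. It still has to be confirmed or refuted later in the debugger.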
…and the code
The next step I take is looking at the code without using the debugger. After answering the questions mentioned in the previous section I usually have a first idea in which part of the software the error occurs. By navigating through the source code I try to come up with a first assumption about what the program code is supposed to do and which execution path leads to the error. This way I get a first assumption about what I would expect to see in the debugger and can also test the assumptions I have come up with so far.
Note that trying to understand the code might not be a sensible approach in all cases. Especially when dealing with very generic code it is usually far easier to understand what happens using the debugger. Nevertheless, I’ve had the experience that first trying to understand the code without the debugger allows me to debug much more efficiently afterwards.
Debugging as an experiment
After all the thinking it is time to get to work and start up the debugger. I try to think about debugging as performing an experiment. After understanding the scenario and context in which the error occurs (by thinking about the problem) and getting a first assumption about what the cause of the error might be (by thinking about the code), I use the debugger to test my assumptions. So basically I use the cycle depicted below to structure my debugging sessions.
First I try to think of an “experiment” to test my assumptions about the problem. Usually this is simply performing the business process that causes the error. However, especially if an error occurs in a complex business process, it might be better to find a way to test the assumptions without performing the whole complex process. The next step is to execute the “experiment” in order to test the assumptions. This basically is the normal debugging everyone is used to. If the root cause of the problem is identified during debugging the cycle ends here. If not, the final step of the cycle is to refine the assumptions based on the insights gained during the debugging. On the basis of the new assumptions we can redesign the experiment and start the cycle over again. In this step it is important to move forward in small increments. If you change too many parameters between two debugging sessions it might be very difficult to identify the cause of a different system behaviour. For example, consider a situation where an error occurs during the address formatting for a business partner. In order to identify the root cause of the problem it might be sensible to first test the code for the address formatting with a BP of type person and after that with a BP of type organization with the same address. This makes it possible to check whether the BP type is part of the formatting problem or not.
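The advice to change only one parameter between two runs can be sketched as a small helper. This is an illustration in Python, not ABAP, and everything in it is hypothetical: `address_formatting_fails` is just a stand-in for "run the scenario and observe whether the error shows up", and the parameter names are invented for the example.

```python
from typing import Callable, Dict, List


def vary_one_parameter(
    error_occurs: Callable[[Dict[str, str]], bool],
    baseline: Dict[str, str],
    alternatives: Dict[str, str],
) -> List[str]:
    """Re-run the scenario once per parameter, changing only that parameter.

    Returns the names of the parameters whose change altered the outcome.
    """
    baseline_outcome = error_occurs(baseline)
    influential = []
    for name, value in alternatives.items():
        variant = dict(baseline)  # copy, so exactly one parameter differs
        variant[name] = value
        if error_occurs(variant) != baseline_outcome:
            influential.append(name)
    return influential


# Hypothetical stand-in for the scenario under investigation: here the
# formatting only goes wrong for business partners of type "organization".
def address_formatting_fails(params: Dict[str, str]) -> bool:
    return params["bp_type"] == "organization"


baseline = {"bp_type": "person", "city": "Walldorf"}
alternatives = {"bp_type": "organization", "city": "Berlin"}
suspects = vary_one_parameter(address_formatting_fails, baseline, alternatives)
```

Because every variant differs from the baseline in exactly one value, a changed outcome can be attributed to that value. In the example only the BP type ends up in `suspects`.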
<F5> vs. <F6> vs. <F7>
During the debug step of the cycle presented above, the important question in each debugging step is whether to hit <F5>, <F6> or <F7> (step in, step over or step out respectively). Using <F5> it is easy to end up deep down in some code totally unrelated to the problem at hand. On the other hand, using <F6> at the wrong position might result in not seeing the part of the source code causing the problem.
In order to decide whether to step into a particular function or method or to step over it, I use a simple heuristic that has proven very useful for me:
- The more individual a function or method is, the more likely I am to use <F5>.
- The more widely used a function or method is, the more likely I am to use <F6>.
Using this heuristic basically leads to the following results:
- I will almost always inspect custom code using <F5>. The only exception is when I’m sure the function or method is not the cause of the problem.
- I will only debug SAP standard code if I wasn’t able to identify the root cause of a problem in the custom code.
- I will basically never debug widely used standard function modules and methods and instead focus on new ones (e.g. those delivered recently with a new EhP).
As an example consider an error in some SEPA (https://en.wikipedia.org/wiki/Single_Euro_Payments_Area) related functionality. When debugging this error I would first focus on the custom code around SEPA. If this doesn’t lead to the root cause of the error I would start debugging SEPA related standard functions and methods as well. The reason is that this code has only been developed recently (compared to the general BP function modules). If I encountered function modules like BAPI_BUPA_ADDRESS_GETDETAIL or GUID_CREATE in the process I would always step over them using <F6>. These function modules are so common that it is highly unlikely they are the root cause of the problem.
Nevertheless it might turn out that in rare cases everything points to a function module or method like e.g. BAPI_BUPA_ADDRESS_GETDETAIL as the root cause of an error. In this case I would always check the SAP support portal first before debugging these function modules or methods. As these have been widely used for quite some time it is highly unlikely that I’m the first one encountering the given problem. Only if everything else fails would I start debugging those function modules or methods as a last resort.
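The heuristic above can be written down as a small decision function. The sketch is in Python for illustration only; the `Callee` fields, the flag `custom_code_exhausted` and the name `Z_SEPA_MANDATE_CHECK` are assumptions I made up for the example. In practice this is a judgement call made while sitting in the debugger, not something computed.

```python
from dataclasses import dataclass


@dataclass
class Callee:
    name: str
    is_custom: bool          # custom code vs. SAP standard code
    widely_used: bool        # long-established, commonly called standard code
    ruled_out: bool = False  # already sure it is not the cause


def suggest_step(callee: Callee, custom_code_exhausted: bool = False) -> str:
    """Suggest 'F5' (step into) or 'F6' (step over) for a call."""
    if callee.ruled_out:
        return "F6"
    if callee.is_custom:
        return "F5"
    if callee.widely_used:
        return "F6"
    # Newer, less widely used standard code: only worth stepping into
    # once the custom code has been checked without finding the cause.
    return "F5" if custom_code_exhausted else "F6"


# Z_SEPA_MANDATE_CHECK is a hypothetical custom function module.
custom = Callee("Z_SEPA_MANDATE_CHECK", is_custom=True, widely_used=False)
common = Callee("BAPI_BUPA_ADDRESS_GETDETAIL", is_custom=False, widely_used=True)

step_custom = suggest_step(custom)
step_common = suggest_step(common, custom_code_exhausted=True)
```

Note that the last-resort case, debugging a widely used function module after the SAP support portal turned up nothing, is deliberately left out of the sketch.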
The right mind set
For all the techniques described before it is important to be in the right mind set. I don’t know how often I have heard sentences like “How stupid are these guys at SAP?” or “Have you seen this crappy piece of code in XYZ”. I must admit I might have used sentences like these one or two times myself. However, I think this is the wrong mind set. The developers at SAP are neither stupid nor mean. Therefore, whenever I see something strange I try to think about what might have been the reason to build a particular piece of code a certain way. What was the business requirement they tried to solve with the code? This usually has the nice effect that with each debugging session I learn something new about some particular area of the system. This will help me to identify the root cause of new issues more quickly in the future.
And probably the most important technique of all is the ability to take a step back. It has happened to me numerous times already that I was working on a problem (be it a bug or trying to implement a new feature) for a while without any progress. For whatever reason I had to stop what I was doing (e.g. because the night guard walked in and asked me to finally leave the building). After coming back to the problem the next day I quickly found the solution. It then always seemed like I had been blind to the solution the day before. So whenever I get stuck working on a problem I now force myself to step back, do something else, and revisit the problem afresh a few hours later.
What do you think?
Finally I’d like to hear from you what your approaches to debugging are. Do you use similar practices? What are the ones you find useful in identifying the root cause of complex errors?