The art of troubleshooting
After spending a week in Bruges (Belgium) to celebrate 2010 I’m back with an on demand blog.
<yes, this is an older blog that was in my personal space, I have moved it into SAP Netweaver Administrator space>
Troubleshooting is a topic that is often brought up and about which many questions exist. I love troubleshooting: yes, it can bring a special atmosphere of nervous and stressed people, but it’s a real kick to get the solution in place.
I will use an example of a problem that occurred and was solved, so you get an idea of my view on troubleshooting in general.
Issues or problems?
Troubleshooting becomes necessary when issues or problems occur. An issue is considered to be a one-time event, while a problem is considered to be an issue that recurs.
Throughout this blog I will use an example of a problem which I recently encountered at one of our customers.
An end-user got a pop-up on his screen asking him to log in again in an SAP Portal environment when he wanted to mark vacation days (employee self-service scenario). Since the SAP Portal uses authentication through single sign-on (SSO), this should not be the case. Since the end-user felt the SAP system was not working properly, he logged a ticket at the helpdesk.
By the time the above issue reached me and my involvement was requested, the issue had become a problem. Multiple end-users were experiencing the same event as described above and had started to create helpdesk tickets to complain about the malfunctioning. By then it had been ongoing for days, so it really was a problem.
One of the most important pointers to solve an issue or a problem is the information that is provided, the information you can gather yourself and the information you can gather from other persons.
The better the issue or problem is described, the better the quality of the root cause analysis will be and the faster it can take place.
Example one:
Issue or problem description: I get a pop-up while I’m in SAP and normally I don’t get it. Can you please check what the cause of this issue could be?
Example two:
With Example one I can’t do much: if you have hundreds of SAP systems, you don’t even know which one has the issue.
With Example two I can derive from the description and screenshot which SAP Portal(s) are involved, and I have an idea which user-id to include in my search and at which date/time the issue occurred, so I can narrow down the log information to that specific data early on.
Information is one of the key factors in finding the root cause of a specific issue or problem.
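Narrowing down log information by user-id and date/time can be sketched in a few lines of Python. This is only an illustration: the log format, the user-ids and the function name `narrow_log` are assumptions made up for this example and do not correspond to any real SAP log layout.

```python
from datetime import datetime, timedelta

# Hypothetical log format (an assumption): "<ISO timestamp> <user-id> <message>"
SAMPLE_LOG = [
    "2010-01-08T09:58:12 JSMITH Logon ticket accepted",
    "2010-01-08T10:14:03 ADOE Session created",
    "2010-01-08T10:15:41 JSMITH Logon ticket expired, showing logon pop-up",
    "2010-01-08T16:30:00 JSMITH Session created",
]


def narrow_log(lines, user_id, reported_at, window_minutes=30):
    """Return the lines for user_id within +/- window_minutes of reported_at."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        # Split into timestamp, user-id and the rest of the message.
        timestamp_text, user, _message = line.split(" ", 2)
        timestamp = datetime.fromisoformat(timestamp_text)
        if user == user_id and abs(timestamp - reported_at) <= window:
            hits.append(line)
    return hits


if __name__ == "__main__":
    # The ticket says: user JSMITH, around 10:00 on 8 January 2010.
    for line in narrow_log(SAMPLE_LOG, "JSMITH", datetime(2010, 1, 8, 10, 0)):
        print(line)
```

Without the user-id and the date/time from the ticket there is nothing to filter on, which is exactly why the first description is so hard to work with.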
What do you do when you have too little information to go on when you receive an incident or problem? Start asking questions yourself.
Contact the end-user and ask the questions you want answered. He might not have all the answers you were hoping for, but at least you can get more information. Will the end-user be annoyed by the fact that you call him? Not in my opinion: it shows you received the issue or problem, that you care about it because you make the effort to contact him, and that you are doing something with it, so it’s very unlikely that the end-user will be offended.
The same is valid for other kinds of problems. With a performance problem, for example, contacting the user and checking where he notices the performance loss can only help you when performing the actual root cause analysis.
You can also check online. Excellent resources such as help.sap.com, SAP Service Marketplace, SAP Community Network and Google (is your friend) let you check whether someone else encountered the same or a similar issue or problem and find out how they solved it.
If you have problems finding valuable information on the internet, you can read one of my previous blogs, “How to find valuable information on the internet”.
Asking questions is important to get the right information and provide proper answers, but it’s not only about asking questions to the end-user who experienced an issue or a problem. You can also ask questions to other persons: SAP support, a project responsible (were there any recent changes?) or even yourself.
An example of a question for myself as an Administrator:
Can it be normal that a pop-up appears while single sign on is configured? Yes it can be normal behavior.
Another example of a question for myself as an Administrator:
Will I be able to find logging of a single sign on issue by default? Not necessarily because single sign on is a feature. You can enable it and therefore it’s not an error when it is not configured so by default it’s possible you don’t see much information at all.
If you want more information on a topic, find out where the logging is, how you can set settings to see more (if not sufficient) and gather knowledge on the topic by finding information. If you don’t know, you can find it on the internet, for example
https://service.sap.com/sap/support/notes/1257108 – SAP Note 1257108 Collective Note: Analyzing issues with Single Sign On (SSO) is a good starting point.
Find out which parameters you have to set or increase to see more information and try them out.
The answer to the previous example question brings me to perspective. From an administrator’s point of view, it can be normal behavior that a pop-up is shown, and therefore from a purely technical perspective it’s not an issue or a problem (the SAP system is functioning as designed, normal and stable). Of course, without investigating you cannot yet rule out a technical malfunction, but we will come to that later on.
From the business perspective, the SAP system is malfunctioning. The pop-up is annoying, people get worried that the SAP system itself is not configured properly, and they get anxious because they still have lots of work they need to perform on it.
This can lead to situations where the support team throws back an issue with an answer that comes down to the following: “There is no issue, it’s normal behavior”.
The problem here is the feeling the end-user has when the pop-up appears: from the end-user perspective there is a problem. As an administrator, the task then becomes improving the end-user experience.
Regardless of whether the issue or problem is a feeling or a technical problem, it’s always a good idea to perform a series of checks. Is the configuration of the SAP system still according to SAP recommendations? If not, what is the reason for the configuration mismatch when comparing to the recommendations? Are the latest patches installed? If not, can the issue or problem be a known bug which is fixed in the patches that are available?
Checking all those questions gives you a bunch of information and answers to narrow down the result of your troubleshooting.
By checking the relevant SAP notes on Single Sign On (sso) I found some settings which were incorrect.
https://service.sap.com/sap/support/notes/1166904 – SAP Note 1166904 Assertion Ticket SSO for Web Dynpro Java JCO destinations describes how the option “SSO ticket” in Web Dynpro Java JCO destinations can lead to problems, and recommends setting them to “Assertion ticket”. This is only valid for certain releases which are mentioned in the note, but in the problem example used in this blog it was the case.
One of my colleagues was also looking at the problem (it is sometimes a good idea to get multiple views on a problem, as every individual is unique and can have other ideas or opinions) and was able to log the occurrence of one of the problem situations. By checking that log I found out the incident occurred on a specific Java server node. When checking the logs of the system I found that all similar incidents (same logging) were on the same Java server node.
That gave me the impression the Java server node was incorrectly configured, and when comparing its configuration against the other Java server nodes it became clear that a parameter was missing, which caused the issue captured in the logging.
What we have here now is a multi-dimensional problem. It started as a single problem, but apparently it’s not one problem: it’s a combination of multiple problems.
After those settings were adjusted, things calmed down for a few days, and then incidents were raised again reporting that the event was recurring (pop-ups were again noticed on end-users’ screens).
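Spotting that all similar incidents sit on one server node is essentially a grouping exercise. A minimal sketch in Python, assuming the log entries have already been reduced to (node, message) pairs; the node names and messages below are invented for illustration and are not taken from the actual system:

```python
from collections import Counter

# Hypothetical incident records (an assumption): (server node id, logged message)
INCIDENTS = [
    ("node_2", "logon ticket could not be validated"),
    ("node_2", "logon ticket could not be validated"),
    ("node_0", "session created"),
    ("node_2", "logon ticket could not be validated"),
]


def incidents_per_node(records, needle):
    """Count the records whose message contains needle, grouped by node."""
    return Counter(node for node, message in records if needle in message)


if __name__ == "__main__":
    counts = incidents_per_node(INCIDENTS, "could not be validated")
    for node, count in counts.most_common():
        print(node, count)
```

If one node accounts for all occurrences of the same logging, that node is the first place to compare against its neighbours.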
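Comparing the configuration of one server node against the others can be sketched as a set difference over parameter names. The sketch below assumes each node’s configuration has been exported to a plain dictionary; the node names and the parameters `param_a`, `param_b` and `param_c` are placeholders, not real SAP parameter names.

```python
# Hypothetical exported configurations (an assumption): node -> {parameter: value}
NODE_CONFIGS = {
    "node_0": {"param_a": "1", "param_b": "on", "param_c": "x"},
    "node_1": {"param_a": "1", "param_b": "on", "param_c": "x"},
    "node_2": {"param_a": "1", "param_b": "on"},
}


def missing_parameters(node_configs):
    """Map each node to the parameter names that other nodes have but it lacks."""
    all_keys = set()
    for config in node_configs.values():
        all_keys.update(config)
    result = {}
    for node, config in node_configs.items():
        missing = sorted(all_keys - set(config))
        if missing:
            result[node] = missing
    return result


if __name__ == "__main__":
    for node, params in missing_parameters(NODE_CONFIGS).items():
        print(node, "is missing", ", ".join(params))
```

The same idea extends to comparing values rather than only names, but a missing parameter was what mattered in this case.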
Gathering the troops
What happens with recommendations to solve a problem is that each recommendation is performed separately (to know which one solved the problem or caused a new one). If you change every single setting at once, you run the risk of not knowing which one solved the problem, and if a new problem arises due to the adjustments, you don’t know which one caused the new problem.
If the problem needs to be fixed ASAP and you have multiple persons involved, it’s a good idea to gather the troops. Start a problem team and sit down with a number of persons (functional, technical, project lead and so on). That way everyone can ask questions, you can define action points and get the process moving forward towards the solution. Be careful not to spend too much time on sitting together, because you also need to spend time on solving the actual problem.
Depending on urgency, complexity and feasibility you can also get support from others, for example through a customer message with SAP support. When to call in help is a decision you have to make yourself. There is nothing wrong with doing so, but don’t make the decision with the lizard brain (fear of not finding the solution, *** covering, and fear of the unknown). At minimum you can make a serious effort to gather as much information as possible yourself.
It’s very unlikely you are an expert in each area or SAP product, and I don’t think it is expected that you are. You can find a lot of information on the internet which can be used to work towards gathering information and finding a solution. If you feel you are wasting time and not getting anywhere at all and the problem is very urgent, call in help, but do it for the right reasons.
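The “one change at a time” approach can be written down as a small loop. This is a simplified sketch under clear assumptions: the setting names are placeholders, and `problem_present` stands in for whatever verification you do in reality (which, as in this example, may mean waiting several days to see whether incidents come back).

```python
def apply_one_at_a_time(settings, candidate_changes, problem_present):
    """Apply changes in order, checking after each one whether the problem remains.

    Returns the resulting settings and a history of
    (setting name, new value, problem still present) tuples.
    """
    current = dict(settings)  # work on a copy, leave the input untouched
    history = []
    for name, value in candidate_changes:
        current[name] = value
        still_broken = problem_present(current)
        history.append((name, value, still_broken))
        if not still_broken:
            break  # stop here: we know which change made the difference
    return current, history


if __name__ == "__main__":
    # Hypothetical settings and check, for illustration only.
    settings = {"setting_a": "old", "setting_b": "old"}
    changes = [("setting_a", "new"), ("setting_b", "new"), ("setting_c", "new")]

    def problem_present(current):
        return current.get("setting_b") != "new"

    final, history = apply_one_at_a_time(settings, changes, problem_present)
    for name, value, still_broken in history:
        print(name, value, "problem remains" if still_broken else "problem gone")
```

The history is the valuable part: it records which single change coincided with the problem disappearing, which you lose when everything is changed at once.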
A strong plus point is the ability to think logically. By asking questions and gathering information you can form an image of what should be in place, what should be working, what not and so on.
To be able to investigate this problem further (which was actually a third problem; originally it was assumed only one problem existed) I requested information again: which user-id experienced this issue, when did it occur and so on. On Monday I got contact information of an end-user who had experienced the problem on Friday. I contacted the end-user and explained in simple words what I thought was causing the problem.
There are multiple timeouts when you take a look at an SAP Portal environment (in this case the SAP Portal was connected to other SAP Portals and to back-ends, all with single sign-on configured). You have a session which can expire (session timeout), you have an application which can expire (application timeout) and you have a cookie which can expire (SSO timeout). If your session timeout is larger than the SSO timeout, you get a pop-up to log on again. If you set the session timeout equal to the SSO timeout, you won’t get the pop-up because your session is gone.
You can find information on this in the following SAP note:
https://service.sap.com/sap/support/notes/842635 SAP Note 842635 – Session Management for Web Dynpro Applications
When checking those specific settings I found the session timeout was much larger than the SSO timeout, which could explain why the problem occurred. When talking to the persons involved (project leads, end-users and so on) I found out there had been complaints in the past about the session timeout being insufficient, and that the session timeout had been increased, which caused the pop-ups to start occurring.
Using the appropriate tools can help you gather and interpret the information you need to perform troubleshooting. A good starting point for Java-based SAP systems is setting up Wily Introscope and SAP Solution Manager Diagnostics. The diagnostics scenario has multiple analysis tools available.
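The relationship between the two timeouts can be expressed as a simple check. This is a deliberately simplified sketch: the function name and the values in minutes are assumptions for illustration, the application timeout is ignored, and the real behavior depends on your release and configuration (see the SAP note mentioned above).

```python
def relogon_popup_possible(session_timeout_min, sso_timeout_min):
    """Return True if a session can outlive the SSO ticket.

    Simplified model: when the session timeout is larger than the SSO
    timeout, the session can still be alive after the ticket has expired,
    so the user can be asked to log on again.
    """
    return session_timeout_min > sso_timeout_min


if __name__ == "__main__":
    # Hypothetical values in minutes, not taken from the actual system.
    print(relogon_popup_possible(session_timeout_min=720, sso_timeout_min=480))
    print(relogon_popup_possible(session_timeout_min=480, sso_timeout_min=480))
```

A check like this is cheap to run whenever someone proposes increasing the session timeout, which is exactly the change that introduced the pop-ups here.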
For example change analysis provides you with an overview of changes that were performed on your SAP system. Once enabled and properly working you can for example see that the session timeout was increased from value X to value Y on a certain date/time. This can seriously speed up your root cause analysis.
The downside is that it’s a serious effort to set up the scenario (number of agents and configuration steps) and you have to make sure it stays running properly, because otherwise the information becomes useless.
For specific components or issues, specific tools exist. Search valuable resources to find out which tools to use when you need to troubleshoot.
Why is the title of this blog “The art of troubleshooting”? Because there is no proper flowchart which will enable you to solve any issue or problem you encounter. Information is knowledge, and knowledge can give energy to move forward, but knowledge doesn’t necessarily mean success.
In the end the problem was solved and the end-users were satisfied. Troubleshooting is a form of art for me. You can find a lot of pieces of the puzzle but you have to put the puzzle together yourself, and a big part of doing that is based on logical thinking (which requires insight into the components where the issue occurs). Sometimes the puzzle only contains six pieces and it’s clear and easy; on some occasions it’s a 3D puzzle with a thousand pieces.
You can practice to become better at troubleshooting in problem situations. Pick up every problem and try to find a solution for it. Or try to find a solution in parallel if one of your colleagues is solving a problem.
A great way to become better is being active on the SCN forums. Read about problems other users have and try to find the solution. It has several advantages: you increase your troubleshooting skills, you learn new things (don’t be afraid to search for a solution on a topic you don’t have knowledge of) and you get rewarded (people will notice your activity and you will receive reward points).
Since information is so important it’s also important to share with others. That’s why I’m using a real example in this blog; it can provide useful information to others. If everyone would post solutions to their SAP problems on SCN, wouldn’t it be easier to find a solution for a problem? Yes it would.
A community can be very powerful. Want to reduce the total cost of ownership and get problems resolved faster? Start sharing.
Want to find out how SCN adds value in my daily work environment? Read one of my previous blogs, “SAP Community Network as added value in the daily work environment”.