Hello SCN Community
Root cause analysis (also referred to as diagnostics) for SAP Solution Manager is often brought up in webinars and podcasts, but it seems that not many community members use it or even have it installed.
While there is room for improvement, I very much like this product: as an administrator, it helps me troubleshoot faster. Because many people don’t know what it can do, I decided to write a series of blogs (starting with this one) on different aspects of root cause analysis, showcasing real data and real situations.
E2E Trace Analysis
In Solution Manager 7.0 EHP1, root cause analysis has five main tools: workload analysis, change analysis, trace analysis, exception analysis and Wily Introscope. There are also a number of related tools that offer interesting functionality, for example Thread Dump Analysis, File System Browser, OS Command Console, Change Reporting, Expert Analysis and so on.
In this blog I will show some data from E2E Trace Analysis. E2E stands for End-to-End, which means we trace from the user’s front end through multiple SAP systems all the way to the backend host and back. Because the scenario traces HTTP traffic, it is not a replacement for the ABAP tools that trace ABAP transactions which do not use HTTP.
I attended SAP course E2E100 (Root Cause Analysis), which covers the theory behind the tools and includes hands-on exercises, but does not cover the technical implementation. The course was interesting because I had a good teacher with in-depth knowledge of the subject, so I could raise questions throughout. I’m also certified as Root Cause Analysis Expert – SAP E2E Solution Operations.
I have implemented root cause analysis for multiple customers and also use the product for them. You can analyze a problem at the moment it occurs or afterwards, but the product doesn’t offer alerting (at least not free of charge).
Next week I’m off again to perform another implementation of root cause analysis.
In the SolBros podcast (which I recommend you check out if you are into Solution Manager), I heard someone ask whether managed SAP systems need to be stopped and started after performing the setup. I handle this as follows: I perform the setup on day one, and the next day, before business hours, I stop/start the SAP system myself. That way I can check the logs afterwards to see whether everything is working properly. If I detect issues, I still have time to make the necessary changes, rerun the setup and stop/start the SAP system again. If SAP made it possible to do this online, the product would be much more attractive.
Technical configuration in short
The prerequisite is, of course, that root cause analysis is configured. The configuration, which I might blog about in more detail some time, consists of Wily Introscope; the root cause analysis (or diagnostics) scenario, which you can set up through transaction SOLMAN_SETUP (a wizard-based action list); an Introscope agent (ISAGENT Java component on your Solution Manager); a host agent (running SAPOsCol etc.) on each physical server that will be connected; and a diagnostics agent (also called SMD agent) for each managed SAP system. Once those are in place, you also have to run the managed system setup and the Wily agent setup for the SAP systems that need to be integrated into root cause analysis. Finally, you have to stop/start the managed SAP system to enable data capturing.
If you have enough patience, I might write some blogs to get you started with the technical configuration.
Accessing root cause analysis
You can access root cause analysis through the Easy Access menu, provided you have the authorization, or through the Solution Manager work center (transaction SOLMAN_WORKCENTER).
Accessing End-To-End Trace Analysis
In the work center, either choose End-To-End Trace Analysis in the menu on the left (not visible in the screenshot) or click End-To-End Analysis.
The right pane will change and offer tabs listing the managed SAP systems. You can also build your own tabs, available only to your own user ID, using the Define New Query option on the right-hand side. This way you can create tabs for certain solutions or for SAP systems that belong together if you have a large number of SAP systems. The screenshot is taken from a test environment; other environments I work with contain many more systems (refer to the numbers next to ABAP Systems and Java Systems in picture 1.2).
Below the tabs shown in picture 1.2, you will find the main tools I referred to at the beginning of the blog.
Before someone asks: yes, I used Photoshop for picture 1.4. Picture 1.4 shows two SAP Enterprise Portals, one on version 6.40 and one on version 7.0. Imagine both portals are connected and the content displayed on a web page comes from both of them: certain parts from AA1 and other parts from BB1.
Tracing multiple SAP systems
The question now is: how do you trace multiple SAP systems? Select the first SAP system (AA1 in the example) by clicking in the column in front of System ID (I placed a red cursor on it to make it visible in picture 1.5). Once the AA1 line is selected, hold the Ctrl key and click the same field in the BB1 line directly below it to select BB1 as well. Both SAP system lines should now be highlighted, as shown in picture 1.5.
Next, click the Trace Analysis button at the top of picture 1.5.
The Trace Analysis application will now be started (see picture 1.6).
Before you can actually start tracing, you have to enable tracing for the SAP systems involved. Click the Trace Enabling tab to proceed (see picture 1.6).
Here you will see the SAP systems you selected before (AA1 and BB1 in the example). You can enable them one by one, by selecting a line and clicking Enable at the bottom, or all at once by clicking Enable All at the top.
Once you choose to enable tracing, the application triggers the necessary changes in the satellite system so that it traces content. Tracing requires certain trace parameters to be maintained, and the application does this for you, which is both useful and necessary, since otherwise a lot of manual actions would be required.
Once the changes are done, the state changes to a green square, indicating that tracing can take place.
The Options tab offers further settings that influence the tracing. The EEM options at the bottom relate to End-User Experience Monitoring (EEM), which uses the diagnostics agent to monitor the performance of HTTP-based SAP applications. It is definitely worth looking up: it is most useful for system administrators to identify which step of an HTTP-based application performs badly.
How do you start the trace?
OK, so now the settings are in place (SAP systems chosen, enabled for tracing, and options left at their defaults), but how do you initiate the trace? The trace is initiated by the SAP Client Plug-in, which can be downloaded from SAP Note 1435190.
Install the SAP Client Plug-in on the laptop or device you are using and start the plug-in starter GUI. Picture 2.1 shows the started tool, with the drop-down box opened to show the options. I used Internet Explorer (Mozilla Firefox is not supported). Check the SAP Note mentioned above for the supported browser versions and the operating systems required for them.
Once you have made your choice, click the Launch button. It is recommended to close all other Internet Explorer windows to avoid capturing data from outside the SAP system you want to trace (see picture 2.2). Your trace shouldn’t be affected as long as no dynamic, auto-refreshing pages are open and other web pages show essentially no activity.
A new Internet Explorer window will open, and the SAP Client Plug-in shows the screen in picture 2.3. In the Business Transaction Name field, fill in a description so you can recognize your trace afterwards.
In the lower right corner you can maintain the SMD Host (Solution Manager) and the SMD HTTP Port (the standard J2EE port of your Solution Manager). You don’t have to maintain the Term ID, because it is filled in automatically. If you fill in the SMD Host and Port, the trace will be sent to Solution Manager automatically. If you don’t, the application saves an XML file (BusinessTransaction.xml) to your local disk, which you can then upload in Trace Analysis in your Solution Manager.
You can also split the trace into steps. If your trace is short, don’t bother. If you trace for a long time or want to investigate very specific steps, use them. You can rename each step so you can identify which step contains which operations. To start the actual trace, click the Start Transaction button (see picture 2.3).
Once the trace is running, perform the actions in the web application (for example, clicking a button and loading an iView in the portal). You will see the Sent Bytes, Received Bytes, Requests and Responses counters go up (see picture 2.4).
To end the trace, click the Stop Transaction button (see picture 2.5).
Then either the data is uploaded to Solution Manager automatically, because you maintained the SMD Host and SMD HTTP Port in the SAP Client Plug-in, or an XML file (BusinessTransaction.xml) is saved on your local disk, which you can upload in Solution Manager (as mentioned in the previous section).
Checking the Trace results
To check the trace results, go back to the Trace Analysis application. Follow the same path as described earlier if you no longer have the application open. After you click the refresh button, you will find a new entry in the Business Transactions table; otherwise, first upload your file using the manual upload option at the bottom of picture 2.7. Once you have uploaded the XML, a new line should appear in the table.
In picture 2.7 you can see that the middle trace (I Photoshopped out the description) was fully processed, as indicated by the green light in the State column.
What do I mean by fully processed? End-to-End tracing does not only trace the HTTP activity on the SAP systems you selected; it can also collect and analyze the log files written at the time you executed the trace.
Once you select the line in the Business Transactions table (see picture 2.7), the logs and traces of the previously selected SAP systems are collected (see picture 2.9). Possibly even more are collected: if the checkbox Select systems for trace collection dynamically is set (picture 2.8), the trace collection can include other involved SAP systems it finds and has access to.
After the logs and traces are collected, a new table called Transaction Steps appears underneath the Business Transactions table. It shows the step name, the timing, the trace level, the number of messages and the duration of the step in milliseconds. In this example the trace step took 74057 ms, or roughly 74 seconds.
To view the results of the trace step, select the line(s) and click on the Display button below the table.
You will then see the result of the trace. The upper part of the page shows the HTTP Analysis data; there is also a section for Server Analysis data. The page is much larger, and I will discuss the lower parts later in this blog. A button that catches the eye in picture 3.1 is the Auto-Analysis button.
When you click this button, you will be taken to the HTTP Messages Table tab, and the HTTP messages will be analyzed and sorted automatically. The cursor will be placed on the heaviest HTTP message. Before checking the HTTP Messages Table, though, it’s a good idea to take a look at the component pie.
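To make the idea behind Auto-Analysis concrete, here is a minimal Python sketch, assuming "heaviest" simply means the longest HTTP response time. The actual ranking criterion Solution Manager uses is not documented here, and the `HttpMessage` record, its fields and the URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HttpMessage:
    # Hypothetical record: field names are illustrative, not the Trace Analysis data model.
    url: str
    status: int
    response_time_ms: int  # assumed measure of how "heavy" a message is


def auto_analyze(messages: List[HttpMessage]) -> Tuple[List[HttpMessage], Optional[HttpMessage]]:
    """Return the messages sorted heaviest-first, plus the heaviest one (None if empty)."""
    ranked = sorted(messages, key=lambda m: m.response_time_ms, reverse=True)
    return ranked, (ranked[0] if ranked else None)


if __name__ == "__main__":
    trace = [
        HttpMessage("/page-a", 200, 1200),
        HttpMessage("/page-b", 200, 5300),
        HttpMessage("/missing.gif", 404, 40),
    ]
    ranked, heaviest = auto_analyze(trace)
    print(heaviest.url)  # /page-b
```

Putting the heaviest message first is what lets you jump straight to the most likely culprit instead of scrolling through every request.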
The component pie
I’m not really a pie eater; chocolate pie is one of the few kinds I like. Besides chocolate pie, I also like the component pie (see picture 3.3). It shows the main components captured in the trace. The idea is to determine in which component a problem resides, so you can investigate that component in more detail.
If you look closely at the pie, you can see that a big portion of it is network time. This is an actual pie from a real trace I did (I didn’t Photoshop this one).
The pie is accompanied by calculations, so you can also see in numbers how much time was spent in each component (see picture 3.4). A whopping 43% of the time was spent on the network, and there was indeed a network bottleneck in this case. It’s often up to the administrator to prove that the bottleneck is not the SAP application, and this kind of information can quickly point out the component where the bottleneck lies. Of course it is not always this obvious; I was lucky to bump into such a clear example.
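The percentages behind the component pie are just each component's time divided by the total trace time. A minimal Python sketch of that calculation follows; the component names other than Network and all timings are made up for illustration, not taken from the trace in picture 3.4.

```python
from typing import Dict


def component_shares(times_ms: Dict[str, float]) -> Dict[str, float]:
    """Percentage of the total trace time spent in each component."""
    total = sum(times_ms.values())
    if total == 0:
        return {name: 0.0 for name in times_ms}
    return {name: 100.0 * t / total for name, t in times_ms.items()}


def dominant_component(times_ms: Dict[str, float]) -> str:
    """Name of the component with the largest share: the first place to look."""
    return max(times_ms, key=times_ms.get)


# Made-up timings for illustration only.
example = {"Client": 1500.0, "Network": 4500.0, "Server": 4000.0}
for name, share in component_shares(example).items():
    print(f"{name}: {share:.1f}%")
print("Investigate first:", dominant_component(example))
```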
The Server time distribution among hosts pie
The server time distribution among hosts pie shows the hosts involved in the trace and the time and percentage spent on each of them (see picture 3.5 and picture 3.6).
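Conceptually, the per-host pie first adds up the server time of all requests handled by the same host and then expresses each host's total as a share of all server time. A small Python sketch of that aggregation, with hypothetical host names and timings:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def time_per_host(records: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Sum server time per host and return {host: (total_ms, percent_of_server_time)}."""
    totals: Dict[str, float] = defaultdict(float)
    for host, ms in records:
        totals[host] += ms
    grand_total = sum(totals.values())
    return {
        host: (ms, 100.0 * ms / grand_total if grand_total else 0.0)
        for host, ms in totals.items()
    }


# Hypothetical host names and timings, one tuple per server-side request.
requests = [("hostA", 300.0), ("hostB", 100.0), ("hostA", 600.0)]
for host, (ms, pct) in time_per_host(requests).items():
    print(f"{host}: {ms:.0f} ms ({pct:.1f}%)")
```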
The HTTP response pie
There is also an HTTP response pie (see picture 3.7), which shows how many HTTP requests returned a successful status code and how many returned an error. Return code 404 (Not Found) is one such error code. You can check which HTTP messages returned 404 in the HTTP Messages Table tab (discussed in the next section).
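The split in the response pie follows the standard HTTP status classes: 4xx codes are client errors and 5xx codes are server errors. The Python sketch below counts the slices and lists the URLs that returned 404. Treating 1xx to 3xx as "success" is an assumption about how the pie groups codes, and the URLs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def classify_status(code: int) -> str:
    """Map an HTTP status code to a pie slice (4xx/5xx are errors per the HTTP spec)."""
    if 400 <= code <= 599:
        return "error"
    if 100 <= code <= 399:
        return "success"  # assumption: informational and redirect codes count as non-errors
    return "invalid"


def response_pie(messages: Iterable[Tuple[str, int]]) -> Tuple[Counter, List[str]]:
    """Count the slices and list the URLs that returned 404 Not Found."""
    messages = list(messages)
    counts = Counter(classify_status(code) for _, code in messages)
    not_found = [url for url, code in messages if code == 404]
    return counts, not_found


counts, missing = response_pie([("/page-a", 200), ("/logo.gif", 404), ("/page-b", 304)])
print(counts["success"], counts["error"], missing)  # 2 1 ['/logo.gif']
```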
HTTP messages table
Another tab in the HTTP Analysis section is the HTTP Messages Table (see picture 3.8). To keep this blog from getting too long, I won’t discuss the HTTP Messages Graphics tab, as it offers the same content displayed differently (a graphical overview instead of a table).
Normally you will see the URL in the URL column, but I erased it because I cannot show the real URL. You can exclude certain URLs (by default, css, js, gif, jpg and png files are not taken into account). The columns on the right side are status, client preprocessing time, HTTP response time (client), execution time on the server side, client rendering time, bytes sent from client to server, and bytes sent from server to client.
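The default exclusion of static content can be pictured as a filter on the file extension of each URL. Here is a minimal Python sketch, assuming matching is done on the extension of the URL path with the query string ignored; the tool's actual matching rule and the example URLs are assumptions.

```python
import posixpath
from typing import Iterable, List
from urllib.parse import urlparse

# Default exclusions as described above; matching on the file extension is an assumption.
DEFAULT_EXCLUDED = frozenset({"css", "js", "gif", "jpg", "png"})


def is_static(url: str, excluded: frozenset = DEFAULT_EXCLUDED) -> bool:
    """True if the URL path ends in an excluded extension (query string ignored)."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension in excluded


def dynamic_only(urls: Iterable[str], excluded: frozenset = DEFAULT_EXCLUDED) -> List[str]:
    """Keep only the URLs that would still be analyzed."""
    return [u for u in urls if not is_static(u, excluded)]


urls = [
    "http://portal.example.com:50000/irj/portal",
    "http://portal.example.com:50000/theme/style.css?v=2",
    "http://portal.example.com:50000/img/LOGO.PNG",
]
print(dynamic_only(urls))
```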
The sRT value is automatically displayed in red, indicating a bad rating for this HTTP message. The buttons in the top left corner (see picture 3.9) are used to view details, show or hide static content and so on. You can also see the client and server message headers, which is useful for troubleshooting SSO issues or checking how the content is handled. For example, you could find Content-Encoding: gzip in the server response header. If an HTTP message is time-consuming, take a look at it and check whether compressing the content makes sense.
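Whether compressing makes sense mostly depends on how compressible the response body is. The Python sketch below checks the Content-Encoding header (header names are case-insensitive in HTTP) and measures the gzip ratio of a body. The sample bodies are made up: repetitive markup stands in for an HTML table, and random bytes stand in for content that is already compressed, such as a JPEG.

```python
import gzip
import os
from typing import Mapping


def is_gzip_encoded(headers: Mapping[str, str]) -> bool:
    """Check a response's Content-Encoding header (header names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            return "gzip" in value.lower()
    return False


def gzip_ratio(body: bytes) -> float:
    """Compressed size divided by original size; well below 1.0 means gzip pays off."""
    if not body:
        return 1.0
    return len(gzip.compress(body)) / len(body)


html = b"<tr><td>row</td></tr>" * 1000  # repetitive markup compresses well
noise = os.urandom(1000)                  # stands in for already-compressed data
print(is_gzip_encoded({"Content-Encoding": "gzip"}))
print(f"html ratio:  {gzip_ratio(html):.3f}")
print(f"noise ratio: {gzip_ratio(noise):.3f}")
```

A ratio close to or above 1.0 means gzip only adds CPU time and header overhead for that message.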
There is also a section on server analysis at the bottom of the page, no matter which tab you are in. It offers a Summary tab and a Requests tree tab. The Summary tab shows an overview of the involved systems and how much time was spent on the server side.
The Requests tree tab details the time spent on the server side: net time, CPU time and DB time. When an icon appears in the rightmost column, you can also view the SQL statements executed on the server side, along with their timings.
I hope this brings some more information out into the open and gives readers a better idea of what trace analysis offers. The trace analysis tool has many possibilities; I touched on some of them here, but there is more. If you have root cause analysis in place, give it a try.
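To read a requests tree, it helps to separate the time a node spends itself from the time spent in the calls below it. The Python sketch below assumes net time follows the usual SAP runtime analysis convention of gross time minus the time of the called children; that definition, the `RequestNode` structure and all names and timings are hypothetical, not the Solution Manager data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class RequestNode:
    # Hypothetical node of a request tree; not the Solution Manager data model.
    name: str
    gross_ms: float  # total time including child calls
    cpu_ms: float = 0.0
    db_ms: float = 0.0
    children: List["RequestNode"] = field(default_factory=list)

    def net_ms(self) -> float:
        """Assumed definition: gross time minus the gross time of direct children."""
        return self.gross_ms - sum(child.gross_ms for child in self.children)


def walk(node: RequestNode, depth: int = 0) -> Iterator[Tuple[int, RequestNode]]:
    """Depth-first traversal yielding (depth, node) pairs."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


query = RequestNode("SELECT on hypothetical table", gross_ms=200.0, db_ms=180.0)
service = RequestNode("service call", gross_ms=600.0, cpu_ms=150.0, children=[query])
root = RequestNode("HTTP request", gross_ms=1000.0, cpu_ms=90.0, children=[service])

for depth, node in walk(root):
    print(f"{'  ' * depth}{node.name}: net {node.net_ms():.0f} ms, "
          f"CPU {node.cpu_ms:.0f} ms, DB {node.db_ms:.0f} ms")
```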