The idea for this blog started about a year ago at a customer site where we were experiencing periodic slowdowns. The system was not configured in Solution Manager for performance data collection, and we could only visualise one hour of performance statistics via ST03N before we affected system performance ourselves. We had a performance problem and limited ways to access the data needed to troubleshoot it, which got me thinking about the most effective way to do initial performance troubleshooting.
This blog details a way to perform a solid initial analysis of an SAP system, using simple tools and visualisations to find where your performance issue is located. As this is quick and dirty, I am going to make a few assumptions:
1. There is no Solution Manager or equivalent receiving any performance data for the SAP instance having performance issues
2. The issues are on an ABAP stack; the approach is database independent.
3. The collection, analysis and visualisation must be completed within 24 hours, or it cannot be considered quick.
One of the most powerful tools in a system administrator's hands on an ABAP stack is the transaction /SDF/MON.
I won't describe each of the selections in detail; in general the default selections provide a useful initial data set. The main gap is the lack of visibility of database metrics, which can be a blind spot in any analysis. The picture below shows the output I received from a test run of the selections above.
This provides the raw data for our analysis. Although I could use data from ST03N, one of its biggest problems is that the same data type is reported on several different scales. For example, time data in ST03N can be in seconds for some measures and milliseconds for others, making analysis difficult.
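As a quick illustration of the scale problem, mixed time units could be normalised to a single scale before any analysis. The metric-to-unit mapping in this sketch is purely hypothetical and not taken from ST03N itself:

```python
# Hypothetical sketch: normalising mixed time units to milliseconds.
# The unit labels here are illustrative, not actual ST03N field metadata.
UNIT_TO_MS = {"s": 1000.0, "ms": 1.0}

def to_milliseconds(value, unit):
    """Convert a time measurement to milliseconds."""
    return value * UNIT_TO_MS[unit]

# Example: a response time reported in seconds vs. a DB time in milliseconds.
print(to_milliseconds(1.2, "s"))   # 1200.0
print(to_milliseconds(350, "ms"))  # 350.0
```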
Once the data has been collected, the next phase is to analyse it. This is easily the most time-consuming part of the process; my good friend Andrew Fox calls it the iceberg challenge, as most of the detailed work is under the surface. Below is a table of the metrics output by /SDF/MON, plus the additional metrics used to aid calculations, for example determining the number of free work processes and the amount of used memory.
/SDF/MON metrics: Date, Time, Server Name, Act. WPs, Dia. WPs, RFC WPs, CPU Usr, CPU Sys, CPU Idle, Paging in, Paging out, Free Mem., EM allocated, EM attached, EM global, Heap Memory, Paging Mem, Roll Mem, Dia., Upd.

Additional data: Number of WPs, Physical Mem, Free WPs
or a more readable version
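The additional metrics can be derived from the raw /SDF/MON columns. A minimal sketch, working on one snapshot at a time held in a dict; the field names mirror the table above, but the derivation formulas (configured minus active work processes, physical minus free memory) are my assumption:

```python
# Sketch: deriving the additional metrics from a /SDF/MON sample.
# The formulas below are assumed, not taken from /SDF/MON documentation.
def derive(sample):
    out = dict(sample)
    # Free work processes = configured WPs minus active WPs (assumed formula).
    out["Free WPs"] = sample["Number of WPs"] - sample["Act. WPs"]
    # Used memory = physical memory minus free memory (assumed formula).
    out["Used Mem"] = sample["Physical Mem"] - sample["Free Mem."]
    return out

sample = {"Number of WPs": 40, "Act. WPs": 12,
          "Physical Mem": 16384, "Free Mem.": 6200}
print(derive(sample)["Free WPs"])   # 28
print(derive(sample)["Used Mem"])   # 10184
```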
After a number of attempts to find a meaningful way to represent and visualise the data, one that would make any issues easy to show, I found that aggregating abstracted data was best at helping me see specific effects.
The grouping below shows the metrics which most affect each area of study. For example, having no free work processes directly affects User performance but does not affect Server performance. Similarly, paging directly affects Server performance, which raises CPU utilisation and indirectly affects User and Application performance.
| Area | Metrics |
| --- | --- |
| User | Free WP, Used Memory, CPU Idle, Sessions |
| Server | CPU User, CPU Sys, Paging, Used Memory |
| Application | Free WP, Sessions, Act WP, Logins |
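Once each metric has been abstracted to a score, the group totals can be computed programmatically. A sketch: the metric-to-group mapping follows the grouping discussed above, but the exact assignment and the example scores are my assumptions:

```python
# Sketch: aggregating abstracted (quartile-scored) metrics into areas of study.
# The metric-to-group mapping and the sample scores are illustrative only.
GROUPS = {
    "User":        ["Free WP", "Used Memory", "CPU Idle", "Sessions"],
    "Server":      ["CPU User", "CPU Sys", "Paging", "Used Memory"],
    "Application": ["Free WP", "Sessions", "Act WP", "Logins"],
}

def group_totals(scores):
    """Sum the abstracted scores of each group's metrics."""
    return {group: sum(scores[m] for m in metrics)
            for group, metrics in GROUPS.items()}

# Illustrative scores on the good (2) to bad (8) scale.
scores = {"Free WP": 8, "Used Memory": 4, "CPU Idle": 6, "Sessions": 6,
          "CPU User": 2, "CPU Sys": 2, "Paging": 2, "Act WP": 4, "Logins": 2}
print(group_totals(scores))  # {'User': 24, 'Server': 10, 'Application': 20}
```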
Before grouping the metrics I had to abstract the values onto a consistent scale from good to bad. To do this I took the difference between the minimum and maximum values and divided it into quartiles, assigning each quartile a score band (2-4, 4-6, 6-8, 8-10). As shown below, I then had ranges to which specific and consistent values could be applied.
Using Excel's ability to filter data, shown below, I was able to add a column beside each metric and quickly mass-populate it with the correct quartile value. For example, if the Used Memory on the system was 40954821, the value assigned to that entry is 4, as it sits in the 2nd quartile, between 28272242 and 56544484.
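The quartile lookup done in Excel can also be expressed as a small function. In this sketch the minimum of 0 and maximum of 113088968 are chosen only so that the band boundaries match the Used Memory example; both bounds are assumptions:

```python
# Sketch: map a raw value into the quartile scores 2/4/6/8 (assumed mapping,
# consistent with the 2nd-quartile example in the text).
def quartile_score(value, vmin, vmax):
    """Score a value by which quarter of the [vmin, vmax] range it falls in."""
    width = (vmax - vmin) / 4.0
    q = min(3, int((value - vmin) // width))  # quartile index 0..3
    return 2 * (q + 1)

# With vmin=0 and vmax=113088968 the 2nd quartile spans 28272242-56544484,
# so the example value lands on a score of 4.
print(quartile_score(40954821, 0, 113088968))  # 4
```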
The table below shows how the abstraction produces values on a scale from good (2) to bad (>8), which makes graphing and visualising much easier.
The data cleansing and abstraction were by far the longest part of this piece of work, mostly because I was exploring new areas of data analysis, and also because I had to be very clear about what I wanted to visualise. The visualisations themselves were done quite quickly, as I had already consulted some friends on the best way to visualise the data.
My initial desire was to use SAP Lumira and animate the peaks and troughs of the data in a visually stunning way, but this was not to be: Lumira's time hierarchies do not go down to the level of seconds, and the desktop version is limited to 10,000 data points. So, to get the data visualised quickly, I just used Excel line graphs (shown below), though these are not dynamic and are quite noisy.
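If you do take the data out of Excel, one cheap way to tame the noise in the line graphs is a trailing moving average over each aggregated series before plotting. This pure-Python sketch is illustrative only:

```python
# Sketch: smooth a noisy series with a trailing moving average.
def moving_average(series, window=5):
    """Return the rolling mean of `series` using a trailing window."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Illustrative aggregated scores for one area of study.
print(moving_average([2, 4, 6, 8, 6, 4], window=3))
```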
From the graph above, you can see very easily that Application performance is generally good and Server performance is stable, but the User performance experience is the worst of the three. This directs me to look at the components which are aggregated to produce the User totals and determine where the issue is located.
This analysis, although it has a number of steps, yields specific, easily understood data in a relatively short space of time. It has certainly given me more insight than trying to consolidate ST03N data into a meaningful set.