Many customers frequently use End-User Experience Monitoring (EEM). Nevertheless, it is difficult to find an easily reproducible case to present. I would like to show a case that one of our customers recently faced and explain how we analyzed it with EEM.
The customer has a central CRM system that serves as the customer contact system for several international locations.
Call center agents, especially from one location, complained about poor performance and a browser that froze during typing. This frozen-screen issue sounded a little odd, like the behavior of old computers where every keystroke was sent to the server before showing up on the screen.
Some details: There is a comment field on the customer contact page which is used to collect information from a calling customer. When the call center agent typed into the comment field, the letters did not show up immediately but appeared a few seconds after typing.
Interestingly, the issue only occurred at one of the two remote locations, even though both are supposed to have a very similar setup regarding network connection and number of call center agents.
At first glance it looked like a round trip to the server on every keystroke. This wouldn’t make sense for a simple edit field, as there should not be any round trip for such an activity. Our analysis also confirmed that there is no round trip during typing.
We started the root cause analysis (RCA) by recording the business process steps in question (you can use the E2E Trace recorder as well as the EEM-Recorder). The result of the recording in the EEM-Editor looks like this (partial screenshots):
The recording above shows no activity related to the typing but some other frequent backend calls (every second) to
These calls started immediately after launching the CIC client.
The obvious questions arising from these results are:
- Why is typing delayed without any backend activities?
- Why are there <notify> requests sent without user activity?
The notify call is used to update the client UI with server notifications, because the server cannot push information to the clients on its own. The clients have to ask for notifications. The standard polling interval is 1 second, so every client window sends such a request to the server every second. Agents often work with several open windows, so a lot of network traffic and server activity is created.
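The load created by this polling is simple arithmetic. The following is only a rough sketch: the function name and the agent and window counts are assumptions chosen for illustration, not figures from the customer system.

```python
def notify_requests_per_second(agents: int, windows_per_agent: int,
                               interval_s: float) -> float:
    """Polling requests per second reaching the server.

    Every open client window sends one notify request per polling interval,
    regardless of whether the agent is doing anything.
    """
    return agents * windows_per_agent / interval_s


# Hypothetical call center: 100 agents with 3 open windows each,
# standard polling interval of 1 second.
print(notify_requests_per_second(100, 3, 1.0))  # 300.0
```

Even with no user activity at all, the server in this hypothetical setup would have to answer 300 requests every second.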
Such a request is very quick if the network is good. In the table above the requests took between 15 and 32 ms at the main location, which is close to the data center.
The average response time from a remote location was already between 150 and 200 ms, but due to some network drops it took up to 4.4 seconds.
The graph above shows the response times from three different locations. Two of the locations show constant response times. The two remote locations are obviously slower (because of the network), but one of them also shows a high deviation in its response times. We saw this already in the table above.
The real issue is the network drops at the complaining location; they slow down the response time of the client application. But the question remains: why do we have a typing freeze if there is no round trip to the server during typing?
To understand this you have to keep in mind that browsers limit the number of connections they open to proxies/servers in order to save network resources. Originally this limit was 2 connections, and older browser versions follow it (see RFC 2616).
Newer browsers do not follow this limit anymore. They open up to 8 parallel connections.
If we assume a maximum of two connections, a polling interval of 1 second (remember the <notify> call) and polling response times of up to 4 seconds, it becomes clear that the browser can freeze for some seconds: new <notify> calls are issued faster than the old ones complete, so both connections stay occupied. The assumption is therefore a blocked browser due to long-running <notify> calls.
There are three apparent solutions to reduce the frozen times.
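This assumption can be illustrated with a small simulation. It is a deliberately simplified model, not a description of how any particular browser schedules requests: polls are issued at fixed intervals, every poll takes the same time, and waiting polls are served first-come, first-served. The function name and all parameters are hypothetical.

```python
import heapq


def simulate_polling(max_connections: int, interval_s: float,
                     response_s: float, duration_s: float) -> list[float]:
    """Return how long each poll waited for a free connection (seconds).

    Assumed model: one poll is issued every interval_s seconds and holds
    a connection for response_s seconds. If all connections are busy, the
    poll waits until the earliest one becomes free.
    """
    free_at = [0.0] * max_connections  # time at which each connection is free
    heapq.heapify(free_at)
    delays = []
    t = 0.0
    while t < duration_s:
        earliest = heapq.heappop(free_at)
        start = max(t, earliest)          # wait if no connection is free yet
        delays.append(start - t)
        heapq.heappush(free_at, start + response_s)
        t += interval_s
    return delays


# Two connections, 1 s polling, 4 s responses: the backlog keeps growing.
print(max(simulate_polling(2, 1.0, 4.0, 10.0)))   # 8.0
# Same network, but eight connections: no poll has to wait.
print(max(simulate_polling(8, 1.0, 4.0, 10.0)))   # 0.0
# Two connections with a fast network (30 ms): no poll has to wait.
print(max(simulate_polling(2, 1.0, 0.03, 10.0)))  # 0.0
```

In this model, two connections can complete only one poll every two seconds while a new poll arrives every second, so anything else that needs a connection has to queue behind the backlog.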
- Solve the network issues
- Increase the browser connections
- Reduce the amount of notify calls / increase the time between two notify calls
Activities & Result
Solving a network issue can be very time-consuming. As this is the underlying issue, it should be treated as top priority. Even so, it will take some time until it is solved.
Increasing the number of browser connections might help a little. It is therefore recommended to use a newer browser version, as these allow more connections. Keep in mind that more parallel connections also consume more server resources.
The quick solution is to reduce the polling frequency, that is, to increase the polling interval, as described in SAP Note 1574747. The challenge is to decide whether you need polling at all and, if so, which interval is best. Increasing the polling interval to 30 seconds for one of my clients eliminated the freezing issue entirely. Moreover, a nice side effect was that the servers’ CPU usage decreased from 70% to 50%.
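A back-of-the-envelope check shows why the longer interval helps. By Little's law, the average number of connections a single window keeps busy with polling is the response time divided by the polling interval. The sketch below uses the 4.4 second worst case measured above; the function name is an assumption for illustration.

```python
def avg_busy_connections(response_s: float, interval_s: float) -> float:
    """Average number of connections one window occupies with polling.

    Little's law: arrival rate (1 / interval_s) times time in system
    (response_s).
    """
    return response_s / interval_s


# Worst-case response time of 4.4 s at the affected location.
print(round(avg_busy_connections(4.4, 1.0), 2))   # 4.4
print(round(avg_busy_connections(4.4, 30.0), 2))  # 0.15
```

With a 1 second interval, polling alone would need more than four connections during a network drop, which exceeds the limit of two. With a 30 second interval, a single connection is busy only about 15% of the time.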
Other EEM-related blogs:
End-User Experience Monitoring (EEM) – What it is and why you should consider it
End-User Experience Monitoring (EEM) – Activating Custom attributes