HTTP dirty talks #1

Former Member · ‎01-07-2008

Hi,

Sidebar:
Not long ago, in "Get your free I(E)Tunes", I mentioned the importance of keeping your browser tuned and up to date, and now you can see it pays off. MS published a new patch for IE6 that fixes slow browser performance on pages with JavaScript code. Tests show improvements ranging from 0% up to 50%, depending on the exact scenario. I recommend installing it... enjoy.

Now let's go back to why we are here in the first place - client-side tracing. By now your IE should be tuned (see "Get your free I(E)Tunes") and patched with the latest fix, and you should have installed a tracing tool (see "Tools of the trade - improve portal performance using browser tracing tools"). I will use HTTPWatch today, but any tool should show you similar results.

Why is it so important?
Sometimes this simple process uncovers many errors and problems that need attention and that you would not find otherwise. Tracing is important even if your users are within the LAN; if they are browsing via WAN, I think skipping it is not even an option.
The main two goals you want to achieve are:
* reducing the number of network roundtrips (utilize the browser cache)
* reducing network traffic (get the request/response as thin as possible)

This is your recipe for true happiness (oh, just after world peace and good health of course).
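
To make the two goals measurable, here is a minimal Python sketch that sums up a trace. It assumes you have copied the URL, the "Result" value and the received bytes of each request out of your tracing tool; the TraceEntry class, the summarize function and the sample data are all made up for illustration and are not part of HTTPWatch.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    url: str
    result: str          # e.g. "200", "304" or "(Cached)"
    bytes_received: int  # bytes that came over the network


def summarize(entries):
    """Count real network roundtrips and the traffic they caused."""
    # Anything not served from the browser cache went to the server.
    roundtrips = [e for e in entries if e.result != "(Cached)"]
    return {
        "requests": len(entries),
        "roundtrips": len(roundtrips),
        "bytes_received": sum(e.bytes_received for e in roundtrips),
    }


# Hypothetical sample trace, not taken from a real portal.
trace = [
    TraceEntry("/portal/index.html", "200", 18400),
    TraceEntry("/portal/logo.gif", "(Cached)", 0),
    TraceEntry("/portal/style.css", "304", 210),
]

summary = summarize(trace)
print(summary)
```

Run it before and after a tuning change: fewer roundtrips and fewer bytes for the same scenario means you are moving in the right direction.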

How do we start?

Choose a simple scenario:
Your scenario should preferably involve one click/navigation step; it's easier to isolate the problems this way.

Clear the cache, and warm up the browser:
You want to record the HTTP traces when the browser cache is populated, since this is usually your common scenario. BUT before you start measuring, you had better clear the cache once and run through your scenario to warm up the browser and repopulate the cache; this removes old resources you don't need any more.
*Tip - in IE6, if you browse to some page, clear the cache, and then click the "OK" button, the browser will immediately reload the page into the cache. If you click "Cancel" it will not reload it... just good to know.

Record your trace:
Go to the page you want to start from (for example the login page), click the record button in the tracing tool, perform your navigation in the browser, and stop recording when the page is fully loaded. That's it! Save the trace so you can refer to it later, and you can start your analysis.

Analyze:
I suggest doing it in a few steps and not everything at once. First clear out the obvious problems (usually errors), then get a better understanding of what is being called, and repeat the tracing.

Choose your enemy - response codes
The first thing to look at is the response codes of the server responses; this gives you a quick view of your status. You can find them either in the "Result" column or in the first line of the server response headers.

The common values are:

Cached - this resource is cached in your browser; the request is not sent to the server, and you will not see it in the HTTP logs. The browser fetches the resource locally from its internal cache.

200 (OK) - if your resource is not cached this is what you prefer to have. 200 means OK, the server behaves normally by sending back the resource you requested.

302 (Found) - this actually means a redirect. You request some URL and the server sends you back a new URL; the browser then sends a new request to the newly received URL. This is bad, very bad. You pay double the price here: instead of one roundtrip you get two. Common situations for this are login (a redirect to some authentication service), redirects to external content such as dotNet, or a "nice looking" URL for your portal that redirects to the actual "ugly" URL. If you can avoid this situation, do... otherwise, everything has its price.

Below you can see an example of a 302 response (highlighted with the mouse) and the corresponding "Location" header. Note that the response body is empty ("Content-Length: 0" header); no actual content is returned except for the URL in the header.
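
If you want to see the double price in numbers, here is a small Python sketch that simulates the browser following a redirect and counts the roundtrips. Nothing here touches a real server: the ROUTES table, the two URLs and the fetch function are made up for illustration.

```python
# Hypothetical server behaviour: URL -> (status code, "Location" header or None).
ROUTES = {
    "/portal": (302, "/irj/portal"),  # the "nice looking" URL
    "/irj/portal": (200, None),       # the actual "ugly" URL
}


def fetch(url, routes, max_hops=5):
    """Follow 302 responses like a browser and record every roundtrip."""
    hops = []
    for _ in range(max_hops):
        status, location = routes[url]
        hops.append((url, status))
        if status != 302:
            return hops
        url = location  # the browser re-requests the URL from "Location"
    raise RuntimeError("too many redirects")


hops = fetch("/portal", ROUTES)
print(len(hops), "roundtrips:", hops)
```

Requesting the final URL directly gives one entry in the list; going through the nice URL gives two, and over a WAN each extra entry is a full network roundtrip.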

304 (Not Modified) - This is one vicious performance killer and worth a few more words; it is usually related to cache configuration problems. Your browser is smart: it prefers to use its cache over sending new requests to the server. BUT what happens when it is not sure the cached copy is still valid? It sends a "conditional request", a small request for the resource that carries the time the cached copy was last modified. If the server decides your copy is still valid, it returns an empty response with a 304 code, and the browser fetches the resource from its cache. This happens right after the cache lifetime of the resource has expired, or when no lifetime was specified at all. We will get more into it next time when we discuss HTTP headers.
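
The server side of this decision can be sketched in a few lines of Python. This is a simplified illustration and not how any particular server implements it: the respond function is hypothetical, and it only compares the resource's last-modified time with the date the browser sends in the "If-Modified-Since" header.

```python
from email.utils import parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return the status code for a GET, given HTTP date strings."""
    if if_modified_since is None:
        return 200  # plain request: send the full resource
    resource_time = parsedate_to_datetime(last_modified)
    cached_time = parsedate_to_datetime(if_modified_since)
    # Unchanged since the browser cached it -> empty 304 response.
    return 304 if resource_time <= cached_time else 200


# Hypothetical timestamps for illustration.
cached_copy = "Mon, 03 Dec 2007 08:00:00 GMT"
newer_copy = "Mon, 07 Jan 2008 09:30:00 GMT"

print(respond(cached_copy))               # no condition sent
print(respond(cached_copy, cached_copy))  # file unchanged on the server
print(respond(newer_copy, cached_copy))   # file changed on the server
```

Note that even the 304 case costs a full roundtrip to the server; only the response body is saved.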

401 (Unauthorized) - in other words, your authentication failed or is missing. This is something you definitely don't want to see. Check the URL that returned it and find out what is wrong.

403 (Forbidden) - Stop in the name of the law! Surprise, surprise, you are trying to view content that you are not authorized to see. Naughty you! Similar to the 401 response above (same-same but different): check the URL and either extend the permissions or remove this resource from the page.

404 (Not Found) - the resource is not located where you expect it to be and the server can't find it. Sometimes it is a missing image that wasn't deployed, sometimes it is a typo in the HTML. In any case this is a problem that is easy to fix: find the missing resource and put it in the right location.

500 (Internal Server Error) - You have a problem, or better said - your server has a problem. Something is very wrong with this request on the server side. Probably you can find some errors in the log files...

503 (Service Unavailable) - The service is not available. In many cases it means that the service you requested is still initializing and not fully started yet.

If you are curious about other response codes, you can find the complete list of HTTP response codes in RFC 2616.
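
For a first pass over a long trace, the codes above can be sorted into three buckets with a short Python sketch. The triage function, the bucket names and the sample list are my own illustration (the values are what you might copy from the "Result" column), so adjust them to your own taste.

```python
from collections import Counter

FINE = {"(Cached)", "200"}
EXTRA_ROUNDTRIP = {"302", "304"}
ERROR = {"401", "403", "404", "500", "503"}


def triage(result):
    """Map a value from the "Result" column to a rough category."""
    if result in FINE:
        return "fine"
    if result in EXTRA_ROUNDTRIP:
        return "extra roundtrip"
    if result in ERROR:
        return "error"
    return "other"  # look it up in the RFC


# Hypothetical "Result" column of a recorded trace.
results = ["200", "(Cached)", "304", "302", "404", "(Cached)", "500"]

counts = Counter(triage(r) for r in results)
print(dict(counts))
```

Start with the "error" bucket, then work on the "extra roundtrip" one.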

A simple example from the SDN site

By now you are probably saying, "Oh well, how bad can it be? Is it really worth the effort?" They say a picture is worth a thousand words, so look at what I marked in red below:

What do you see?
This is a trace I took while browsing the SDN from my desktop in IL.
The first marked file from the top is of type image/gif; it is a simple gif file that the browser loads from cache (you can see the "(Cached)" value in the results column). The estimated time for retrieving it is 0.002 sec. Sounds reasonable, doesn't it?
Now look at the marked file below to see what happens when the browser is not sure whether its cache holds an up-to-date version of a file: it sends an "If-Modified-Since" conditional request, the server receives the request, checks the time stamp of the file and decides that the file did not change. The server then sends a 304 response code and the browser ends up retrieving the gif file from the cache. Total time for retrieving the file is 0.717 sec, almost a full second for a small gif file. And I want to stress this - both files were eventually loaded from the browser cache!
Wondering what this important gif file is? If you click on my name at the beginning of this post and get to my list of posts, see this small SAP icon to the right of my name? BINGO! Almost 1 second of your life... you should give this icon more respect now.
While you are here, why not click on my previous blogs or leave me a nice message? Support my effort, you know.
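
For the number lovers, here is the arithmetic behind the two timings from the trace, as a tiny Python sketch. The 20 resources per page are purely an assumption for illustration, and the worst case assumes the files are fetched one after another; a real browser uses a few parallel connections, so your actual delay will be lower.

```python
# Timings read from the trace above, in milliseconds.
CACHED_MS = 2          # gif served straight from the browser cache
CONDITIONAL_MS = 717   # gif served from cache after a 304 roundtrip

extra_ms = CONDITIONAL_MS - CACHED_MS
slowdown = CONDITIONAL_MS / CACHED_MS

# Hypothetical page with 20 such resources, fetched sequentially.
RESOURCES_PER_PAGE = 20
worst_case_s = RESOURCES_PER_PAGE * extra_ms / 1000

print(f"extra per file: {extra_ms} ms ({slowdown:.1f}x slower)")
print(f"worst case for {RESOURCES_PER_PAGE} files: {worst_case_s:.1f} s")
```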

What's next?
You have now about 2-3 weeks to get experienced with your favorite tracing tool and check all those missing files, bad permissions etc. I will use this time to write the next post, where I will show you how to understand the http headers and use the HTTP configuration options on the server to eliminate the 304 responses, along with other tuning tips.

Until next time,
Yoni