Performance measurement (How fast does it go?) is the inverse of the question of scaling (How large must the box be to support N users per unit of time?). Both questions share a central theme: measuring the performance of an application.
However, BSP pages are not applications, but merely the enabling technology for developing web applications. As such, each BSP application must be measured individually to assess its performance characteristics. The first step is to measure the overhead of the BSP runtime itself, because every application's performance will always include this basic overhead. This Weblog concentrates on measuring the roundtrip latency for the BSP runtime; the roundtrip latency of an application can be measured in the same way.
For measuring the BSP overhead, a small Hello World BSP page is created and then fetched from the browser. This simple test measures the latency from browser to WebAS, the authentication time, the time to create the session, the overhead to instantiate all the BSP runtime classes, and the call to the specific BSP page. This is the minimum overhead for each HTTP request to a BSP page.
The tests are done for BSP pages of 1KB, 2KB, and 4KB in size (pages of up to 64KB are common). These tests verify that there are no memory bandwidth problems within the system (memory copies, etc.). Larger pages are expected to take slightly longer, mostly due to the longer transmission time on the physical network.
For a second series of tests, GIF images are fetched from the server. Access to the MIME repository is very expensive. However, the images are cached in the ICM cache, so the access time of subsequent loads is unrelated to the first load time. Therefore, we look only at the performance of the subsequent image loads.
For the tests, a BSP application IT03 was used, which consists of a simple Hello World page, called text0kb.htm. This is the primary page for measuring the BSP runtime overhead. In addition, a number of textNkb.htm pages are available for testing. The IT03 application also includes a collection of different imageNkb.gif images.
Factor to Consider: URL Mangling
When doing performance measurements, the goal is to run the same test a number of times. However, most test programs are structured so that each test is run individually, the results are recorded, and then the test is started anew. As a result, one-off actions that a real user experiences only once are measured again in every test run.
URL mangling is one such example. In BSP applications, a redirect is done on the first hit for the application to embed specific information (language, client, etc.) into the URL. For example, when the user requests the URL /sap/bc/bsp/it03/page.htm from the server, a redirect is first done to the new URL /sap(bD1ZS0PRF1Rw==)/bc/bsp/it03/page.htm. For performance measurements, use only the mangled URLs, to exclude the one-off mangling redirect from the results.
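The mangled segment is not opaque. The segment used in the test report URLs later in this Weblog, bD1lbiZjPTAwMA==, is Base64 and decodes to l=en&c=000, which fits the language and client information described above. The sketch below assumes that mangled segments always follow this Base64 key=value form; that is an observation, not documented BSP behavior, and decode_mangled_segment is a hypothetical helper name.

```python
import base64
import re
from urllib.parse import parse_qs

# Hypothetical helper. Assumption: the "(...)" segment after /sap is Base64-encoded
# "key=value&key=value" text, as observed for /sap(bD1lbiZjPTAwMA==)/...
MANGLED = re.compile(r"^/sap\(([A-Za-z0-9+/=]+)\)/")

def decode_mangled_segment(path: str) -> dict[str, str]:
    match = MANGLED.match(path)
    if match is None:
        return {}  # not a mangled URL
    raw = base64.b64decode(match.group(1)).decode("ascii")
    # parse_qs returns a list per key; keep the first value of each
    return {key: values[0] for key, values in parse_qs(raw).items()}

print(decode_mangled_segment("/sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/it03/text1kb.htm"))
```

Decoding a captured URL this way is a quick sanity check that the test script really uses the mangled form with the intended language and client.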
Factor to Consider: Authentication
A similar problem is that of authentication. When each test run is started, a GET request for a URL is sent. However, the WebAS requires authentication from the user, so it rejects the request with 401 (Unauthorized). The browser then pops up a dialog requesting a username and password from the user. This data is encoded into the Authorization field of the HTTP header, and the URL is requested again. On all subsequent requests to the server, the browser automatically sends the Authorization header with the request.
In this case, two roundtrips are often measured for each test: after the first request is rejected, the test program retries with the stored username and password, so the double roundtrip time is recorded for every test run. It is important to ensure that the authentication data is already part of the initial request, so that the actual request time is measured and not the additional authentication roundtrip.
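Assuming the server uses HTTP Basic authentication (RFC 7617), the Authorization header is "Basic " followed by the Base64 encoding of username:password, so it can be computed once and sent with every request, including the first one. This is a minimal sketch; the host, port, and credentials are placeholders, and basic_auth_header is a hypothetical name.

```python
import base64
import urllib.request

def basic_auth_header(username: str, password: str) -> str:
    # RFC 7617: Base64-encode "username:password" and prefix it with "Basic "
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

# Attach the header up front so the first request is not rejected with 401.
# Placeholder host, port, and credentials; this only builds the request object.
request = urllib.request.Request(
    "http://server.example:8000/sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/it03/text0kb.htm",
    headers={"Authorization": basic_auth_header("tester", "secret")},
)
```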
Factor to Consider: Program Load Times
At a technical level, a BSP page is a generated ABAP class, for which a load (compiled program) is stored in the database. When a URL is requested for the first time, the required program is loaded from the database and cached in the program buffer. On subsequent requests, the program can be used directly from the buffer.
This first-time database load is very expensive and skews average times, especially on short runs. Therefore, it is highly recommended to add a warm-up phase before all tests. This also ensures that all required images are placed in the ICM cache (which is the expected case in normal usage).
Factor to Consider: Network Latency
In a typical intranet, network latency is so small (milliseconds) that it is not really noticed. However, when the measured times are in the same range, network latency plays a large role.
In in-house tests where multiple hops had to be traversed from browser to server, this added a noticeable latency of a few milliseconds. For tests where the absolute numbers are important, it is recommended that client and server be placed on the same subnet.
In addition, the packet size of the network plays a role. An Ethernet frame can carry at most 1500 bytes of payload. Larger responses are split into smaller IP packets, sent individually, and reassembled by the TCP/IP stack on the receiving side. So although the data is sent all at once, it is not received in one packet. This has prompted some tools to report both the time to first byte (TTFB) and the time to last byte (TTLB). The difference between the two values reflects network latency more than server processing time.
When testing in a WAN environment or via the Internet, the properties of TCP become noticeable. Especially for larger data volumes, the TCP window size plays a large role: an initial window of data is transmitted, and then transmission stops until an acknowledgement is received. This causes TTFB and TTLB to diverge.
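A rough calculation shows how the tested page sizes map to packets. It assumes a 1500-byte Ethernet MTU with 20-byte IPv4 and 20-byte TCP headers without options (a 1460-byte maximum segment size) and ignores the HTTP response headers; real stacks may differ.

```python
import math

MTU = 1500          # Ethernet payload size in bytes
IP_HEADER = 20      # IPv4 header without options (assumption)
TCP_HEADER = 20     # TCP header without options (assumption)
MSS = MTU - IP_HEADER - TCP_HEADER  # 1460 bytes of application data per segment

def segments_needed(payload_bytes: int, mss: int = MSS) -> int:
    """Number of TCP segments needed to carry payload_bytes of response data."""
    return max(1, math.ceil(payload_bytes / mss))

for size in (1024, 2048, 4096):
    print(size, segments_needed(size))
```

Under these assumptions a 1KB page fits into one segment, a 2KB page needs two, and a 4KB page needs three.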
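To see why the window size matters, the sketch below counts the round trips needed to deliver a response under a simplified model of TCP slow start: the sender transmits one window of segments, waits for the acknowledgements, and doubles the window for the next round. The initial window is an assumption: older stacks used at most 2 segments (RFC 2581), RFC 3390 allows about 3 full-size segments, and RFC 6928 raises this to 10. Packet loss, delayed ACKs, and receive-window limits are ignored.

```python
def round_trips(segments: int, initial_window: int = 2) -> int:
    """Round trips needed to send `segments` under idealized slow start (no loss)."""
    if segments < 1 or initial_window < 1:
        raise ValueError("segments and initial_window must be positive")
    sent, window, trips = 0, initial_window, 0
    while sent < segments:
        sent += window   # one window of segments per round trip
        window *= 2      # slow start: the window doubles after each acknowledged round
        trips += 1
    return trips

print(round_trips(2), round_trips(3), round_trips(3, initial_window=10))
```

With a 2-segment initial window, a two-segment response goes out in one round trip, but a three-segment response (roughly a 4KB page with 1460-byte segments) needs a second one. Over a high-latency link, that extra round trip appears as a gap between TTFB and TTLB.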
Factor to Consider: Median Time versus Average Times
Quite often over the duration of a test, a few measurement points are clearly too large and do not reflect the expected behavior. For example, assume a run of 1,000 tests, where 999 of the tests completed in exactly 10ms, but one URL hit took 3 seconds (3,000ms). The average of the 1,000 runs is then (999 × 10 + 3,000) / 1,000 = 12.99ms, not the expected 10ms. A few outliers are enough to distort the average. For these tests, median times are therefore used. For the above example, the median is the expected 10ms.
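The example above can be reproduced with Python's statistics module to show how one outlier moves the average but not the median.

```python
import statistics

# The example from the text: 999 hits at 10 ms and one outlier at 3,000 ms.
samples_ms = [10] * 999 + [3000]

average = statistics.mean(samples_ms)   # pulled up by the single outlier
median = statistics.median(samples_ms)  # middle value, unaffected by the outlier

print(f"average = {average:.2f} ms, median = {median:.2f} ms")
# average = 12.99 ms, median = 10.00 ms
```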
Tool: Microsoft Web Application Stress Tool
There are many programs available for stress testing web applications. In principle, any program can be used. For this Weblog, we will use the Web Application Stress Tool from Microsoft. It is quick to install, and contains all the necessary features required for simple stress testing.
To download the program, go to Microsoft's website and search for Web Application Stress Tool. The interesting links currently are: Download the Web Application Stress Tool and Web Application Stress Tutorial.
Running a Test
With the stress tool, it is relatively easy to record a session. For the tests, the BSP test program IT03 is used. Pages and images of 1KB, 2KB, and 4KB are tested. As a first step, either record or manually create the script.
One of the first post-processing steps is updating the server value, because this field is not set correctly during script creation. Also, delete all URLs that we are not interested in, and keep the test URLs in their mangled form.
Two additional post-processing steps are required. Double-click on the first URL, check that the port is correct, and add the Authorization header. Then apply this change to all URLs. Unfortunately, if the port is not correct, it must be updated manually for all URLs.
Finally, we configure a few set-up parameters for the test. In all cases, keep the stress level at 1 so that the true end-to-end latency is apparent, without test threads blocking one another. Configure a short warm-up time and the run time. As we have already configured the user information via the Authorization header, we do not need to store any additional user information.
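Outside the stress tool, the same settings can be approximated in a few lines: one thread (stress level 1), a warm-up phase whose results are discarded (so program loads and ICM cache fills are not counted, as discussed under Factor to Consider: Program Load Times), and a fixed run time. This is a tool-agnostic sketch: run_test is a hypothetical name, and `fetch` stands for any callable that performs one request (for example, a GET with the Authorization header set) and returns its result.

```python
import itertools
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")

def run_test(
    urls: Sequence[str],
    fetch: Callable[[str], R],
    warmup_s: float,
    run_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, R]]:
    """Call fetch() sequentially over urls; keep only results after the warm-up."""
    results: List[Tuple[str, R]] = []
    start = clock()
    for url in itertools.cycle(urls):   # single thread: requests never overlap
        elapsed = clock() - start
        if elapsed >= warmup_s + run_s:
            break
        result = fetch(url)
        if elapsed >= warmup_s:         # discard warm-up hits (program loads, cache fills)
            results.append((url, result))
    return results
```

The `clock` parameter exists only so the loop can be driven by a fake clock in tests; in a real run it defaults to time.monotonic, and the URLs would be the mangled ones with the corrected server and port.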
Looking at the Data
Allow me a short story. Once, a long time ago, someone decided to compare their server platform against the BSP runtime. At the end of the tests, they wrote a small hero email: they could handle thousands of hits per second! The numbers were much better than those for BSP. What the colleagues forgot to check was whether they were actually getting back a 200 OK answer for each hit. What was actually measured was thousands of 401 (Unauthorized) responses! For a server, rejecting a request is much faster than processing it completely. (No, they never did email a retraction.)
Thus, the first check is to ensure that the test actually ran successfully. Look at the summary page and ensure that only 200 OK result codes appear. The number of hits per page should be nearly the same for all pages.
|Script Settings|Value|
|Server|us4049.wdf.sap.corp|
|Number of threads|1|
|Test length|00:03:00|
|Warmup|00:00:30|
|Result Code|Description|Count|
|200|OK|381|
|Page|Hits|
|GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/text1kb.htm|64|
|GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/text2kb.htm|63|
|GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/text4kb.htm|63|
|GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/image1kb.htm|63|
|GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/image2kb.htm|64|
|GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/image4kb.htm|64|
The next step is to look at the results of each page test in detail.
URI: GET /sap(bD1lbiZjPTAwMA==)/bc/bsp/sap/it03/text1KB.htm
|Result Code|Description|Count|
|200|OK|64|
|Statistic|Time to first byte (ms)|Time to last byte (ms)|Downloaded content length (bytes)|
|Average|295.17|295.31|-|
|Min|181.75|181.86|1026|
|25th Percentile|190.26|190.37|1026|
|50th Percentile|192.89|193.01|1026|
|75th Percentile|299.48|299.60|1026|
|Max|3405.89|3406.01|1026|
Specifically, check that the result codes are all 200 OK and that the downloaded content length is stable over all hits. These are the first indicators of problems. In particular, if the content length varies within one test run, it can indicate that the page ran into a problem and is suddenly rendering different output.
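These checks are easy to automate when the raw hits are available. The sketch below flags non-200 result codes, content lengths that vary per URL, and an uneven distribution of hits over the URLs. The Hit record, the check_run name, and the 10 percent skew threshold are hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence

@dataclass
class Hit:
    url: str
    status: int
    length: int  # downloaded content length in bytes

def check_run(hits: Sequence[Hit], max_hit_skew: float = 0.1) -> List[str]:
    """Return a list of problems; an empty list means the run looks usable."""
    problems: List[str] = []
    # 1. Every hit must be 200 OK (a fast 401 is not a fast page).
    bad = Counter(h.status for h in hits if h.status != 200)
    for status, count in sorted(bad.items()):
        problems.append(f"{count} hit(s) returned status {status}")
    # 2. The content length of each URL must be stable across the run.
    lengths: Dict[str, set] = {}
    for h in hits:
        lengths.setdefault(h.url, set()).add(h.length)
    for url, seen in sorted(lengths.items()):
        if len(seen) > 1:
            problems.append(f"{url}: content length varies {sorted(seen)}")
    # 3. Hits should be spread nearly evenly over the URLs.
    per_url = Counter(h.url for h in hits)
    if per_url:
        lo, hi = min(per_url.values()), max(per_url.values())
        if hi - lo > max_hit_skew * hi:
            problems.append(f"uneven hit distribution: {lo}..{hi} hits per URL")
    return problems
```

For the summary report above (63 or 64 hits per page, all 200 OK), the distribution check would pass.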
Once we are sure that the test run was acceptable, we look at the Time to First Byte (TTFB) and/or Time to Last Byte (TTLB) values. We know from the WebAS architecture that only complete HTTP responses are transmitted. So once transmission starts, the ABAP part of the processing is complete, and the difference between TTFB and TTLB only reflects network transmission time. TTFB is the more interesting value when looking at server performance (in which case the test machine and server should be close to one another on the same network). TTLB is the value to determine the HTTP roundtrip latency.
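To reproduce the two values without the stress tool, the sketch below times one GET with Python's http.client. It is an approximation under stated assumptions: the "first byte" timestamp is taken when getresponse() has parsed the status line and headers, the TCP connection is opened before the clock starts (so the handshake is excluded), and each call uses a fresh connection. fetch_once and Sample are hypothetical names.

```python
from __future__ import annotations

import http.client
import time
from dataclasses import dataclass

@dataclass
class Sample:
    status: int
    ttfb_ms: float  # request sent until status line and headers are parsed
    ttlb_ms: float  # request sent until the whole body has been read
    length: int     # downloaded content length in bytes

def fetch_once(host: str, port: int, path: str,
               headers: dict[str, str] | None = None) -> Sample:
    """Time one GET over a fresh, already-connected HTTP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.connect()                 # open TCP first so the handshake is not timed
        start = time.perf_counter()
        conn.request("GET", path, headers=headers or {})
        response = conn.getresponse()  # returns once status line and headers are read
        ttfb = time.perf_counter() - start
        body = response.read()         # drain the rest of the response
        ttlb = time.perf_counter() - start
        return Sample(response.status, ttfb * 1000.0, ttlb * 1000.0, len(body))
    finally:
        conn.close()
```

Since the WebAS only starts sending once the response is complete, the difference between ttlb_ms and ttfb_ms in such a sample approximates the pure transmission time.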
The last interesting aspect is the large difference between the average and the 50th percentile values. This is due to a few outliers that can dramatically influence the average value (especially over a short test time). Because of this, we usually use only the 50th percentile values. (To be quite truthful, the fact that the 50th and 75th percentiles are not close to one another would usually prompt me to repeat the tests over a much longer time. But for the educational value of this Weblog, these numbers are sufficient.)
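A report in the same shape can be computed from raw timings, together with a simple check on the gap between the 50th and 75th percentiles. The interpolation method used by the stress tool is not known; the sketch uses statistics.quantiles with method="inclusive", and the 1.2 ratio in needs_longer_run is a hypothetical rule of thumb, not a recommendation from the tool.

```python
import statistics
from typing import Dict, Sequence

def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summary in the style of the stress-tool report (needs at least two samples)."""
    p25, p50, p75 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return {
        "average": statistics.mean(samples_ms),
        "min": min(samples_ms),
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "max": max(samples_ms),
    }

def needs_longer_run(summary: Dict[str, float], max_ratio: float = 1.2) -> bool:
    """Hypothetical rule: a wide 50th..75th percentile gap suggests repeating the test."""
    return summary["p75"] > max_ratio * summary["p50"]
```

Applied to the TTFB percentiles of the text1KB.htm report above (192.89ms and 299.48ms), this rule would ask for a longer run, matching the remark in the paragraph above.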
Looking At All Tests Together
Here is a summary of all the tests.
|Test|TTFB (in ms)|TTLB (in ms)|
Probably the most important question is whether these numbers are good or bad. Compared with what we have measured under optimal lab conditions, the absolute numbers are horrible. However, the numbers must be evaluated in the context of the actual tests done, and here the goal was to measure HTTP roundtrip latency. These tests were run from home, using a DSL connection to the Internet, with (very expensive!) encrypted VPN tunneling. Furthermore, the tests were done against a development system running a debug kernel (perhaps 30 percent slower). So overall, we are happy with the numbers.
Interestingly, the TTFB and TTLB values are very close for the 1KB and 2KB sizes, but differ markedly for the 4KB size. We attribute this to the network and TCP properties discussed earlier.
Comparing a 1KB page with an image, we see a difference of about 20ms. We know that after the warm-up phase, all images are stored in the ICM cache and are answered from there. Thus, using the TTFB for the image as a rough indicator of the network latency (ICM processing at kernel level is definitely below 1ms!), we get an estimated processing time of 20ms for the 1KB page. This is in line with our expectations for a debug kernel. (The estimate was confirmed with measurements in the office: the 50th percentile was 23ms.)
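The estimate can be written out explicitly. Treating a cached image of comparable size as a baseline with negligible server work (ICM time below 1ms), its median TTFB approximates the network roundtrip, and subtracting it from the page's median TTFB leaves the server-side processing time. The symbols are introduced here only for illustration; the tilde denotes the 50th percentile.

```latex
t_{\text{proc}}(\text{page}) \approx \widetilde{\mathrm{TTFB}}_{\text{page}} - \widetilde{\mathrm{TTFB}}_{\text{image}},
\qquad \text{valid while } t_{\text{ICM}} < 1\,\text{ms} \ll \widetilde{\mathrm{TTFB}}_{\text{page}} - \widetilde{\mathrm{TTFB}}_{\text{image}}
```

For the 1KB page, the difference of about 20ms gives the 20ms processing estimate stated above.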
This Weblog has not attempted to put perfect absolute numbers on the table. Instead, a more realistic usage scenario was chosen. The first and most important goal was to show that it is relatively easy to make HTTP roundtrip latency measurements. Furthermore, the numbers obtained were briefly interpreted, just to give a feeling for how such numbers should be read.
It is important to note that these are baseline numbers that show the complete end-to-end latency for the specific scenario. This includes the actual transmission time plus the processing time of the BSP runtime. Using a very small page, we can show that the BSP runtime itself is quite fast up to the point where the application code starts. As a next step, one must measure a complete application. From there, it can be determined what part of the total time is attributable to the BSP runtime.
Be careful when deriving a sizing number from single measurements. It is important to use a realistic scenario through the application, and to use the results of the different URLs to calculate the true overhead on the server. Also consider the latency a user can expect in a similar network environment, and the effects of MIME objects.
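One way to combine per-URL results for a scenario is sketched below. It is a simplified model under loud assumptions: network latency is the same for every hit (taken, for example, from the median TTFB of a cached image), and the Hello World processing time is a fixed per-request BSP runtime overhead. The function names and the numbers in the usage line are hypothetical and serve only as an illustration.

```python
from typing import Sequence

def scenario_server_ms(step_ttfb_ms: Sequence[float], network_ms: float) -> float:
    """Estimated server time for one pass through a scenario.

    Each entry is the median TTFB of one URL hit in the scenario; the network
    baseline is subtracted per hit (never going below zero).
    """
    return sum(max(0.0, t - network_ms) for t in step_ttfb_ms)

def runtime_share(step_count: int, runtime_ms: float, server_ms: float) -> float:
    """Fraction of the server time attributable to the fixed per-request runtime overhead."""
    return step_count * runtime_ms / server_ms if server_ms > 0 else 0.0

# Hypothetical three-step scenario with a 30 ms network baseline and a 20 ms runtime overhead.
server = scenario_server_ms([120.0, 80.0, 50.0], network_ms=30.0)
print(server, runtime_share(3, 20.0, server))
```

MIME objects served from the ICM cache would enter such a calculation mainly as additional network roundtrips rather than as server time.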