Chop your logs!
If there is one thing to check first on a production server, it is the log configuration.
You may be surprised how much just a few logs at INFO severity can slow your server down (besides filling up your valuable disk space). The portal code contains many log calls in different locations, which lets you collect information instantly when you run into a problem. BUT the appropriate severity setting for a production system is “ERROR”. If you don’t have a specific issue, there is no need for the server to collect data for nothing. What’s more, you can change the severity on the fly at any time.
You change the severity levels in the AdminTool, under server -> services -> Log Configurator. Make sure that “ERROR” is set on both the “Categories” and “Locations” tabs, and use the “Copy severity to subtree” button (see below). This is important because there may be some “INFO” settings hiding at lower levels. Don’t forget to save your change, and make sure you use the “apply to all server nodes” option.
I ran a 3-hour test to get some numbers for you. My baseline was a navigation test with all logs set to “ERROR”; then I set only the “com.sap.portal.prt” location to “INFO”. Before reading on, care to guess the results?
The server’s CPU level jumped by 20% (in absolute terms), response times doubled, the number of full GC events doubled, and the log files were hundreds of MB in size… makes you think, doesn’t it?
Some techie stuff: the log method calls create complex String objects, which consumes memory, and writing those messages to disk causes file system I/O. The calls are designed to be completely ignored when the corresponding log severity is not active, so in that case they add no overhead to the server.
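To make the "ignored when not active" point concrete, here is a minimal sketch. It uses the standard java.util.logging API as a stand-in, not the SAP logging API the portal actually uses, and the logger name "demo.portal.prt" and the message builder are made up for illustration. Note that java.util.logging has no ERROR level, so SEVERE plays that role here. The sketch counts how often an expensive message String is really built at each severity.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogGuardDemo {
    // Counts how many times the expensive message is actually built.
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Stand-in for a "complex String" log message (hypothetical content).
    static String expensiveMessage() {
        BUILDS.incrementAndGet();
        StringBuilder sb = new StringBuilder("navigation state: ");
        for (int i = 0; i < 1000; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    // Issues 100 INFO log calls with the logger set to the given level
    // and returns how many message Strings were built.
    static int buildsAtLevel(Level level) {
        Logger logger = Logger.getLogger("demo.portal.prt"); // hypothetical name
        logger.setUseParentHandlers(false); // keep the demo output quiet
        logger.setLevel(level);
        BUILDS.set(0);
        for (int i = 0; i < 100; i++) {
            // The Supplier is only invoked if INFO is loggable for this logger.
            logger.info(LogGuardDemo::expensiveMessage);
        }
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("SEVERE: " + buildsAtLevel(Level.SEVERE)); // prints "SEVERE: 0"
        System.out.println("INFO: " + buildsAtLevel(Level.INFO));     // prints "INFO: 100"
    }
}
```

With the severity at SEVERE, none of the 100 calls builds a message; at INFO every call does, and on a real server each of those Strings would also be written to disk.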
If you do run into a problem and need to change the log severity, try the following:
1. Activate only the locations you need – setting everything to INFO will collect huge amounts of data and make it very hard to get a clear picture from the logs.
2. If you want to reproduce the problem in the production environment, try to do it on an isolated server node (off the load balancer). This way you ensure that all collected data is relevant to the scenario you are running and that no other users are affected.
3. Document your changes and time-limit them – don’t forget to set the log severity back to ERROR when you are done.
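The "time-limit your changes" idea from the last tip can be sketched in code as well. This is only an illustration with the standard java.util.logging API, not something the AdminTool or the SAP logging API provides as far as this post shows; the class name, method name, and logger name are all hypothetical. The helper raises a logger's severity, prints a line documenting the change, and schedules the automatic switch back to the previous level.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TimedSeverity {
    // Sets a temporary level on the logger and schedules the restore of the
    // previous level. The returned future completes once the level is restored.
    static ScheduledFuture<?> raiseTemporarily(Logger logger, Level temporary,
                                               long duration, TimeUnit unit,
                                               ScheduledExecutorService scheduler) {
        // May be null, meaning "inherit from parent"; setLevel(null) restores that.
        Level previous = logger.getLevel();
        logger.setLevel(temporary);
        // Document the change so nobody has to guess later why INFO was active.
        System.out.println("Severity of '" + logger.getName() + "' set to " + temporary
                + " for " + duration + " " + unit + ", previous level: " + previous);
        return scheduler.schedule(() -> logger.setLevel(previous), duration, unit);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Logger logger = Logger.getLogger("demo.portal.prt"); // hypothetical name
        logger.setLevel(Level.SEVERE);

        ScheduledFuture<?> restore =
                raiseTemporarily(logger, Level.INFO, 200, TimeUnit.MILLISECONDS, scheduler);
        System.out.println("During the window: " + logger.getLevel());
        restore.get(); // wait until the scheduled restore has run
        System.out.println("After the window: " + logger.getLevel());
        scheduler.shutdown();
    }
}
```

The design choice here is that the restore is scheduled at the same moment the severity is raised, so going back to the quiet setting does not depend on anyone remembering it.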
Until next time,