Using web log analysis tools on SUP
I should start this blog with a disclaimer: SUP does not claim to support any web log analysis tool, and neither the attached code nor the described methodology is recommended for use in any productive scenario. This blog is the output of a self-sponsored exercise in playing around with log files and trying to extract presentable information from them.
A web log analysis tool is a kind of web analytics software that parses the logs present on a web server and generates reports from them. Once the logs are parsed, the software might generate reports immediately or store the parsed data in a database for later use. There are many such tools available on the market (paid and free). The two I played around with are WebLog Expert and Webalizer.
These reports are instrumental in presenting the popularity and demand of a website, or of the products and services it presents. It is a delight for a web admin to see how many hits the site is getting and how many unique visitors are responsible for them. The reports can even show how many requests failed and the reasons (HTTP error codes) behind the failures. At a glance, the admin can see which days of the week, or which hours of the day, the site gets the most hits. Such information can sometimes dictate IT-related decisions.
The other beneficiary of such reports is the business the website is associated with. If a retail giant gives out special discounts for a season, these reports can indicate how well the offer has been taken up by the customer base. Have the hits increased since the offer went public? Or suppose a new product is going to be launched this spring: the reports can tell the company how many unique visitors are checking out the page where the product is advertised, which gives an indication of interest in the customer base. So a lot is, and can be, done with such reports.
Some of the things the reports show are:
- Number of visits and number of unique visitors
- Most viewed pages
- Traffic by day of week, by month, and by hour (rush hours)
- Domains/countries of the visitors' hosts
- Browsers used
- HTTP errors
Have a look at a sample report generated by one of the tools: http://www.weblogexpert.com/sample/index.htm
Now the question is: what does a web log analysis tool have to do with SUP? With the release of the REST Channel, which favors the creation of online applications, the nature of the end user itself is changing. Our customers can create and release applications to be used by their customers, and this multiplies the user base considerably. In such a scenario, where the end user base is really the general public, interested in or loyal to a particular business or retail chain, such web-analytics-style reports can be of considerable interest.
Below are the steps showing how to get such reports on SUP.
SUP does have logs that can be used for such report generation, but they need minor tinkering: the web log analysis tools need logs in NCSA format.
Client_IP_address - User_Name [Date:Time -TimeZone] "Method Object HTTP_version" HTTP_StatusCode BytesSent

For example (the trailing quoted referrer and user agent belong to the "combined" variant of the format):

10.5.55.192 - anonymous@anonymous [07/May/2013:14:39:37.35 -0900] "GET /public/test/ HTTP/1.1" 200 431 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"
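As a quick illustration of the format (this is my own sketch, not part of the attached code, and the class and method names are made up), a line in combined NCSA format can be split into its fields with a simple regular expression:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcsaLineParser {
    // Matches host, ident, user, timestamp, request, status, bytes, and the
    // optional quoted referrer and user agent of the combined NCSA format.
    static final Pattern NCSA = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)" +
        "(?: \"([^\"]*)\" \"([^\"]*)\")?$");

    // Return {host, status, bytes} for a valid line, or null if it does not match.
    static String[] parse(String line) {
        Matcher m = NCSA.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(6), m.group(7) };
    }
}
```

For the sample line above, parse(...) yields the host 10.5.55.192, status 200, and 431 bytes sent.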
The timestamps in the SUP HTTP logs look like [2013-05-08 14:23:39.905], so they need to be changed to NCSA format. You can either write your own script to do so, or compile and run the attached Java code.
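To sketch what such a conversion involves (my own illustration, not the attached code; TimestampConverter and toNcsa are hypothetical names), SimpleDateFormat can reformat the SUP-style timestamp into the NCSA style, with the time-zone offset appended as a literal string, just like the offset argument the attached code takes:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class TimestampConverter {
    // Convert a SUP-style timestamp such as "2013-05-08 14:23:39.905" into
    // the NCSA style "[08/May/2013:14:23:39.905 -0500]". The tzOffset string
    // is appended as-is; it is not used to shift the time.
    static String toNcsa(String supTimestamp, String tzOffset) {
        SimpleDateFormat in  = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss.SSS", Locale.ENGLISH);
        try {
            Date d = in.parse(supTimestamp);
            return "[" + out.format(d) + " " + tzOffset + "]";
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected timestamp: " + supTimestamp, e);
        }
    }
}
```

Calling toNcsa("2013-05-08 14:23:39.905", "-0500") returns "[08/May/2013:14:23:39.905 -0500]".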
To run the attached Java code, do the following.
Copy the logs with 'http' in their names from C:\Sybase\UnwiredPlatform\Servers\UnwiredServer\logs\old to a folder on your system (e.g. C:\WebLogs). Also copy <machinename>-http.log from C:\Sybase\UnwiredPlatform\Servers\UnwiredServer\logs to the same folder (C:\WebLogs).
Copy the attached Java file to your system (e.g. C:\code). To compile it, open a command prompt, add the JDK to the path, and compile. Then run the code:
C:\code>path C:\Program Files\Java\jdk1.6.0_34\bin
C:\code>javac ChangeLogs.java
C:\code>java ChangeLogs C:\WebLogs -0500 C:\code\out.log
A Few Things to Note:
C:\WebLogs is the folder in which the log files are present; you can also give the path of one particular file (e.g. C:\WebLogs\<machinename>-http.log). The log file must have "http" in its name, or it will be ignored.
-0500 is the time zone information: the offset from Greenwich Mean Time (GMT).
C:\code\out.log is the output file.
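To make the "http in the name" rule concrete, here is a small sketch (mine, not the attached code; HttpLogFinder and its methods are invented names) of how a converter might collect candidate files from a folder or from a single path:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class HttpLogFinder {
    // The rule described in the blog: a file is considered only when
    // "http" appears somewhere in its name.
    static boolean isHttpLog(String fileName) {
        return fileName.contains("http");
    }

    // Collect candidate log files: every matching file in a folder, or the
    // single given file if it passes the name rule.
    static List<File> findHttpLogs(File input) {
        List<File> logs = new ArrayList<File>();
        File[] candidates = input.isDirectory() ? input.listFiles() : new File[] { input };
        if (candidates == null) return logs;
        for (File f : candidates) {
            if (f.isFile() && isHttpLog(f.getName())) {
                logs.add(f);
            }
        }
        return logs;
    }
}
```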
Now you can use out.log to generate reports with any tool you choose.