
Web analytics and reporting is a growing business now that data clouds are rising on the horizon. Knowing who is doing what in your cloud system is important for the business.

Basic website analytics and reporting is something you can get from SAP HANA in a few steps. All you need is access to your cloud system's logging capabilities and the ability to adjust them so that they produce log files SAP HANA can consume.

This post shows an example based on an Apache web server log file, using a custom log format that produces files SAP HANA can import easily.

Prerequisites

To get this scenario running, you need at least:

  • an Apache web server with mod_logio enabled
  • an SAP HANA instance (T-shirt size XS is sufficient)
  • the SAP HANA Developer Studio
  • the SAP HANA Client tools (hdbsql command line tool)

The Apache web server needs to write its log files to a common directory accessible by the SAP HANA appliance. Access rights should be set so that the user executing the data import can read the files.

The SAP HANA Developer Studio and the SAP HANA Client tools should be installed on a PC that can connect to your SAP HANA instance on port 30015 (JDBC).
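
If you want to verify connectivity up front, a quick check with hdbsql could look like the following sketch (the hostname and password are placeholders for your own values):

hdbsql -n <hanahost>:30015 -u SYSTEM -p <password> "SELECT * FROM DUMMY"

A single row containing 'X' confirms that the instance is reachable and the credentials work.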

Step 1: Set up an SAP HANA consumable log format in the Apache configuration

Apache defines its log format in the configuration file httpd.conf (or an equivalent file in clustered installations). Since SAP HANA can import CSV files directly via SQL, you can define the format as follows:

LogFormat "\"%h\",\"%{end:%Y-%m-%d %T}t\",\"%r\",%>s,%I,%O" hana_format

CustomLog logs/access_log hana_format

For a full description of the log format directives, see the Apache web server documentation. The configuration above results in log file output like the following in your Apache log directory:

"p57a7249e.dip.t-dialin.net","2012-12-05 19:04:21","GET /index.html HTTP/1.1",200,12676,234

"87.12.33.233","2012-12-05 19:04:23","GET /index.html HTTP/1.1",200,12676,234

"87.12.33.233","2012-12-05 19:04:23","GET /navi.html HTTP/1.1",200,23345,525

Step 2: Prepare the HANA DB

To import the CSV file into HANA, prepare the database by creating a new schema and a table that matches the Apache log format definition. Open the SAP HANA Developer Studio, navigate to the Modeler perspective, open the SQL editor view, and execute the following SQL statements:

create schema "TEST";

create column table TEST.LOG (client VARCHAR(64), logdate TIMESTAMP, request VARCHAR(255), status INTEGER, received BIGINT, sent BIGINT);

After executing the SQL, your HANA DB should contain a new schema TEST with an empty table named LOG.

Step 3: Import the log file into HANA

To get the log file data into SAP HANA, you can use the SQL command IMPORT FROM. This command consumes CSV files and imports the data into an existing HANA DB table.

Use the hdbsql command line tool from the SAP HANA Client tools to execute the command. Keep an eye on the log file's access rights: at a minimum, the executing user must be able to read the file locally.

hana1> hdbsql -p manager -u SYSTEM "IMPORT FROM CSV FILE '/var/apache/logs/access_log' INTO TEST.LOG WITH THREADS 4 OPTIONALLY ENCLOSED BY '\"'"

For a full reference on the IMPORT FROM SQL command, please refer to the SAP HANA reference guide.

The result of the import can be viewed in the SAP HANA Developer Studio: you will find the imported data in the HANA DB table TEST.LOG.
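
As a quick sanity check, you can also run a simple query in the SQL editor (or via hdbsql); the statements below are just a sketch:

SELECT COUNT(*) FROM TEST.LOG;

SELECT TOP 5 * FROM TEST.LOG ORDER BY logdate;

The row count should match the number of lines in the imported access_log.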


What is this exercise for?

With the log file data in SAP HANA, you can now start creating Analytic Views or Calculation Views on top of this clickstream data. These can be consumed by any analytic application, e.g. SAP BusinessObjects tools.
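
Analytic Views are usually modeled graphically in the Developer Studio, but as a plain-SQL sketch of the same idea, a view like the following (the view name is just an example) aggregates the traffic per client:

create view TEST.TRAFFIC_PER_CLIENT as select client, count(*) as requests, sum(received) as bytes_received, sum(sent) as bytes_sent from TEST.LOG group by client;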

Questions that could be answered by such views include, for example:

  • How much data is transferred from/to my data cloud / web application
  • How much data is received from
    • a specific country
    • a specific IP address/range
    • a specific user agent (browser/bot/crawler etc.)

These are just a few out of a huge set of possible analytics and reporting KPIs.
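
As a rough sketch, the first question (data transferred per day) could be answered directly on the imported table with a simple aggregation:

select cast(logdate as date) as log_day, sum(received) as bytes_received, sum(sent) as bytes_sent from TEST.LOG group by cast(logdate as date) order by log_day;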