Skip to Content
Author's profile photo Venkata Gudipati

Efficient running percentage bucketing calculations using HANA Window Functions

Objective:

The blog explains on how to find customers contributing to give percentage of sales efficiently using Window functions. Assume that Company ABCD is selling their products across all the regions and Sales Area leads wants to concentrate their energies on customers who are contributing to top 80% of the sales in each region. So, a simple report needed to provide the customer count for each region who are contributing to top 80% of sales

Assume that the raw sales data is as follows in the tables bellow.

The report need to display the number of customers contributing only to top 80% sales in each region as

Calculations:

Getting this kind of report using the conventional reporting tools is not as simple as it looks. As the volume goes up to tens of millions of records it is simply impossible to get the report with acceptable performance by business community.

HANA Window functions are very handy to perform these kinds of analytic operations over millions of records with much better performance, sometimes even in less than a second

Following are the step-by-step calculations involved to get the desired results

  1. Sort the customers in descending order with respect to their sales by each region
  2. calculate cumulative sum of Sales by customer for each region
  3. Calculate total Sales for each region
  4. Calculate running % of the sales by dividing the cumulative Sales by total sales for each region
  5. Bucket the customers into the Sales range as compared to the Total sales by each region
  6. Then filter the results by Sales Range to find the number of customer

Solution:

To calculate the cumulative sales percentage, I created a User defend Table function (successor to obsolete HANA Script Views) with custom SQL. The custom SQL in turn uses the Window Functions to perform the analytical operations

Here is the Table Function definition.

The following statement in the select clause calculates the running sum of Sales Amount by customer for each region (partition by region)

The following Statement in the SQL function calculates the total Sales Amount for each region.

These two statement brings cumulative Sales Amount by customer in each regions and Total Sales for each region in the same row for each record in the raw data to do further calculations easily at row level.

The PRECEDING keyword indicates the number of rows or values to precede the current row. UNBOUND PRECEDING specifies that the window starts at the first row of the partition and end at the current row.

 

The FOLLOWING clause indicates the number of rows or values to follow the current row. UNBOUNDED FOLLOWING specifies that the window starts a row following the current row and ends at the last row of the current partition.

Combining key words UNBOUND PRECEEDING and UNBOUND FOLLOWING with BETWEEN operator takes all the records in the partition into account to aggregate the measures.

Further calculations are done in simple Calculation View by creating a projection with Table Function as a data source .

Running Percentage calculation.

Percentage Range bucketing.

Following the final report with number customers in each region contributing to top 80% of the Sales

Assigned Tags

      8 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Maksim Alyapyshev
      Maksim Alyapyshev

      Thank you for sharing, useful thing! Also the post is wellstructured and writed with a good style!

      Author's profile photo Lars Breddemann
      Lars Breddemann

      Good stuff, thanks for sharing!

      Author's profile photo Mayank Jaiswal
      Mayank Jaiswal

      Thanks for sharing, nicely explained.

      Can you please let me know if this works fine when the number of characteristics are more (For E.g.  if one wants to do analysis at region/country/sales area wise.. can you give an example how it would be done in that case).

      Regards,

      Mayank

      Author's profile photo Venkata Gudipati
      Venkata Gudipati
      Blog Post Author

       

      Thank you all for the comments.

      Author's profile photo Venkata Gudipati
      Venkata Gudipati
      Blog Post Author

      Mayank,

       

      Yes, we can have more dimensions for partitions. Here is an example.

      Select .....

      Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION, SLAESAREA ORDER BY SALES_AMT DESC )as "RUN_SPEND",
      Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION,SLAESAREA ORDER BY SPEND DESC rows between unbounded preceding and unbounded following) as "TOT_SALES"

      Author's profile photo Venkata Gudipati
      Venkata Gudipati
      Blog Post Author

       

      Mayank,

       

      Yes, we can have more dimensions for partitions. Here is an example.

      Select …..

      Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION, SLAESAREA ORDER BY SALES_AMT DESC )as “RUN_SALES”,
      Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION,SLAESAREA ORDER BY SALES_AMT DESC rows between unbounded preceding and unbounded following) as “TOT_SALES”

      Author's profile photo Ramachandra Kudamala
      Ramachandra Kudamala

      Good Stuff. Thanks Venkat.

      Author's profile photo Former Member
      Former Member

      Nice Explanation, Thanks for sharing.