# Efficient running percentage bucketing calculations using HANA Window Functions

**Objective: **

The blog explains on how to find customers contributing to give percentage of sales efficiently using Window functions. Assume that Company ABCD is selling their products across all the regions and Sales Area leads wants to concentrate their energies on customers who are contributing to top 80% of the sales in each region. So, a simple report needed to provide the customer count for each region who are contributing to top 80% of sales

Assume that the raw sales data is as follows in the tables bellow.

The report need to display the number of customers contributing only to top 80% sales in each region as

**Calculations:**

Getting this kind of report using the conventional reporting tools is not as simple as it looks. As the volume goes up to tens of millions of records it is simply impossible to get the report with acceptable performance by business community.

HANA Window functions are very handy to perform these kinds of analytic operations over millions of records with much better performance, sometimes even in less than a second

Following are the step-by-step calculations involved to get the desired results

- Sort the customers in descending order with respect to their sales by each region
- calculate cumulative sum of Sales by customer for each region
- Calculate total Sales for each region
- Calculate running % of the sales by dividing the cumulative Sales by total sales for each region
- Bucket the customers into the Sales range as compared to the Total sales by each region
- Then filter the results by Sales Range to find the number of customer

**Solution:**

To calculate the cumulative sales percentage, I created a User defend Table function (successor to obsolete HANA Script Views) with custom SQL. The custom SQL in turn uses the Window Functions to perform the analytical operations

Here is the Table Function definition.

The following statement in the select clause calculates the running sum of Sales Amount by customer for each region (partition by region)

The following Statement in the SQL function calculates the total Sales Amount for each region.

These two statement brings cumulative Sales Amount by customer in each regions and Total Sales for each region in the same row for each record in the raw data to do further calculations easily at row level.

The *PRECEDING* keyword indicates the number of rows or values to precede the current row. *UNBOUND PRECEDING* specifies that the window starts at the first row of the partition and end at the current row.

The *FOLLOWING* clause indicates the number of rows or values to follow the current row. *UNBOUNDED FOLLOWING* specifies that the window starts a row following the current row and ends at the last row of the current partition.

Combining key words *UNBOUND PRECEEDING* and *UNBOUND FOLLOWING* with *BETWEEN* operator takes all the records in the partition into account to aggregate the measures.

Further calculations are done in simple Calculation View by creating a projection with Table Function as a data source .

Running Percentage calculation.

Percentage Range bucketing.

Following the final report with number customers in each region contributing to top 80% of the Sales

Thank you for sharing, useful thing! Also the post is wellstructured and writed with a good style!

Good stuff, thanks for sharing!

Thanks for sharing, nicely explained.

Can you please let me know if this works fine when the number of characteristics are more (For E.g. if one wants to do analysis at region/country/sales area wise.. can you give an example how it would be done in that case).

Regards,

Mayank

Thank you all for the comments.

Mayank,

Yes, we can have more dimensions for partitions. Here is an example.

Select .....

Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION, SLAESAREA ORDER BY SALES_AMT DESC )as "RUN_SPEND",

Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION,SLAESAREA ORDER BY SPEND DESC rows between unbounded preceding and unbounded following) as "TOT_SALES"

Mayank,

Yes, we can have more dimensions for partitions. Here is an example.

Select …..

Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION, SLAESAREA ORDER BY SALES_AMT DESC )as “RUN_SALES”,

Sum(SPEND) OVER (PARTITION BY CALQUARTER ,REGION,SLAESAREA ORDER BY SALES_AMT DESC rows between unbounded preceding and unbounded following) as “TOT_SALES”

Good Stuff. Thanks Venkat.

Nice Explanation, Thanks for sharing.