Efficient running percentage bucketing calculations using HANA Window Functions
The blog explains on how to find customers contributing to give percentage of sales efficiently using Window functions. Assume that Company ABCD is selling their products across all the regions and Sales Area leads wants to concentrate their energies on customers who are contributing to top 80% of the sales in each region. So, a simple report needed to provide the customer count for each region who are contributing to top 80% of sales
Assume that the raw sales data is as follows in the tables bellow.
The report need to display the number of customers contributing only to top 80% sales in each region as
Getting this kind of report using the conventional reporting tools is not as simple as it looks. As the volume goes up to tens of millions of records it is simply impossible to get the report with acceptable performance by business community.
HANA Window functions are very handy to perform these kinds of analytic operations over millions of records with much better performance, sometimes even in less than a second
Following are the step-by-step calculations involved to get the desired results
- Sort the customers in descending order with respect to their sales by each region
- calculate cumulative sum of Sales by customer for each region
- Calculate total Sales for each region
- Calculate running % of the sales by dividing the cumulative Sales by total sales for each region
- Bucket the customers into the Sales range as compared to the Total sales by each region
- Then filter the results by Sales Range to find the number of customer
To calculate the cumulative sales percentage, I created a User defend Table function (successor to obsolete HANA Script Views) with custom SQL. The custom SQL in turn uses the Window Functions to perform the analytical operations
Here is the Table Function definition.
The following statement in the select clause calculates the running sum of Sales Amount by customer for each region (partition by region)
The following Statement in the SQL function calculates the total Sales Amount for each region.
These two statement brings cumulative Sales Amount by customer in each regions and Total Sales for each region in the same row for each record in the raw data to do further calculations easily at row level.
The PRECEDING keyword indicates the number of rows or values to precede the current row. UNBOUND PRECEDING specifies that the window starts at the first row of the partition and end at the current row.
The FOLLOWING clause indicates the number of rows or values to follow the current row. UNBOUNDED FOLLOWING specifies that the window starts a row following the current row and ends at the last row of the current partition.
Combining key words UNBOUND PRECEEDING and UNBOUND FOLLOWING with BETWEEN operator takes all the records in the partition into account to aggregate the measures.
Further calculations are done in simple Calculation View by creating a projection with Table Function as a data source .
Running Percentage calculation.
Percentage Range bucketing.
Following the final report with number customers in each region contributing to top 80% of the Sales