Cumulative Sum / Running Total in HANA Calculation View
Requirement :
Most of the popular reporting tools can handle cumulative sum or running total , but still there have been lot of posts in SCN about whether it is possible to do it in HANA itself. So, just wanted to see , if it can be easily done via Graphical Calculation view and it seems to be working .
Note: This might still be easier to do in Scripted View or Reporting Tool . Scripted Calculation View is deprecated now, instead use Table Function .
In Figure 1, our base data is in Column 1 and 2 and we want to add Colum 3 ( “C UM_SUM” ) to existing data set. .
Figure 1 : Base data and result
Detail Steps :
Figure 2 shows overall design of the calculation view. It is pretty simple one, isn’t it ?
Figure 2 : Graphical Calculation View
a) I have two projections ( ALL_DATA and ALL_DATA2 ) of the base data ( first two columns in Figure 1 )
b) Created one Calculated Column “DUMMY” on each projection and assigned it a value 1 . You can assign any other constant value .
c) In join node ( JOIN_1 ) ,(inner) joined these two data sets via ‘DUMMY” column . After joining output would be like below
Figure 3: Output of Join node.
If you notice, for every month now I shall have 12 records ( overall 12 X 12 = 144 ) . Where my Month matches with Month1 , that is my actual value for that month . And I need to sum all other values where Month >= Month1 . So, for above example, sum of 10, 20 and 30 would give me cumulative sum for Month 3.
To do this, I defined two calculated attributes , one to check if Month = Month1 and another to check If Month >= Month1 ( refer Figure 4 ) .
Figure 4: Calculated Columns ( Attributes) on Join_1 node
d) Now, I have two aggregation nodes. One with filter SAME_MONTH = “Y’ and another with GREATER_EQ_MONTH = ‘Y’ .
Figure 5 : Aggregation node ( C UM_SUM ) –
I took Month and Amount_1 and renamed Amount_1 to C UM_SUM .
Figure 6 : Aggregation_2
Took Month and Amount with a Filter SAME_MONTH = ‘Y’ .
e) Lastly , we need to union these two aggregation nodes. Take Amount from one node and C UM_SUM from another node.
Figure 7 : Union Node
Result is in Figure 1 itself
Note: Model can be much more complex based on number of characteristics you take, in my case it was only Month . If you really want to go for this approach ( have enmity with script or coding 🙂 ) , do not use the base tables in JOIN_1 . Use this model after you aggregate data to a certain extent. Obviously , you would like to avoid a join m X m where m is a large number .
Equivalent SQL Code would be as simple as below . We can create a Calculation View ( Scripted) or Table Function using this SQL .
a )
select a."Month" ,a."Amount" , ( select sum(b."Amount") from "<SCHEMA>"."RUN_SUM" b where a."Month" >= b."Month" ) as run_sum
from "<SCHEMA>"."RUN_SUM" a
order by a."Month"
Or you can use Window function for better performance
b)
select "Month" , "Amount" , Sum("Amount") over ( order by "Month") from "<SCHEMA>"."RUN_SUM"
order by "Month"
SQL in Section c running against a table with 1.3 billion records and I get the results in 40 micro seconds
c )
SELECT CALMONTH , ORDER_QUAN , SUM (ORDER_QUAN) OVER ( ORDER BY CALMONTH) AS SUM_VAL
FROM
( SELECT CALMONTH , SUM(ORDER_QUAN) AS "ORDER_QUAN" FROM "MY_TABLE"
WHERE CALMONTH BETWEEN '201401' AND '201603'
GROUP BY CALMONTH
)
Please share your feedback and improvement suggestions
Nice work!
Although personally I think that you hit the nail on the head with this remark
It's very likely also more efficient to simply use window functions in plain SQL or a table typed function (R.I.P. scripted calc views).
Hi Lars
Thanks for the feedback. It is good to get feedback from experts like you .
Wanted to see if this can be done without a piece of code . I shall add equivalent SQL code to achieve the same .
Regards
Anindya
Hi Lars
I see SQL Code and Graphical Calculation view both are creating Nested Loop Join . Does system understand what I am trying to do and creating Calc Scenario intelligently ?
Here is part of PlanViz . Please shed some light here .
Regards
Anindya
From what I can see you decided to implement the cumulative sum in SQL in a very similar fashion then what you did in the calc. view.
The scalar subquery gets rewritten into a join and that's what you get in PlanViz.
A more efficient approach to ***. sum is to use window functions instead.
Hi Folks,
I liked this approach for calculating the Cumulative sum which I am using in Hana but my concern is I am dealing with the millions of records whenever the Query is run it goes back to the 1 st record & start calculating the Sum based upon the Amount .
Is there any approach by which I could  save my Sum as on today somewhere & then calculate the Sum from this point itself instead of going back to the 1st record
Hope that clears what I am trying to implement
Regards
Rahul
Hi all,
someone try to add more dimension (with some cut-off dimension) to the model and can help me?
Andrea
Hi Andrea,
have you or anyone tried with multiple dimensions? I could get them added using the windows function by grouping these dimensions, but it seems that if in that particular month, there are absolutely no data for a dimension value such as plant 1001, the existing cumulative value for previous month does not roll up to that future month.
example:
Jan 2017 plant 1000 Â Value : 500
Jan 2017 plant 1001 Â Value : 200
Feb 2017 plant 1001 Â Value : 150
The result using the windows functions are giving me cumulative as below
Jan 2017 Plant 1000 Value : 500
Jan 2017 Plant 1001 Value : 200
Feb 2017 Plant 1001 value 350
I would expect another record for Feb 2017 where Plant 1000 = Value 500 being brough forward from previous month.
The syntax i am using are example below
SELECT "CalWeek", "Plant", "StorageLocation", "Value", SUM ("Value") OVER ( partition by "Plant", "StorageLocation" ORDER BY "CalWeek" ) AS "Cum_Value"
FROM "Table"
order by "CalWeek";
appreciate your help.
Eddie
Hi ,
I am new to SAP HANA Calculation view and I modified the design a bit and this is also working.
2) Load the same table in two different Projections Projection_1 and Projection_2
3) Do inner join between these projections based on the DUMMY variable value(DUMMY value can be any constant value must be same for all the months), after the INNER JOIN the design and data looks likes below,
4) In Projection_3 create a Filter FILTER_GREATER and enter the formula like this
if("MONTH_1"<="MONTH",'Y','N')
5) In the Aggregation_2 node set the Expression as below,
Also In the Aggregation node apply the aggregation function to AMOUNT1 which will store the cumulative sum of Amount
6) In the Final Projection, rename the AMOUNT1 as Cumulative_AMOUNT
7) The final value looks like this,
Note :