Skip to Content
Author's profile photo Anindya Bose

Cumulative Sum / Running Total in HANA Calculation View

Requirement :

Most of the popular reporting tools can handle cumulative sum or running total , but still there have been lot of posts in SCN about whether it is possible to do it in HANA itself.  So, just wanted to see , if it can be easily done via Graphical Calculation view and it seems to be working .

Conditional summation – If in Calculated column

Note: This might still be easier to do in Scripted View or Reporting Tool .  Scripted Calculation View is deprecated now, instead use Table Function .

In Figure 1, our base data is in Column 1 and 2 and we want to add Colum 3 ( “C UM_SUM” ) to existing data set. .

Capture.PNG

   Figure 1 : Base data and result

Detail Steps :

Figure 2 shows overall design of the calculation view.  It is pretty simple one, isn’t it ?

cap2.PNG

Figure 2 : Graphical Calculation View

a) I have two projections (  ALL_DATA and ALL_DATA2 )  of the base data ( first two columns in Figure 1 )

b) Created one Calculated Column “DUMMY” on each projection and assigned it a value 1 .  You can assign any other constant value .

c) In join node ( JOIN_1 ) ,(inner)  joined these two data sets via ‘DUMMY” column  .  After joining output would be like below

cap3.PNG

Figure 3: Output of Join node.

If you notice, for every month now I shall have 12 records  ( overall 12 X 12 = 144 ) .  Where my Month matches with Month1 , that is my actual value for that month . And I need to sum all other values where  Month >= Month1 .  So, for above example, sum of  10, 20 and 30 would give me cumulative sum for Month 3.

To do this, I defined two calculated attributes  , one to check if Month = Month1  and another to check If Month >= Month1  ( refer Figure 4 ) .

cap4.PNG

Figure 4: Calculated Columns ( Attributes) on Join_1 node


d) Now, I have two aggregation nodes. One with filter SAME_MONTH = “Y’ and another with GREATER_EQ_MONTH = ‘Y’ .

cap5.PNG

Figure 5 : Aggregation node ( C UM_SUM )  –

I took Month and Amount_1 and renamed Amount_1 to C UM_SUM .



cap6.PNG

Figure 6 : Aggregation_2


Took Month and Amount with a Filter SAME_MONTH = ‘Y’ .


e) Lastly , we need to union these two aggregation nodes.  Take Amount from one node and C UM_SUM from another node.


cap8.PNG

Figure 7 : Union Node


Result is in Figure 1 itself


Note: Model can be much more complex based on number of characteristics you take, in my case it was only Month . If you really want to go for this approach ( have enmity with script or coding 🙂   ) , do not use the base tables in JOIN_1 . Use this model after you aggregate data to a certain extent.  Obviously , you would like to avoid a join m X m where m is a large number .


Equivalent SQL Code would be as simple as below . We can create a Calculation View ( Scripted) or  Table Function using this SQL .

a )


select a."Month" ,a."Amount" , ( select sum(b."Amount") from "<SCHEMA>"."RUN_SUM" b  where a."Month" >= b."Month" ) as run_sum
from "<SCHEMA>"."RUN_SUM" a
order by a."Month"



Or you can use Window function for better performance

b)



select "Month" , "Amount" , Sum("Amount") over ( order by  "Month") from  "<SCHEMA>"."RUN_SUM"
order by "Month"

SQL in Section c running against a table with 1.3 billion records and I get the results in 40 micro seconds

c )


SELECT CALMONTH , ORDER_QUAN , SUM (ORDER_QUAN) OVER ( ORDER BY CALMONTH) AS SUM_VAL
FROM
( SELECT CALMONTH , SUM(ORDER_QUAN) AS "ORDER_QUAN" FROM "MY_TABLE"
  WHERE CALMONTH BETWEEN '201401' AND '201603'
  GROUP BY CALMONTH
)

Please share your feedback and improvement suggestions

Assigned Tags

      7 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Lars Breddemann
      Lars Breddemann

      Nice work!

      Although personally I think that you hit the nail on the head with this remark

      Note: This might still be easier to do in Scripted View or Reporting Tool .

      It's very likely also more efficient to simply use window functions in plain SQL or a table typed function (R.I.P. scripted calc views).

      Author's profile photo Anindya Bose
      Anindya Bose
      Blog Post Author

      Hi Lars

      Thanks for the feedback. It is good to get feedback from experts like you .

      Wanted to see if this can be done without a piece of code .  I shall add equivalent SQL code to achieve the same .

      Regards

      Anindya

      Author's profile photo Anindya Bose
      Anindya Bose
      Blog Post Author

      Hi Lars

      I see SQL Code and Graphical Calculation view both are creating  Nested Loop Join . Does system understand what I am trying to do and creating Calc Scenario intelligently ?

      Here is part of PlanViz . Please shed some light here .

      planViz.PNG

      Regards

      Anindya

      Author's profile photo Lars Breddemann
      Lars Breddemann

      From what I can see you decided to implement the cumulative sum in SQL in a very similar fashion then what you did in the calc. view.

      The scalar subquery gets rewritten into a join and that's what you get in PlanViz.

      A more efficient approach to ***. sum is to use window functions instead.

      Author's profile photo Former Member
      Former Member

      Hi Folks,

      I liked this approach for calculating the Cumulative sum which I am using in Hana but my concern is I am dealing with the millions of records whenever the Query is run it goes back to the 1 st record & start calculating the Sum based upon the Amount .

      Is there any approach by which I could  save my Sum as on today somewhere & then calculate the Sum from this point itself instead of going back to the 1st record

      Hope that clears what I am trying to implement

      Regards
      Rahul

      Author's profile photo Andrea Davoli
      Andrea Davoli

      Hi all,

      someone try to add more dimension (with some cut-off dimension) to the model and can help me?

       

      Andrea

       

      Author's profile photo Former Member
      Former Member

      Hi Andrea,

      have you or anyone tried with multiple dimensions? I could get them added using the windows function by grouping these dimensions, but it seems that if in that particular month, there are absolutely no data for a dimension value such as plant 1001, the existing cumulative value for previous month does not roll up to that future month.

      example:

      Jan 2017 plant 1000   Value : 500

      Jan 2017 plant 1001  Value : 200

      Feb 2017 plant 1001  Value : 150

       

      The result using the windows functions are giving me cumulative as below

      Jan 2017 Plant 1000 Value : 500

      Jan 2017 Plant 1001 Value : 200

      Feb 2017 Plant 1001 value 350

       

      I would expect another record for Feb 2017 where Plant 1000 = Value 500 being brough forward from previous month.

      The syntax i am using are example below

      SELECT "CalWeek", "Plant", "StorageLocation", "Value", SUM ("Value") OVER ( partition by "Plant", "StorageLocation" ORDER BY "CalWeek" ) AS "Cum_Value"
      FROM "Table"
      order by "CalWeek";

      appreciate your help.

      Eddie