Skip to Content

Requirement :

Most of the popular reporting tools can handle cumulative sum or running total , but still there have been lot of posts in SCN about whether it is possible to do it in HANA itself.  So, just wanted to see , if it can be easily done via Graphical Calculation view and it seems to be working .

Conditional summation – If in Calculated column

Note: This might still be easier to do in Scripted View or Reporting Tool .  Scripted Calculation View is deprecated now, instead use Table Function .

In Figure 1, our base data is in Column 1 and 2 and we want to add Colum 3 ( “C UM_SUM” ) to existing data set. .

Capture.PNG

   Figure 1 : Base data and result

Detail Steps :

Figure 2 shows overall design of the calculation view.  It is pretty simple one, isn’t it ?

cap2.PNG

Figure 2 : Graphical Calculation View

a) I have two projections (  ALL_DATA and ALL_DATA2 )  of the base data ( first two columns in Figure 1 )

b) Created one Calculated Column “DUMMY” on each projection and assigned it a value 1 .  You can assign any other constant value .

c) In join node ( JOIN_1 ) ,(inner)  joined these two data sets via ‘DUMMY” column  .  After joining output would be like below

cap3.PNG

Figure 3: Output of Join node.

If you notice, for every month now I shall have 12 records  ( overall 12 X 12 = 144 ) .  Where my Month matches with Month1 , that is my actual value for that month . And I need to sum all other values where  Month >= Month1 .  So, for above example, sum of  10, 20 and 30 would give me cumulative sum for Month 3.

To do this, I defined two calculated attributes  , one to check if Month = Month1  and another to check If Month >= Month1  ( refer Figure 4 ) .

cap4.PNG

Figure 4: Calculated Columns ( Attributes) on Join_1 node


d) Now, I have two aggregation nodes. One with filter SAME_MONTH = “Y’ and another with GREATER_EQ_MONTH = ‘Y’ .

cap5.PNG

Figure 5 : Aggregation node ( C UM_SUM )  –

I took Month and Amount_1 and renamed Amount_1 to C UM_SUM .



cap6.PNG

Figure 6 : Aggregation_2


Took Month and Amount with a Filter SAME_MONTH = ‘Y’ .


e) Lastly , we need to union these two aggregation nodes.  Take Amount from one node and C UM_SUM from another node.


cap8.PNG

Figure 7 : Union Node


Result is in Figure 1 itself


Note: Model can be much more complex based on number of characteristics you take, in my case it was only Month . If you really want to go for this approach ( have enmity with script or coding 🙂   ) , do not use the base tables in JOIN_1 . Use this model after you aggregate data to a certain extent.  Obviously , you would like to avoid a join m X m where m is a large number .


Equivalent SQL Code would be as simple as below . We can create a Calculation View ( Scripted) or  Table Function using this SQL .

a )


select a."Month" ,a."Amount" , ( select sum(b."Amount") from "<SCHEMA>"."RUN_SUM" b  where a."Month" >= b."Month" ) as run_sum
from "<SCHEMA>"."RUN_SUM" a
order by a."Month"



Or you can use Window function for better performance

b)



select "Month" , "Amount" , Sum("Amount") over ( order by  "Month") from  "<SCHEMA>"."RUN_SUM"
order by "Month"

SQL in Section c running against a table with 1.3 billion records and I get the results in 40 micro seconds

c )


SELECT CALMONTH , ORDER_QUAN , SUM (ORDER_QUAN) OVER ( ORDER BY CALMONTH) AS SUM_VAL
FROM
( SELECT CALMONTH , SUM(ORDER_QUAN) AS "ORDER_QUAN" FROM "MY_TABLE"
  WHERE CALMONTH BETWEEN '201401' AND '201603'
  GROUP BY CALMONTH
)

Please share your feedback and improvement suggestions

To report this post you need to login first.

7 Comments

You must be Logged on to comment or reply to a post.

  1. Lars Breddemann

    Nice work!

    Although personally I think that you hit the nail on the head with this remark

    Note: This might still be easier to do in Scripted View or Reporting Tool .

    It’s very likely also more efficient to simply use window functions in plain SQL or a table typed function (R.I.P. scripted calc views).

    (0) 
    1. Anindya Bose Post author

      Hi Lars

      Thanks for the feedback. It is good to get feedback from experts like you .

      Wanted to see if this can be done without a piece of code .  I shall add equivalent SQL code to achieve the same .

      Regards

      Anindya

      (0) 
  2. Anindya Bose Post author

    Hi Lars

    I see SQL Code and Graphical Calculation view both are creating  Nested Loop Join . Does system understand what I am trying to do and creating Calc Scenario intelligently ?

    Here is part of PlanViz . Please shed some light here .

    planViz.PNG

    Regards

    Anindya

    (0) 
    1. Lars Breddemann

      From what I can see you decided to implement the cumulative sum in SQL in a very similar fashion then what you did in the calc. view.

      The scalar subquery gets rewritten into a join and that’s what you get in PlanViz.

      A more efficient approach to ***. sum is to use window functions instead.

      (0) 
  3. RAHUL KHANNA

    Hi Folks,

    I liked this approach for calculating the Cumulative sum which I am using in Hana but my concern is I am dealing with the millions of records whenever the Query is run it goes back to the 1 st record & start calculating the Sum based upon the Amount .

    Is there any approach by which I could  save my Sum as on today somewhere & then calculate the Sum from this point itself instead of going back to the 1st record

    Hope that clears what I am trying to implement

    Regards
    Rahul

    (0) 
    1. Eddie Ng

      Hi Andrea,

      have you or anyone tried with multiple dimensions? I could get them added using the windows function by grouping these dimensions, but it seems that if in that particular month, there are absolutely no data for a dimension value such as plant 1001, the existing cumulative value for previous month does not roll up to that future month.

      example:

      Jan 2017 plant 1000   Value : 500

      Jan 2017 plant 1001  Value : 200

      Feb 2017 plant 1001  Value : 150

       

      The result using the windows functions are giving me cumulative as below

      Jan 2017 Plant 1000 Value : 500

      Jan 2017 Plant 1001 Value : 200

      Feb 2017 Plant 1001 value 350

       

      I would expect another record for Feb 2017 where Plant 1000 = Value 500 being brough forward from previous month.

      The syntax i am using are example below

      SELECT “CalWeek”, “Plant”, “StorageLocation”, “Value”, SUM (“Value”) OVER ( partition by “Plant”, “StorageLocation” ORDER BY “CalWeek” ) AS “Cum_Value”
      FROM “Table”
      order by “CalWeek”;

      appreciate your help.

      Eddie

      (0) 

Leave a Reply