Requirement

It is a very common requirement to view various “slices” of data based on different time criteria on the same row in a report or analysis. “Show me current year to date vs. prior year to date sales figures for all sales orgs”. This can be accomplished in a number of ways, many of which have been explored in the following blog posts.

Implementation of WTD, MTD, YTD in HANA using Input Parameters only

Implementation of WTD, MTD, YTD Period Reporting in HANA using Calculated Columns in Projection

Implementation of WTD, MTD, YTD in HANA using Script based Calc View calling Graphical Calc view

Applying YTD in SAP HANA with SAP BO Analysis Office

All of these approaches have one thing in common – they all rely on generation of the time selection criteria at runtime. This is accomplished through either an input parameter/variable from a user or dynamically determining it at runtime of the view in the case of the scripted view.

While effective, this is not really reusable in the sense of a “conformed” DW dimension. Developers may be interpreting the definitions differently across many of the views that start springing up in the system. For example, with “Current Year To Date”, does this include all data leading up to the current day or does it only include data up to the last closed month? Or maybe the last closed week? Two developers may interpret the same semantics slightly differently and users (and developers) may become confused.


Process Approach Overview

Why don’t we predetermine the time slice definitions as attributes physically in the data so everyone can use them?!

I would draw a similarity with cooking popcorn in the microwave. Now, I could read the instructions and manually choose how long I should cook my popcorn OR I can press one button and have that time determined for me without having to provide anything. This is one of key points (for me at least) in BI application design – don’t make your users have to think too hard to get the answers they need. Now, I never said your popcorn won’t burn, and I’d like to think the HANA approach is a *little* more precise, but you get the idea of the ‘easy’ button I am trying to describe here.

In many of the previous DW/BI projects I have been involved in, the date dimensions play a large role in helping to solve this type of “time slice” requirement. Here, a given member of the dimension can have many different flags or indicators present, which when joined to the fact table, can act as filters on your data.

This would replace the need to determine the filters at run time since they are already available to you for use in joins.

The added benefits are:

  • These rules or definitions will be reused across all solutions, therefore imposing some level of data consistency in the DW and end users or content creators can get a consistent experience.
  • By making these rules accessible in a dimension, the options for solutions become more flexible as we can now model directly in the analytic view without the need for a Calculation view.
  • We never have to be concerned about ‘filter pushdown’ since we are relying on the joins in the Analytic views to work for us at the lowest level possible.
  • You can introduce as many different filters as you need to, YTD, MTD, QTD, WTD including fiscal definitions if required. You just need some logic and a new column on your target dimension.

I have to give credit to a colleague of mine Greg Cantwell who put together the production version that this demonstration is built off of.


Happy HANA!

Justin


HANA Development

This solution consists of the following components

  1. Date dimension table based a given granularity (day, month, fiscal period, etc) of your choice.  In my demo I will be using the monthly grain. HANA provides functions to create time data, for which you can generate and then either modify tables directly or move to another schema. See Appendix 1 for details.
  2. A fact table containing the grain chosen in step 1, monthly in this example.
  3. A SQL script (or other method) that can update the date dimension table with the “current state” per business definitions. This would execute at some predefined schedule to ensure that the dimension “moves up” as time changes.
  4. An attribute view that wraps the table from step 1.
  5. An Analytic view that wraps the table from step 1 and includes the attribute view in step 4. Here I will also illustrate how the “time slices” can be baked into the analytic view without using a Calculation view.
  6. A graphical Calculation view that consumes the analytic view.
  7. Performance testing in Appendix 2 for Calc View vs. Analytic View.

Date dimension table. See attached file for insert statements.

CREATE COLUMN TABLE “MOLJUS02”.“MONTH_DIM”

    (“YEAR” NVARCHAR(4),

        “YEAR_MONTH” NVARCHAR(6),

        “QUARTER” NVARCHAR(2),

        “MONTH” NVARCHAR(2),

        “YTD” NVARCHAR(1),

        “CY” NVARCHAR(1),

        “PY” NVARCHAR(1),

        PRIMARY KEY (“YEAR”,

        “MONTH”)) UNLOAD PRIORITY 0 AUTO MERGE;

A fact table containing the grain chosen in step 1, monthly in this example.  See attached file for insert statements.

CREATE COLUMN TABLE “MOLJUS02”.“FACT_TABLE”

    (“YEARMONTH” NVARCHAR(6),

        “MATERIAL” NVARCHAR(18),

        “CUSTOMER” NVARCHAR(10),

        “SALES” DECIMAL(15,2),

        “COST” DECIMAL(15,2),

        PRIMARY KEY (“YEARMONTH”,

        “MATERIAL”, “CUSTOMER”)) UNLOAD PRIORITY 0 AUTO MERGE;

A SQL script (or other method) that can update the date dimension table with the “current state” per definitions. Here I will just show the SQL, you can choose how to implement.

UPDATE “MOLJUS02”.“MONTH_DIM” SET

“YTD” = case when month(current_date) >= “MONTH”

       then ‘Y’ else ‘N’ end,

       “CY” = case when year(current_date) = “YEAR”

       then ‘Y’ else ‘N’ end,

       “PY” = case when year(current_date)-1 = “YEAR”

then ‘Y’ else ‘N’ end;

For each record in the date dimension, we are using three different attributes to help us determine the time slices.

Column “YTD” (Y/N) tells us whether the month falls within the months that have already passed in the year, INCLUDING the current month. So if we are in March (03), this will flag months 01, 02 and 03 across ALL years that exist in the dimension.

Column “CY” (Y/N) tells us whether the month falls within the current year.

Column “PY” (Y/N) tells us whether the month falls within the previous year.

An attribute view that wraps the table from step 1.

An Analytic view that wraps the table from step 1 and includes the attribute view in step 4. Here I will also illustrate how the “time slices” can be baked into the analytic view without using a Calculation view.


The join is on YEAR/MONTH (201301) and is a 1:N from the attribute view to the fact table.

Now that we have the date dimension hooked up to a fact table, we can now run some queries that help illustrate how these flags can be used. Now notice that we don’t have to pass in any filter criteria for selecting the times slices, that logic is already done for us.

All of the data as shown through data preview, notice how the three flags tell us the story.


This query will show us the first three months that have already passed in the year, against the same months in all years contained in the fact.

SELECT “YEAR”, SUM(“SALES”)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT”

WHERE “YTD” = ‘Y’


This query will show us all the sales by material for the current year to date

SELECT “MATERIAL”, SUM(“SALES”) AS “CURRENT_YEAR_SALES”

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT”

WHERE “YTD” = ‘Y’

AND “CY” = ‘Y’

GROUP BY “MATERIAL”


This query will show us all the sales by material for the current year to date against the previous year to date.

SELECT “MATERIAL”, SUM(“CURRENT_YEAR_SALES”), SUM(“PREVIOUS_YEAR_SALES”)

FROM (

SELECT “MATERIAL”, SUM(“SALES”) AS “CURRENT_YEAR_SALES”, 0 AS “PREVIOUS_YEAR_SALES”

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT”

WHERE “YTD” = ‘Y’

AND “CY” = ‘Y’

GROUP BY “MATERIAL”

UNION

SELECT “MATERIAL”, 0 AS “CURRENT_YEAR_SALES”, SUM(“SALES”) AS “PREVIOUS_YEAR_SALES”

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT”

WHERE “YTD” = ‘Y’

AND “PY” = ‘Y’

GROUP BY “MATERIAL”)

GROUP BY “MATERIAL”



Model the time slices as restricted measures in the analytic view. Now that we have these flags available, we can use them to define restricted measures.



Data Preview on Analytic View


A graphical Calculation view that consumes the analytic view and performs a UNION.

Projections on Analytic views, including the filters on time slice attributes.








Now that we have the CY/PY measures, we can perform variance and percentage calculations as calculated columns.

Data Preview

Performance Considerations with above approach, Analytic View vs. Calculation View

I created a similar scenario as above, but on a fact table with much larger data. I saw approximately 10 million records per a given month. The goal was to measure the performance different between implementing the time slice in an Analytic View restricted measure vs. the Calculation View Unions.

Conclusion

Both methods result in similar runtimes. AV performs *slightly* better (6-7% faster) on total aggregation (no additional columns), and CV definitely performs better (from 15 – 30% faster) with the more detailed dataset. Thus, there is a tradeoff between performance and maintenance/reusability by moving to a Calculation view. More than anything, it proves both methods are acceptable from a performance perspective, but generally if you need more detailed resultset, Calc view offers you better performance.

Analytic View

–Aggregated

SELECT SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), SUM(“PRIOR_YEAR_COST”), COUNT(*)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT_WDATA”


Average Run Time is 375ms

Statement ‘SELECT SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), …’

successfully executed in 374 ms 683 µs  (server processing time: 370 ms 997 µs)

Fetched 1 row(s) in 0 ms 5 µs (server processing time: 0 ms 0 µs)

Add more detail

–Add Customer

SELECT “KUNWE”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), SUM(“PRIOR_YEAR_COST”), COUNT(*)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT_WDATA”

GROUP BY “KUNWE”

Average Run Time is 1.375ms

Statement ‘SELECT “KUNWE”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), …’

successfully executed in 1.371 seconds  (server processing time: 1.368 seconds)

Fetched 5000 row(s) in 363 ms 267 µs (server processing time: 7 ms 35 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Add even more detail

–Add Customer/Material

SELECT “KUNWE”, “MATNR”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), SUM(“PRIOR_YEAR_COST”), COUNT(*)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/AN_TEST_FACT_WDATA”

GROUP BY “KUNWE”, “MATNR”

Average Run Time is 1.820 seconds

Statement ‘SELECT “KUNWE”, “MATNR”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), …’

successfully executed in 1.815 seconds  (server processing time: 1.811 seconds)

Fetched 5000 row(s) in 241 ms 143 µs (server processing time: 10 ms 11 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Very important to note, is that by modeling this way, we stay entirely in the OLAP engine and the Calc engine is never invoked.

Calculation View

–Aggregated

SELECT SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), SUM(“PRIOR_YEAR_COST”), COUNT(*)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/CV_TEST_FACT_WDATA”

Average Run Time is 410ms

Statement ‘SELECT SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), …’

successfully executed in 400 ms 713 µs  (server processing time: 396 ms 866 µs)

Fetched 1 row(s) in 0 ms 4 µs (server processing time: 0 ms 0 µs)

–Add Customer

SELECT “KUNWE”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), SUM(“PRIOR_YEAR_COST”), COUNT(*)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/CV_TEST_FACT_WDATA”

GROUP BY “KUNWE”

Average Run Time is 950ms

Statement ‘SELECT “KUNWE”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), …’

successfully executed in 975 ms  (server processing time: 1.306 seconds)

Fetched 5000 row(s) in 266 ms 153 µs (server processing time: 6 ms 127 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Add even more detail


–Add Customer/Material

SELECT “KUNWE”, “MATNR”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), SUM(“PRIOR_YEAR_SALES”), SUM(“PRIOR_YEAR_COST”), COUNT(*)

FROM “_SYS_BIC”.“sandbox.justin.Date_Dim/CV_TEST_FACT_WDATA”

GROUP BY “KUNWE”, “MATNR”

Average Run Time is 1.52 seconds

Statement ‘SELECT “KUNWE”, “MATNR”, SUM(“CURRENT_YEAR_SALES”), SUM(“CURRENT_YEAR_COST”), …’

successfully executed in 1.518 seconds  (server processing time: 1.514 seconds)

Fetched 5000 row(s) in 279 ms 965 µs (server processing time: 6 ms 128 µs)

Result limited to 5000 row(s) due to value in Result Preferences

Here, we can clearly see that the Calc engine is invoked, which is expected



Appendix 1 – generate time data in HANA

In HANA, we can natively generate the Gregorian time dimensions we need for the above type analysis. You can either modify the table directly in the _SYS_BI schema OR copy it out to an application schema with data and modify.

Generate Time Data – Modeler View, “Data” pane

Choose grain

Schema _SYS_BI now has the data in the relevant table

You can now copy the data to another schema and alter to add the attributes you wish

CREATE COLUMN TABLE “SCHEMA”.“TABLE” LIKE “_SYS_BI”.“M_TIME_DIMENSION_MONTH” WITH DATA;

ALTER TABLE….

To report this post you need to login first.

12 Comments

You must be Logged on to comment or reply to a post.

  1. Raj Kumar Salla

    Excellent blog Justin.

    After reading your instructions, now everyone can cook their popcorn with just single button “start” with time choosen 🙂

    What i like most is, no input parameter required and thus not allowing the end user to input any and getting all the slices of time with restricted measures. Very clever thought and executed superb.

    Regards

    Raj

    (0) 
  2. Jody Hesch

    Great article, Justin – thanks for sharing! Lots of work obviously went into the article, it’s well thought out, and immediately applicable to anyone doing period-based reporting in HANA.

    The “production” approach that I follow for period-based reporting is similar to what you provided, but a bit more closely aligned to Abani’s article: Modelling efficient Time Period based calculations (YTD, PTD, WTD etc.) in HANA

    It’s very similar, but rather than capturing boolean values for period types – it instead stores period types and years (N, N-1, etc) as rows – which has two advantages:

    1) If a new period type (i.e. QTD) is requested, only new records are required – no change to table structure.

    2) No need to maintain the table on a daily basis. Up-to-date calculations are achieved with a projection node filter on now() (or can dynamically achieve point-in-time period-based reporting with user input parameter, so long as the data supports it).

    Cheers,

    Jody

    (0) 
    1. Justin Molenaur Post author

      Jody, thanks for brining this to my attention. In some way I may have violated the cardinal SCN rule of “always search before you post”  I didn’t see this blog before.

      That being said, it’s a good discussion point since these are two distinct approaches. The blog you post uses a row based/”account” approach and what I provided uses more of a column/”key figure” approach.

      The advantages you point out are very valid, especially removing the need for structure modification and removing dependency on running a job to keep the date dimension “moving forward” by precalculating everything. Not to mention the ability to “traverse” the dimension using time, ie “what did my YTD vs. PYTD sales look like 5 weeks ago?”.

      However, from what I understand about the setup, you cannot in fact get away from requiring an input from the user if you want to stay only in the Analytic View. Abani mentions later on in the post “You must pass a value for DATE_SQL in your query, and in his example he is querying the view directly.

      As you point out, in a Calc view, we have the opportunity to use functions in the filter on a projection node where we can use now(). In an Analytic view the only opportunity we have to pass a filter is using a variable attached to DATE_SQL or by constructing the SQL manually (not an option for BO clients). Of course we can set the default variable to an expression, but still requires an input from user.

      I can see each of these approaches would be valid in a given use case. Always nice to have options, thanks Abani Pattanayak for the previous article!

      Happy HANA,

      Justin

      (0) 
      1. Jody Hesch

        Justin Molenaur wrote:

        Jody, thanks for brining this to my attention. In some way I may have violated the cardinal SCN rule of “always search before you post”  I didn’t see this blog before.

        Well you referenced 4 other posts already so I think you did your due diligence. 🙂   Sometimes it’s a bit tricky to catch everything.

        That being said, it’s a good discussion point since these are two distinct approaches. The blog you post uses a row based/”account” approach and what I provided uses more of a column/”key figure” approach.

        The advantages you point out are very valid, especially removing the need for structure modification and removing dependency on running a job to keep the date dimension “moving forward” by precalculating everything. Not to mention the ability to “traverse” the dimension using time, ie “what did my YTD vs. PYTD sales look like 5 weeks ago?”.

        However, from what I understand about the setup, you cannot in fact get away from requiring an input from the user if you want to stay only in the Analytic View. Abani mentions later on in the post “You must pass a value for DATE_SQL in your query, and in his example he is querying the view directly.

        As you point out, in a Calc view, we have the opportunity to use functions in the filter on a projection node where we can use now(). In an Analytic view the only opportunity we have to pass a filter is using a variable attached to DATE_SQL or by constructing the SQL manually (not an option for BO clients). Of course we can set the default variable to an expression, but still requires an input from user.

        True. In each of my projects though, there’s been a desire for multiple period types on one report, so combining UCV with this approach worked out quite well, with project node filters on now(). I did, however, consider the same approach as you – i.e. a scheduled procedure to update just to today’s date instead of point-in-time capability. Decided though, to leave the option available for point-in-time queries. So, as usual, just depends on user requirements.

        I can see each of these approaches would be valid in a given use case. Always nice to have options, thanks Abani Pattanayak for the previous article!

        Happy HANA,

        Justin

        (0) 
        1. Justin Molenaur Post author

          “In each of my projects though, there’s been a desire for multiple period types on one report, so combining UCV with this approach worked out quite well”


          I guess where I was coming from, is that in the approach outlined in this blog (modeling as columns), you can get the same “multiple slice” looks (via restricted measures) directly in an Analytic View without using input.


          Regards,

          Justin

          (0) 
          1. Jody Hesch

            Right. Both approaches are the same in that regard. They both would support either restricted measures or UCV, and both can be updated daily for “to-date” reporting, or can have an additional date column and filter to support “point-in-time” reporting. It just boils down to user requirements.

            (0) 
  3. Gerhard Brückl

    Hi Justin,

    as you seem to have worked quite a lot with calculations like YTD I was wondering if you could help me on my requirement. So I want to have a report where I put my Months on rows and have a measure which shows me the YTD-amount of that given month. e.g.

    Sales Sales YTD
    Jan 100 100
    Feb 50 150
    Mar 90 240
    Apr 110 350

    so far I could not find any solution in SAP HANA except for duplicating the data for each month. Your thoughts on this would be really valuable to me!

    In your example you also created a calculation for SALES_VAR_PERCENT. How is this calculation defined? If it is a calculated column it is calculated on row-/leaf-level which does not make a lot of sense to me as a calculation like this should also work on aggregated levels. How could I create a similar calculation that operates on aggregated levels also?

    thanks in advance for your feedback!

    -gerhard

    (0) 

Leave a Reply