The case against the Where clause
Note that the post uses examples from Oracle but equally applies to SQL Server, Teradata, HANA, etc with the relevant DBMS date syntax
When creating objects, it is too easy to see the blank “where” box and decide to fill it in. What happens if you do fill it in though? Say you want sales YTD. You’ll create a new variable, enter sum(sales_fact.transaction_value) in your select box and then year(sales_date.transaction_date)=year(sysdate) and sales_date.transaction_date in the where box
The problem with the approach of using the where clause is simple. The where clause does not just apply to that object but is applied to the query as a whole. The impact of this becomes clear when you want to report MTD next to YTD sales. Let’s create a new variable, Sales MTD as:
sum(sales_fact.transaction_value) in the select box and then
trunc(sales_date.transaction_date,’mm’)=trunc(sysdate,’mm’) and sales_date.transaction_date in the where box
But what happens when you use them together? You’ll find that your YTD sales are correct only in January. This is because you have both “where box” conditions active in the where clause of your SQL statement. As such, your query will only ever return current month data. If you then create an object for previous year to date sales then you’ll find that you get no data. The object itself is similar to current year to date. So, create Sales PYTD as:
sum(sales_fact.transaction_value) in your select box and then
year(sales_date.transaction_date)=year(sysdate)-1 and sales_date.transaction_date <=add_months(sysdate,-12) goes in the where box
Again, works fine on its own but what happens when you want to compare YTD to PYTD?
Yep, you guessed, no rows because an individual transaction cannot be for the current year and the previous year at the same time.
So, how do we get round this. There are two main techniques; case expressions or a more comprehensive calendar table.
The case expression solution is straightforward and simply involves moving the where clause into the select clause through use of case expression logic. Sales YTD would become:
sum(CASE WHEN year(sales_date.transaction_date)=year(sysdate) and sales_date.transaction_date THEN sales_fact.transaction_value END)
So, edit your object, retry your query without the other sales measures and you’ll see that the where clauses that were so restrictive before have now gone. You can reapply the same logic to the other two objects and happily use all three in the same query.
The other technique involves significant changes to your calendar table and an extra ETL script to update your calendar table on a nightly basis. The implementation of the table and the script will be a whole separate post at some point but for now, let’s cover the principles. The technique involves adding boolean (1/0) flags to your calendar table to denote YTD, MTD, PYTD and so on. The measures are then simple. For example, Sales YTD would become:
sum(sales_fact.transaction_value * sales_date.ytd_flag)
This would then mean that each transaction value in your sales fact table would be multiplied by 1 or 0 depending upon it being a qualifying row (or not). A very powerful but simply technique but with additional (but not significant) overhead on the daily ETL process. Obviously if you are restricted in what you can do at the data warehouse level then the case expression route is far more achievable.
So, in summary, avoid where clauses unless you know that they MUST be applied every time a particular object is used because they can cancel each other out.