Exception aggregation modeling with Graphical Calc view
Please note that the following model has been implemented in SPS06 (revision 60 and onwards). I am not sure about the feasibility of this approach on the earlier revisions. The observations on the below model are based on my personal experience and opinion.
The business scenario is quite common in the BI reporting. Consider a KPI, required to be calculated at a low level of granularity, but the reporting is required at the higher level of granularity. With the default aggregation behavior of the calc view, the constituents of the Calculated KPI might also be aggregated at the reporting granularity level, resulting in incorrect value for the Calculated KPI.
To elaborate more on this, consider the following data model for the Sales Transaction. The transaction data is captured for Product Sold on a given date in a store. The table structure can be defined as follows:
( STORE_ID nvarchar(10), — Store Id
PROD_ID nvarchar(10), — Product Id
PROD_CAT nvarchar(10), — Product Category
PROD_PRICE decimal(10,2), — Product Price
SOLD_QTY integer, — Product Quantity Sold in a Sales transaction
TRANS_DATE nvarchar(8), — Sales Transaction Date
STORE_REG nvarchar(10) — Region in which the Store is located
The business rule could be:
The product price may vary each day, hence the Net Sales need to be calculated for the Product Price on the transaction date and Quantity Sold on the day.
*** In ideal scenario, the Net Sales should be calculated at the time of data loading and should be persisted in the data model. This can have positive impact on the performance and is also a recommended modeling option. The following modeling option explained below should be implemented, when the calculation persistence is not feasible due to more complex requirements or different sources / data stores for Product price and Sales Transaction.
The sample Sales transaction data is as follows:
In the above example, consider the Net Sales in the Sales transaction data with calculation as Product Price * Quantity Sold. The Net sales need to be calculated at the transaction level granularity.
The standard Calc view model, based on a standard Analytic view using the transaction table is as follows:
Please note that the Multidimensional Reporting property is set to True which adds default aggregation node to the Calc view. The default behavior of all the base KFs is set to SUM. The Calculation of the Net Sales is defined as below hence executed as “Calculate After aggregation” resulting in the following output.
The output of the query is as follows:
The Product Price is shown for the explanation purpose. The Product price at Product category level does not really make any business sense.
The expected correct output for the above query is as follows:
To achieve the above output, we need to make the following changes to the model.
- Create the model with the Multidimensional property set to False, hence adding the default Projection node to the model.
- Add an aggregation node on the Analytic view to define the aggregation behavior.
- Set the Keep Flag property for all the attributes which defines the Exception aggregation criteria. In this case, the aggregation is expected to happen at Store, Product and Transaction date irrespective of the reporting granularity. The Keep flag will always add these columns in the group by clause even if they are not selected in the query. Please note that the added columns in aggregation will impact the performance as each column join is added to the resultset.
- Define the Calculated KF for Net Sales in the Aggregation node. The calculation remains same as shown in the earlier screenshot.
The query executed on this model will result in the required output.
Well, now that works as expected.. 🙂
Please note that this implementation has performance impacts. If we look into the visual execution plans for both the queries, then we can see that the second query includes all the columns in the SELECT statement along with all the columns with Keep flag property set to true.
Execution for the first query on ZGCV_PROD_SALES looks something like:
The execution plan shows only those columns required in the SELECT statement.
Execution for the second query on ZGVC_PROD_SALES_EXCP_AGGR (with Exception aggregation) looks like:
As we can see from the execution plan above, the performance gets impacted due to additional column joins resulting in higher data volume, 20 rows, being passed from one engine to another, as compared to 3 rows in the earlier execution. The generated resultset depends upon the granularity specified with the Keep flag setting.
As mentioned earlier, the better option is to have such values persisted in the database. In case, due to complex business or technical requirements such persistence is not feasible, then the above option should work fine. But at the cost of some performance impact.
All the comments, suggestions, discussions are most welcome. 🙂