Technical Articles
Slowly Changing Dimensions in SAP Data Warehouse & Analytics Cloud done right
Tracking the changes in datasets is one of the core functionalities of any data warehouse. The most common form of historization is called slowly changing dimension type 2 (SCD2). Instead of simply overwriting changes in a dataset (which is a type 1 historization), validity dates are used to identify the time range in which a data set is valid. This ensures, that the correct picture is drawn when looking at the data for specific dates in the past.
Take the organizational assignment of employees as an example. If employees change from one team to another, they will usually be assigned to a different cost center or department. However, if not tracked, this change will not only affect the new data that relates to this employee, but also historical data, e.g., the recorded working hours. These hours would simply show up for the new department instead of the previous one.
The same applies for the assignment of sales regions to managers or products to product groups. Historically correct data is relevant for almost every business domain and therefore an important part of most data warehouse projects.
Until now, developers had a hard time trying to implement a SCD2 historization in SAP Data Warehouse Cloud (SAP DWC). There was no built-in functionality to assign validity dates in a dimension to filter the right datasets for a selected date.
With the release 2022.25, SAP has enabled time-dependency for dimension and text entities which allows the implementation of a SCD2 historization. In the following I will describe the steps necessary to take the full advantage of this new feature.
In this example, I will use a simple fact table with recorded working time and a dimension table with information about employees and their department to explain the new feature.
Facts have the following structure:
Employee ID | Working Hours | Date |
1 | 8 | 01.12.2022 |
2 | 7 | 01.12.2022 |
3 | 9 | 01.12.2022 |
4 | 9 | 01.12.2022 |
5 | 10 | 01.12.2022 |
6 | 5 | 01.12.2022 |
7 | 8 | 01.12.2022 |
8 | 8 | 01.12.2022 |
9 | 7 | 01.12.2022 |
10 | 6 | 01.12.2022 |
1 | 8 | 02.12.2022 |
2 | 7 | 02.12.2022 |
3 | 9 | 02.12.2022 |
… | … | … |
The employee dimension is structured as follows:
Employee ID | Name | Department | Valid From | Valid To |
1 | Joe | Finance | 01.12.2022 | 31.12.9999 |
2 | Mary | Marketing | 01.12.2022 | 05.12.2022 |
2 | Mary | Operations | 06.12.2022 | 31.12.9999 |
3 | Martin | Operations | 01.12.2022 | 31.12.9999 |
… | … | … | … | … |
Notice that Mary is changing the department on December 6th. If we do not consider this change, all working hours before this date will show up for the operations department, even though she worked for the marketing department during this period.
Use this data set as an example or use your own data to implement a historization with these eight easy steps:
1 . Prepare Dimensional Data
After uploading your data, set the semantic usage of the dimensional data to “Dimension”:
Set the semantic type of the columns with the validity dates to “Business Date – From” and “Business Date – To”. This will enable the input parameter to select the right data sets.
Set the semantic type of the “Name” column to “Text” and the Label Column of “Employee ID” to “Name”. Set the Employee ID and the valid from column as key.
2. Deploy the Dimension
It is now ready to be consumed by the analytical data set.
3. Create an Input Parameter
In a graphical view, create an input parameter that is used in the story to select a date:
Optionally, set a default value for the date parameter.
4. Set a Filter
Set a filter that uses the input parameter as a condition:
5. Edit Custom CSN Annotations
In the toolbar, click on “Edit Custom CSN Annotations”
Add the following code below the input parameters name (don’t forget to put a comma after true):
"@Semantics.businessDate.at": true
Safe your changes.
6. Create an Association
Create an association to the employee dimension on the Employee ID:
Optionally, create an association to the system created time dimension to use a drill down on the date in SAP Analytics Cloud.
7. Finalize your Fact Data
Set the semantic usage of the created view to “Analytical Dataset” and define the column “Working Hours” as measure with the aggregation type “SUM”. Expose this view for consumption in SAC
8. Use the Analytical Dataset in SAC
Deploy the created view and switch to your SAP Analytics Cloud tenant. Create a new story based on the view. Once the data set is selected, you will see the following pop-up to select a date for the created input parameter.
Good Job! You just created an SCD2 historization in SAP DWC and now have your view ready for analysis in SAC. Let us have a closer look on what we created.
Set the desired date for the input parameter and add Working Hours, Employee ID and Department to a chart.
As you can see, for December 1st, Mary shows up at the marketing department:
When switching the input parameter to December 8th, Mary shows up at the operations department which is now the valid department.
Notice that the input parameter not only affects the dimensional data of employees, but also restricts the displayed facts to one day. If you wish to display all data from a defined key date on, you should implement your input parameter this way:
This way, Mary shows up with her assigned department on December 1st .
Sometimes, you want to look at both, the historical correct data, and the historic values with updated dimensions. For example to compare the effects of structural changes in the organization over time. In this case, the changing attribute should be included in both, the dimension with validity dates, and the fact table with the current valid version.
Wrap up
In this blog post you became aware of the relevance of historical data for different kinds of business domains. You learned how to implement an SCD2 historization in SAP DWC and how to use it in SAC. Furthermore, you learned how the data can be analyzed from a business perspective and that in some cases, small adjustments have to be done in order to see the desired results.
If you liked this article or have something to add, you can use the comment section to share your thoughts and feedback and follow my profile (my profile) to receive similar content in the future.
If you want to stay updated on relevant news about SAP DWC, make sure to follow the topic page and check out further blog articles about SAP DWC here. If you have any question left, you can also use the comment function in this article or ask your question here.
Great new feature!
Do you know if there is any plan to implement the option for it to work based on the fact table dates instead of an input parameter?
Hi Marcelo,
thanks for your question. Right now, if you don't use an input parameter, SAC will display the facts with the current valid dimension for the whole time period. (As if you don't have any historization in your dimension at all.)
Unfortunately I don't know if it is planned to automate the selection of the dimension data based on fact table dates. But it is definitely a feature I'm looking forward to.
Best
Sebastian
Great read my friend and great new feature!
Hi Sebastian,
nice idea. Thanks for sharing.
Best regards,
Tobias
Thanks for sharing this. Very helpful!
As an alternative, could you create a SQL View which Joins based on the employee code and where the transaction date is between (start and end dates) one of the dimension records?
This should not require the additional parameter and should have the effect of only selecting the valid dimension record and associated Facts - across all time.
Also, with SCD2, the real challenge tends to be in dating the dimension records e.g. expiring old records and creating new ones. Rarely, does the source system present the records with these validity dates. I don't think this feature helps with that?
Let me know your thoughts? Thanks again
Hi Asad,
thanks for your comment. Creating an SQL View with a conditional join could also be a solution to find the right dimensional data for every transaction. However, creating joins is not the recommended way of connecting dimensional data with facts in DWC. With an association you can keep the fact table small and improve performance compared to a join.
In fact, source systems often don't provide validity dates for dimensional data and also don't keep all the historic values. I have already planned to write an article about how to create these validity periods. So stay tuned if you want to find out more!
Hi Sibastian,
Today I tried to implement SCD the way you mentioned in your blog but couldn't move forward with association as the employee ID require text association. Please help me on this.
Hi Sanjoy,
you could check if you have set the semantic usage for the dimension view correctly. If you use the type "Dimension", you should be able to create a dimension-type association. You can find the correct settings in the screenshots of step 1 "Prepare Dimensional Data" in this blog article.
Hope I could help you.