Slowly Changing Dimensions
SCD- Slowly Changing Dimensions
SCDs are dimensions that have data that changes over time. The following methods of handling SCDs are available:
- Type 1 : No history preservation
v Natural consequence of normalization
- Type 2: Unlimited history preservation and new rows
- v New rows generated for significant changes
- v Requires use of a unique key
- v New fields are generated to store history data
- v Requires an Effective_Date field .
- Type 3: Limited history preservation
v Two states of data are preserved: current and old
Slowly Changing Dimension Type 1(SCD Type1)
For SCD Type 1 change, you find and update the appropriate attributes on a specific dimensional record. The new information simply overwrites the original information. In other words, no history is kept.
Example
|
|
|
|
|
|
After Christina moved from Illinois to California, the new information replaces the new record, and we have the following table:
Customer Key |
Name |
State |
1001 |
Christina |
California |
Advantages:
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information
Disadvantages:
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.
Slowly Changing Dimension Type 2(SCD Type2)
With a Type 2 change, we don’t make structural changes in the table. Instead we add a record. A new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key |
Name |
State |
1001 |
Christina |
Illinois |
After Christina moved from Illinois to California, we add the new information as a new row into the table.
Customer Key |
Name |
State |
1001 |
Christina |
Illinois |
1005 |
Christina |
California |
Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
This necessarily complicates the ETL process.
Slowly Changing Dimension Type 3(SCD Type3)
With a Type 3 change, we change the dimension structure so that it renames the existing attribute and add two attributes, one to record the new value and one to record the date of change.
In our example, recall we originally have the following table:
Customer Key |
Name |
State |
1001 |
Christina |
Illinois |
- After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):
|
|
|
|
|
|
|
|
|
|
Advantages:
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.
Thanks for the article Richa!
I would like to appreciate the way of explaination of topic with example.
Keep posting.
Thanks,
Ankit.
Good, thank you.
Waiting part2 where will be information how to supporting this types of SCD in DS.
Thanks Richa, it was ultimate for revision purposes .
Regards,
Manprabodh Singh
Hi Richa,
Thank you very much for this info.
Regards,
Prasad.
Thanks Richa,
It was a good explanation for topic like SCD and eagerly waiting for part 2.Nice use of examples to explain the concept.
Regards ,
Vivek
Very clear and the example provided is really good.
so finally which one of the above types (1, &3) will be use mostly?
Clear, Simple & easy to understand.