Skip to Content

Here I am trying to explain the methods to implement SCD types in BO Data Service. The different types of slowly changing dimension types are given below.

1.       Type 0

2.       Type 1

3.       Type 2

4.       Type 3

5.       Type  4

6.       Type 6/Hybrid.

In this document I will explain about first five types of SCD types with examples.

Source Data

Name

IDate

designation

John

2002.12.01

A

Jay

2012.12.01

A

Jasil

2012.12.01

A

SCD TYPE 0

The SCD Type 0 method is passive. Value remains the same as it were at the time the dimension record was first entered. Type 0 also applies to most date dimension attributes.

SCD TYPE 1

This method does not track any history data .This methodology overwrite old data with new data without keeping the history. This method mainly used for misspelled names.

Let consider below given data is our target data after the first run.

ID

Name

IDate

designation

1

John

2002.12.01

A

2

Jay

2012.12.01

A

3

Jasil

2012.12.01

A

During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be.

ID

Name

IDate

designation

1

John

2003.12.01

B

2

Jay

2012.12.01

A

3

Jasil

2012.12.01

A

Here the no history is preserved the designation and Idate are updated with new value.

In BODS it can be implemented by using following transformation.

SCD1.JPG

Source:- Test_Effective_date is our source given above (Section – Source Data)

QR_MAP :- Map the source data to query transform without applying any transformation.

TBL_CPM :- Table comparison used to compare source data and the target table data .

MP_OPR :- This will be used to insert new data and update old data.

KEY_GEN :- This transform used to generate a surrogated key .

SCD TYPE 2

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key. Here 2 new column columns are inserted called start_date and end_date.

Let consider below given data is our target data after the first run.

ID

Name

IDate

designation

Start_Date

End_Date

1

John

2002.12.01

A

2002.12.01

9000.12.31

2

Jay

2002.12.01

A

2002.12.01

9000.12.31

3

Jasil

2002.12.01

A

2002.12.01

9000.12.31

During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be.

ID

Name

IDate

Designation

Start_date

End_date

1

John

2002.12.01

A

2002.12.01

2003.12.01

2

Jay

2002.12.01

A

2002.12.01

9000.12.31

3

Jasil

2002.12.01

A

2002.12.01

9000.12.31

4

John

2003.12.01

B

2003.12.01

9000.12.31

Here a new row is inserted and the end_date of the first row will be the start date of the newly  updated value. (note:-‘ 9000.12.31’ is the default value)

In BODS it can be implemented by using following transformation.

SCD2.JPG

SOURCE : This is our data as given above(Section – Source Data)

QR_MAP : Map the source data to query transform without applying any transformation.

TBL_CPM :- Table comparison used to compare source data and the target table data .

HIS_PRES:- This transform is used to store the history data and if any updates on source data then new row will be inserted.

KEY_GEN :- This transform used to generate a surrogated key .

SCD TYPE 3

This method tracks changes using separate columns and preserves limited history. The Type II preserves unlimited history as it’s limited to the number of columns designated for storing historical data. The original table structure of Type II differs from Type II by Type III adds additional columns. In the following example, an additional column has been added to the table to record the original target- only the previous history is stored.

Let consider below given data is our target data after the first run.

ID

Name

IDate

Curr_Designation

Effective_date

Pre_Designation

1

John

2002.12.01

A

2002.12.01

A

2

Jay

2002.12.01

A

2002.12.01

A

3

Jasil

2002.12.01

A

2002.12.01

A

During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be.

In the target output an extra column will be added to keep the previous value of that particular column.

ID

Name

IDate

Curr_Designation

Effective_date

Pre_designaion

1

John

2003.12.01

B

2003.12.01

A

2

Jay

2002.12.01

A

2002.12.01

A

3

Jasil

2002.12.01

A

2002.12.01

A

(Note:- on the first run the Pre_designaion and Current Designation are same)

In BODS it can be implemented by using following transformation.

SCD3.JPG

SOURCE : This is our data as given above(Section – Source Data).

QR_JOIN : This query transform is used to join the Source with target.

                       This will be a left outer join and the Source will be the outer source and the target is the inner source. This join will based on the name column on the both side.(Source.name = SCD3.name)

QR_INSERT :- This query transform used to filter the data which is new that means SCD3.name is null .

QR_UPDATE : This query transform used to filter the data which is already existed in target table but the designation is updated that means SCD3.name is not null and the designation from the source and the previous designation from the SCD3 table are not same.

MP_UPDATE : This transform used to update the target table by setting the map operation as  ‘normal to update’

KEY_GEN :- Key generation is used to generate an surrogated key for newly inserted row.

During the first run the source file have the data as given above and the target table will not have any data so that all set of data will moved to QR_INSERT flow and loaded into target table.

SCD TYPE 4

SCD Type 4 design technique is used when SCD Type 2 dimension grows rapidly due to the frequently changing dimension attributes. In SCD Type 4, frequently changing attributes will be removed from the main Table and added in to a History Table.

The Type 4 method is usually referred to as using “history tables”, where one table keeps the current data and an additional table is used to keep a record of some or all changes.

Let consider below given data is our target data after the first run.

ID

Name

IDate

designation

1

John

2002.12.01

A

2

Jay

2002.12.01

A

3

Jasil

2002.12.01

A

And the History table will not have any data.

During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be. There will be two table one will keep the current data and other will be the changed data (history).

SCD4_CUR

ID

Name

IDate

designation

1

John

2003.12.01

B

2

Jay

2002.12.01

A

3

Jasil

2002.12.01

A

SCD4_HIST

The history table will have an addition column compared to  the source that will be created date.

ID

Name

IDate

designation

Created_date

1

John

2002.12.01

A

  1. 2003.12.01

(Note: created date will be is the date when the designation is changed)

In BODS it can be implemented by using following transformation.

SCD4.JPG

SOURCE : This is our data as given above(Section – Source Data).

QR_JOIN : This query transform is used to join the Source with target.

                       This will be a left outer join and the Source will be the outer source and the target is the inner source. This join will based on the name column on the both side.(Source.name = SCD4_Curr.name)

QR_INSERT :- This query transform used to filter the data which is new that means SCD3.name is null .

QR_UPDATE : This query transform used to filter the data which is already existed in target table but the designation is updated that means SCD4_Curr.name is not null and the designation from the source and the previous designation from the SCD4_curr table are not same.

QR_CURR: This Query transform is used to updated the target table with new designation .

QR_HIST : This Query transform is used to updated the target table with Previous designation .

MP_UPDATE : This transform used to update the target table by setting the map operation as  ‘normal to update’

KEY_GEN :- Key generation is used to generate an surrogated key for newly inserted row.

During the first run the source file have the data as given above and the target table will not have any data so that all set of data will moved to QR_INSERT flow and loaded into target table.

There are different methods to implement SCD types in BODS and above explained is one method to implement the same.

Regards

Asgar

To report this post you need to login first.

16 Comments

You must be Logged on to comment or reply to a post.

  1. Sagar Girme

    Hi Asgar,

    Good information. Just want to add simple but very important point here. Whenever updating target table, we need to make sure that we are using input keys. That is in target table properties, Options tab, in Update Control Use Input Keys as ‘YES’.

    Thanks

    Sagar

    (0) 
  2. Varsha Sonik

    Hi Asgar,

    I am Trying to go from

    Id

    LS001

    LS003

    NT006

    LS007

    ES009

    LS004

    TO OUTPUT


    Col          ID

    T1          LS001,ES009,LS003

    T2          NT006

    T3          LS007,LS004

    pls help me out or give me  hint or idea so i can use that.

    Regards

    Varsha

    (0) 
      1. Varsha Sonik

        Hi Neha,Thanks for your response,can you explain with this example because in real time more than 50000 data and i need output as above..?

        Regards

        varsha

        (0) 

Leave a Reply