Table Comparison : Row by Row select, processing of records
I was trying to understand the Table Comparison options and started my research with Row by Row select option and after experimenting I found these observations.
TABLE COMPARISON: –
Table comparison is used to update the target table when source records have been updated/changed. It is used in implementing slowly changing dimension (type 2 and type 3). It will always receive NORMAL input and will generate three opcodes: –
INSERT (I) – When new records have been inserted in source table but not present in target table.
UPDATE (U) – When source records have been updated and those updates in source need to be reflected in target.
DELETE (D) – When some records have been deleted from source and those records need to be deleted from target, if required.
It is always connected with query transform as this is the only transform that generates NORMAL opcode.
Based on the opcodes generated by table comparison the target is changed.
How it does the processing: –
In Table comparison option we have a Table_Name section wherein we select the table which is needed to be compared which is called comparison table, it is always the target table. Table comparison maintains two images of target table, Before Image and After Image. The final data present in After Image is reflected in Target. Before Image sends its changes to After Image. In short both Before Image and After Image are buffers holding the data.
What happens when “Row by Row select” is selected?
In row by row select method an SQL statement is executed based on columns defined in Input primary key column section. If we do not specify any column in compare columns section then query is: –
Select * from table_name where C1 = xxxx and C2 = yyyy and so on (based on columns present in Input primary key columns section).
If we specify any column in compare columns section then query is: –
Select C1,C2,C3,… (list of columns present in Compare column section) from table_name where C1 = xxxx and C2 = yyyy and so on (based on columns present in Input primary key columns section).
This result then brings the records in Table Comparisons Before Image buffer and compares with the source and if any change found it sends to After Image and result of After Image is send to Target.
In comparison method select “Row by Row select” option. Now it does the comparison.
Let’s suppose that initially no records are loaded in target table i.e., target table is blank which means Before Image and After Image sections are also blank. Now during first load i.e., when first time job gets executed, due to empty target table, all records are inserted in target table with opcode (I) provided there is no duplicate data present in column defined in Input Primary Key columns
Before Execution: –
Table Comparison: –
After Execution: –
Opcodes I as all records are new based on EMPID (column name which we have defined in Input Primary Key columns.)
Now I truncate the target table.
After executing the job: –
Target table: –
And now in the same source I INSERT few duplicates with EMPID 1010 and 1011.
Source becomes: –
And records in target table before execution: –
After executing the job,
Records in final target: –
How did it processed?
When the job runs, records from target table are placed in before image of transform.
Initially before execution for EMP_ID 1010 target table had EMPNAME as RAJ EY…, for EMP_ID 1011 EMPNAME was RAJ KPMG…. and this EMP_ID is then compared with incoming source records.
Transform finds that incoming source has two duplicates for EMP_ID 1010 and one uplicate for EMP_ID 1011.
It then compares these incoming records with comparison table and generates ‘U’, UPDATE, opcode.
Now of all the UPDATES generated which one will be reflected in Target?
It is always the first UPDATE, for key column define in Input Contain Primary keys, will be sent to target.
For EMP_ID 1010 first update was ENAME RAJ, hence it was reflected in target.
Table Comparison fires an Update statement for the changed row.
Update table_name set Column = xxxxx where column_name = yyyyy.
Now I’ve changed the source data: –
Row with EMPID 1010 (1010 RAJ … and 1010 RAJ EY) has been deleted from source.
Before Execution: –
After Execution: –
Final Target: –
Now for EMP_ID 1010 ENAME RAJ was present before execution and for EMP_ID 1011 RAJ MCKINSEY was present. After execution the transform finds the change for EMP_ID 1010 and 1011 and sends the UPDATE which are then reflected in target.
Rows from source are not deleted from Target because we have not checked delete options in Table Comparison options section, hence TC does not checks for delete.
Always where clause will contain the list/combination of columns present in input primary key columns section.
Hope it helps!
Please correct me if I’ve missed something.