I was trying to understand the Generated Key Column option present in table comparison and after experimenting I found the below observation.
TABLE COMPARISON: – Generated Key Column, column present in Target not in source.
When your target column has duplicate records present in it then this option is used to detect any update or insert for the incoming records. By duplicate record I mean duplicate values for column(s) defined in Input Primary Key columns section at the target side in table comparison transform.
The value selected in this column should identify every record present in target side uniquely. The transform selects the maximum value of generated key column, for column(s) defined in Input Primary key columns, and compares it with the incoming records.
Let’s see this with the following example.
Consider the below source and target when job was executed initially.
Now you might be thinking how S_NO has the records. This is done by Key generation transform. In a gist, this Key Generation transform looks for the maximum value of generated key column and starts increasing by the number you specify.
I’ll post about the functioning of Key Generation in my coming blog posts.
Now I add two new records for EMP_ID 1010 and 1011 as shown,
New Records: –
1011 RAJ EY 976000 654 02-FEB-18
1010 ADVISORY 1256400 256 27-SEP-18
New source before execution: –
After execution: –
Records present in final target: –
Now how it processed the records and of all the updates which updates got reflected in target?
When job is executed Table Comparison transform brings all the records to its before image and then it start comparing with the incoming source records. Now since you’ve selected the Generated Key column so the maximum value, for the duplicate records, of Generated Key column is picked and then its compared with incoming records.
For EMP_ID 1010 EMPNAME EY has the maximum value of generated key column which is 2 (as S_NO is 2). Similarly for EMP_ID 1010 EMPNAME RAJ KPMG has the generated key column value which is 4. (Because both EMP_ID 1010 and 1011 has two records and the records which have maximum generated key value is picked)
Based on S_NO, key which uniquely identified every row present, incoming source records are compared. And of all the updates generated it’s always the first update which is reflected in target.
Let me clarify by adding more records.
I truncate the existing target table and insert complete new records into it.
So before execution,
New Source: –
New Target: –
Following records are added: –
1012 RAJ WIN 18526 654 27-AUG-18
1012 RAJAN EY 65000 652 03-MAR-18
New source: –
Target before execution: –
After execution: –
Now see incoming record with EMP_ID 1011 is compared with EMPNAME RAJ KPMG having S_NO 7, which is maximum for EMP_ID 1011 and an update is send by Table comparison.
Next incoming record with EMP_ID 1012 is compared with EMPNAME RAJ PWC having S_NO 8, which is maximum for EMP_ID 1012.
Again next incoming record with EMP_ID 1010 is compared with EMPNAME having EY having S_NO 5, which is maximum for EMP_ID 1010.
In this way all the incoming records are compared with their maximum generated keys values and accordingly update is sent to the target.
So my final Target table looks like: –
The other duplicate records are not at all affected except the records with maximum generated key value (here it’s S_NO).
So always remember, of all the updates generated for a particular primary key, column(s) defined in Input Primary Keys, it’s always the first update which is sent to the target for that particular primary key.
Hope it clarifies!
Kindly let me know if I’ve missed anything or something needs to be added or modified.