Data_Transfer transform in SAP Data Services
When resource-consuming operations such as joins, GROUP BY, and sorts need to be pushed down to the database server rather than executed on the Job Server, we can use the Data_Transfer transform as a step in a Data Services data flow. The idea is to push these operations down to the database server by way of a temporary staging area.
It writes the data from a source, or the output from another transform, into a transfer object and subsequently reads the data back from that transfer object. The transfer type can be a relational database table, a persistent cache table, or a file.
Use the Data_Transfer transform to push down operations to the database server when the transfer type is a database table.
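To see why a push-down matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the staging database (the table and column names are illustrative, not taken from Data Services). The GROUP BY runs inside the database engine, so only the aggregated rows ever reach the client:

```python
import sqlite3

# In-memory SQLite database stands in for the database-table staging area.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust_id TEXT, order_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("C1", 100.0), ("C1", 50.0), ("C2", 75.0)],
)

# The GROUP BY executes inside the database engine; only the
# aggregated rows come back -- the essence of a pushed-down operation.
rows = conn.execute(
    "SELECT cust_id, SUM(order_price) FROM orders "
    "GROUP BY cust_id ORDER BY cust_id"
).fetchall()
print(rows)  # [('C1', 150.0), ('C2', 75.0)]
conn.close()
```

Without the push-down, every detail row would have to travel to the Job Server and be aggregated there in memory.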
The input is a source, or the output data set from another transform, with rows flagged with the NORMAL operation code. This data is referred to as the input data set.
The input data set must not contain hierarchical (nested) data.
The output is a data set with the same schema and the same operation code as the input data set. If a subsequent ORDER BY or GROUP BY is pushed down to the database, the output rows arrive in that ORDER BY (or GROUP BY) order.
Let us develop a sample scenario that demonstrates the use of Data_Transfer.
Scenario: Customer order information is stored in a table keyed on cust_id, but the customer name is not available there. The scenario demands that we look up the customer name in another table, and then present the sum of the order price per customer.
Key Feature: We will break this into two steps.
(1) Use an external lookup to retrieve the customer name, using cust_id from the current table as input.
(2) Push the GROUP BY operation down to the database server to compute the sum.
After we build the solution, the data flow will look as below.
(1) Create a Job and a Data Flow with proper names.
(2) Drag and drop the source ‘orders’ from the Datastore.
(3) In the next step, define a new Query (Query_lookup).
Copy all the source fields except cust_id to Schema Out, and add a new field, Customer_name.
Define the external lookup: select Customer_name, and in the mapping editor choose Functions >> Lookup Functions >> lookup_ext, then define the lookup table, condition, and output as per the screenshot below.
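The lookup step above can be sketched in plain Python using an in-memory dictionary as a hypothetical stand-in for the lookup table (the customer names, IDs, and the default value are illustrative, not from the actual tables):

```python
# Hypothetical lookup table: cust_id -> customer name.
customers = {"C1": "Acme Corp", "C2": "Globex"}

# Sample input rows from the 'orders' source (illustrative data).
orders = [
    {"order_id": 1, "cust_id": "C1", "order_price": 100.0},
    {"order_id": 2, "cust_id": "C3", "order_price": 20.0},
]

def lookup_customer_name(cust_id, default=None):
    """Return the matching name, or a default when no row matches,
    mirroring lookup_ext's default-value behaviour."""
    return customers.get(cust_id, default)

# Copy all source fields except cust_id, and add Customer_name,
# as the Query_lookup step describes.
enriched = [
    {**{k: v for k, v in row.items() if k != "cust_id"},
     "customer_name": lookup_customer_name(row["cust_id"], "UNKNOWN")}
    for row in orders
]
print(enriched)
```

In the real transform, lookup_ext performs this matching against the lookup table row by row, with the condition and default output you configure in the editor.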
(4) Now pull the Data_Transfer transform from Object Library >> Transform >> Data Integrator >> Data_Transfer and drop it into the workspace.
Select the Transfer type ‘File’, and specify the Root directory and File name.
Data_Transfer creates a temporary staging area at run time, populated with the values produced by the data flow so far. These entries can then be read back to perform the succeeding operations until they are mapped to the final target.
The lifetime of ‘Order_staging_data_transfer.txt’ is the execution time of the job.
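The staging-file lifecycle can be sketched as follows, using Python's tempfile module purely as an analogy (the file name and rows are illustrative): the intermediate rows are written out, read back for the next operation, and the file is gone once the run finishes.

```python
import csv
import os
import tempfile

# Intermediate rows produced by the upstream steps (illustrative data).
rows = [["C1", "100.0"], ["C2", "75.0"]]

# Write the staging file at "run time"...
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# ...read it back for the succeeding operation...
with open(path, newline="") as f:
    staged = list(csv.reader(f))

# ...and remove it once execution is over, mirroring how the
# transfer file only exists for the duration of the job.
os.remove(path)
print(staged)
```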
(5) To perform the succeeding operations, define another Query, ‘Query_groupby’, to determine the sum of the order price per customer.
In the mapping editor, define:
a. OrderPrice as sum(Data_Transfer.OrderPrice)
b. Custname as Customer_name, with GROUP BY as ‘Data_Transfer.Customer_name’.
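The aggregation that Query_groupby performs can be sketched in Python like this (field names follow the mapping above; the data is illustrative):

```python
from collections import defaultdict

# Rows as they come out of the staging step (illustrative data).
staged_rows = [
    {"customer_name": "Acme Corp", "order_price": 100.0},
    {"customer_name": "Acme Corp", "order_price": 50.0},
    {"customer_name": "Globex", "order_price": 75.0},
]

# Sum order_price per customer_name, the equivalent of
# sum(Data_Transfer.OrderPrice) with GROUP BY on Customer_name.
totals = defaultdict(float)
for row in staged_rows:
    totals[row["customer_name"]] += row["order_price"]

result = sorted(totals.items())
print(result)  # [('Acme Corp', 150.0), ('Globex', 75.0)]
```

When the transfer type is a database table, Data Services can push this GROUP BY down to the database instead of computing it on the Job Server.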
(6) The last step is to map the output to the target; in this case it is the file format ‘OrdersTransfer.txt’.
Observation: During run time, you can observe that a file is created temporarily and vanishes after execution.
The file’s contents appear encrypted and are not human-readable.
Finally, the target file is generated as per the screenshot below.