Skip to Content

The document aims to provide an insight on how to use Match transform in BODS 4.1.

The document  also helps to understand various options available in Match transform of BODS and their use.

Match transform is responsible for performing matching based on the business rules we define.

The transform then sends matching and unique records on to the next transform in the data flow.

For using Match transforms,following steps should be followed .

    Steps:

  • Place the Input table in the Designer window.
  • Select NameAddress_Match transform under Data Quality transforms

Capture2.PNG

  • Map  ALL the input fields, passing valid values to Person_given_name1,Address Primary Name and Postal code  especially .

For the fileds,which  does not have any valid input,pass Name1 as the input.

Capture3.PNG

  • Drag important columns from Input table i.e. Source table  to Output.Such as Customer/Vendor Id,

Customer Name, Customer Country to identify the record.

Capture4.PNG

· Click on Options Tab and Click on Editing options to edit the options.

Capture5.PNG

·  After selecting Edit option, Match editor will open.

Capture1.PNG

·  Follow the below steps while populating individual fields.

·  Option Description

Option

Description

Perform matching

Select to add the ability to compare records and other match-related operations.

This option is not selected only in the Base_Match transform configuration.

For all other configurations, this option is selected.

In our case, we will select the option.

Match set name

Enter a name for this match set. Each Match transform in your data flow

represents a match set.

This option is already populated with name you chose if you used the Match

wizard to generate this transform.

Be sure that this name is unique within the dataflow (it does not match the

name of another match set).

This name is used in the match reports to differentiate data processed by one

match set versus another.

Match engine

Specifies the match engine to use, based on the type of data you will be processing. If you use the Multinational strategy in the Match wizard, this option is set to Latin1 for all match sets.

Chinese: Specifies that the Match transform will be processing Chinese data

in Chinese script.

Japanese: Specifies that the Match transform will be processing Japanese

data in Japanese script.

Korean: Specifies that the Match transform will be processing Korean data

in Korean script.

Latin1: Specifies that the Match transform will be processing Latin1 data. In

general, this is the data used throughout the Americas, Western Europe,

Oceania, and much of Africa.

Other_Non_Latin1: Specifies that the Match transform will be processing

non-Latin1 data, other than Chinese, Japanese, Korean, and Taiwanese,

such as Russian, Greek, Hebrew, Arabic, and others.

Taiwanese: Specifies that the Match transform will be processing Taiwanese

data in Taiwanese script.

For optimum accuracy and performance, be sure that you have filtered your

multinational data to separate match transforms with the appropriate match

engine selected. The Match wizard can do this for you easily, if you select

the Multinational strategy.

By default, the Latin1 match engine is used. If you attempt to process nonlatin1

data with the Latin1 engine, the results are unpredictable.

Generate report data

Specifies whether to generate report data for this transform. This option is

available in every transform that generates report data.

Yes: Generates report data for this transform.

No: Turns off report data generation. If you do not need to generate reports

(during testing of dataflows, for example), you should set this option to No.

This will improve match performance.

In our case,it will be kept as NO

Logical source field

Specifies the field that contains the ID for the logical source.It will be kept Blank in our case.

Physical source field

Specifies the field that contains the ID for the physical source (Reader). It will be kept Blank in our case.

Run as a separate process

Yes: Splits the transform into a separate process.

No: Keeps the transform in same process as the rest of the dataflow.

In our case ,it should be NO

· Follow the screenshot.

Capture6.PNG

· Select to perform data salvaging.

If two records match, data salvaging temporarily copies data from a passenger record

to the driver record after comparing the two records. The data that’s copied is data

that is found in the passenger record but is missing or incomplete in the driver record.

Data salvaging prevents blank matching or initials matching from matching records

that you may not want to match.

In our case, we will not select Data Salvaging

· Group forming allows you to group and prioritize records for better match accuracy and efficiency.

Break groups

Break groups allow you to group records based on common field values (for example, postal code).

Only records that share the same break group value will be compared with one another.

Use break groups to lower the number of comparisons needed and to increase the speed of the matching process.

In our case ,we will not change any values.

Capture7.PNG

· In Matching  Option, Click on Match Criteria, to add the fields that are taken as the base fields for Matching ,by assigning a weightage to each field, summing up to 100.

  

Match score

Specifies the minimum similarity score needed for the records to be considered

a match based on this criteria. Type a value from 0 to 101.

A value of 101 ensures that this criteria alone is not enough to consider two

records a match and that you want to consider other criteria in the comparison process.

For example, a value of 90 means that you consider this data to be important

enough that if the data in two records is 90% similar or higher, the records are

considered a match.

     

No match score

Specifies the maximum similarity score needed for the records to be considered

a no-match based on this criteria. Type a value from -1 to 100.

A value of -1 ensures that this criteria is not enough to consider two records

a no-match and that you want to consider other criteria in the comparison  process.

For example, a value of 49 means that if the similarity between the data in two

records is less than 50%, the records do not match.

Also put the score at which the record , should be classified as a duplicate and unique.

In our case, we have taken Person name 1 and  Street as the fields to be matched,with   match  score as 80.

Capture8.PNG

Click on apply, after making changes.

·         Click on Output tab, to select the output fields.

Capture9.PNG

·         Select Person_given_name1, Person_given_name2, Address_Primary_name1,Postcode1.

·         For rest of output fields, please find below explanation ,to guide yourself.

Field name

Description

Group_Num

Specifies the records that belong to the same match group, which share

the same group number. The group numbers start with the number 1.

Unique records have a blank group number. If you are using association

matching in your job, you need to map this on output, because the Associate

transform uses it.

Match_Criterion

Specifies the name of the criteria that made the decision ( if the Match_Type

None is R). Otherwise, the field is blank.

Match_Level

Specifies the name of the match level used.

Match_Status

The values for the Match_Status field that appear in your output are:

D: This record is a driver in a match group.

P: This record is a passenger in a match group.

U: This is a unique record.

Match_Type

Describes how each record is identified as a match. Possible values are:

<blank>: The record did not match any other record. It is a unique record.

D: The record was the driver record in the comparison process.

R: The record was identified as matching the driver record because one

of the criteria met the Match_Score.

W: The record was identified as matching the driver because the total

weighted score met the Weighted match score. See Matching methods in

the Data Services Designer Guide for more information on weighted scores.

Match_Score

The Match_Score field outputs the following values:

The criteria similarity score when the Match_Type is R.

The total weighted score when the Match_Type is W.

Blank if the record is a driver record (Match_Type of D) or if the records

are unique.

Group_Count

Provides the total number of records in the match group.

Unique records have a value of 1.

Group_Order

The master record receives a value of 1. Subordinate records receive a value of 2

through the number of records in the match group.

You may control the order by including a Group Prioritization in the Post Match Operations.

Unique records have a value of 0.

Group_Rank

Specifies whether the record is a master (M) or a subordinate (S). Unique records have

an empty value.

Conclusion:

In case of Duplication in Source data, Match transform provides an approach to classify the data in Unique and Duplicate records.

We need to know about the duplication criteria, as well a match score should be set to nominate a record as a duplicate.

For best results, the data in which you are attempting to find matches should be cleansed. Therefore, we may need to include other Data Quality transforms before the Match transform.

Hope this document makes life easier to use Match transform or to perform deduplication across system.

Regards,

Mayank Mehta

To report this post you need to login first.

1 Comment

You must be Logged on to comment or reply to a post.

Leave a Reply