Skip to Content

Matching Methods:

There are three matching methods are available in Data Services match transform for setting up the match criteria.

1.     Rule Based method

2.     Weighted Scoring method

3.     Combination method

Match Method

Description

Rule Based

Allows controlling which criteria determines the match.

Weighted Scoring

Allows assigning importance or weight to any criteria.

Combination Method

Combines the rule-based and weighted scoring methods of matching.

1.    Rule Based Method:

With rule-based matching, we rely only on our match and no-match scores to determine matches within the criteria.

The following example shows how to set up this method in the Match transform.

Criteria

Record A

Record B

No Match Score

Match Score

Similarity Score

First Name

Mary

Mary

82

101

100

Last Name

Smith

Smitt

74

101

80

E mail

msmith@abc.com

mary.smith@abc.com

79

80

91

By entering a value of 101 in the Match score for every criterion except the last, the First Name and Last Name criteria never determine a match

By setting the Match and No-Match score for the E-mail criteria with no gap, any comparison that reaches the last criteria must either be a match or a no-match.

A match score of 101 ensures that the criterion does not cause the records to be a match, because two fields cannot be more than 100 percent alike.

In rule-based match scenarios, we should avoid gaps between the Match Score and No Match Score.

In the output file under “Match Type” we can find values D and R.

2.    Weighted Scoring Method:

In a rule-based matching method, the application gives all of the criteria the same amount of importance (or weight). That is, if any criterion fails to meet the specified Match-Score, the application determines that the records do not match. When we use the weighted scoring method, we are relying on the total contribution score for determining matches, as opposed to using match and no match scores on their own.

Contribution Values:

           

Contribution values are our way of assigning weight to individual criteria. The higher the value, the more weight that criterion carries in determining matches. In general, criteria that might carry more weight than others include account numbers, Social Security numbers, customer numbers, Postcode1, and addresses.

Total 100. All contribution values for all criteria that have them must total 100. We do not need to have a contribution value for all of our criteria.

We can define a criteria’s contribution value in the Contribution-To-Weighted-Score option in the Criteria-Definition option group.

Contribution and total contribution score

The Match transform generates the contribution score for each criteria by multiplying the contribution value we assign with the similarity score (the percentage alike). These individual contribution scores are then added to get the Total contribution score.

Weighted Match Score

In the weighted scoring method, matches are determined only by comparing the total contribution score with the weighted match score. If the total contribution score is equal to or greater than the weighted match score, the records are considered a match. If the total weighted score is less than the weighted match score, the records are considered a no-match.

We can set the weighted match score in the Weighted Match Score option of the Criteria Match Spec option group.

Weighted scoring example

The following table is an example of how to set up weighted scoring. Notice the various types of scores that we have discussed. Also notice the following

·         When setting up weighted scoring, the No Match Score must be set to -1, and the Match Score must be set to 101. These values ensure that neither a match nor a no-match can be found by using these scores

·         We have assigned a contribution value to the E-mail criteria that give it the most importance.

Criteria

Record A

Record B

No Match
Score

Match
Score

Similarity
Score

Contribution Value

Contribution Score
actual similarity x contribution value)

First Name

Mary

Mary

-1

101

100

25

25

Last Name

Smith

Smitt

-1

101

80

25

20

Email

msmith@abc.com

mary.smith@abc.com

-1

101

91

50

46

Total Contribution
Score

91

In this example, the total contribution score is 91. If the weighted match score is 90 or less, the records are considered a no-match.

In the output file under “Match Type” we can find values D and W.

3.    Combination Method:

This strategy combines the rule-based and weighted scoring methods of matching.

·         A no-match can be determined by the similarity score of any criteria not equaling or exceeding the no-match score. However, a match cannot be determined by the match score (we must have a match score of 101).

·         A match can be determined only by comparing a total contribution score with the weighted match score.

Criteria

Record A

Record B

No Match
Score

Match
Score

Similarity
Score

Contribution Value

Contribution Score
actual similarity x contribution value)

First Name

Mary

Mary

59

101

100

25

25

Last Name

Smith

Hope

59

101

22

N/A
(No Match)

N/A

Email

msmith@abc.com

mary.hope@abc.com

49

101

N/A

N/A

N/A

Total Contribution
Score

N/A

To report this post you need to login first.

22 Comments

You must be Logged on to comment or reply to a post.

  1. Mourice Roggendorf

    Hello Durai,

    I just read your article about ‘Matching Methods in Data Services Match Transform’ and tested them.

    The question that I have is: How do I add/ get the ‘total contribution score’ within the weighted scoring method? (I already defined the contribution scores)

    Thanks in advance.

    Best regards,

    Mourice

    (0) 
    1. Venkata Ramana Paidi

      Hi Mourice,

      You have to assign contribution score for each criteria  and should not be greater than 100%.  You have to assign the weighted score on the level. If you click on the level  you will see the weighted score option.

      Thanks & Regards,

      Ramana.

      (0) 
    1. Venkata Ramana Paidi

      Hi Shazia,

      Similarity score calculated by match fuzzy logic based on match transform options.   Match score and no match score are the threshold values to considering the duplicates.

      Thanks & Regards,

      Ramana 

      (0) 
  2. Venkat Chowdary

    Hi Venkat Ramana,

    I am trying to find the duplicates by using the match transform.

    EX:Source

    Name

    Soumyadeep

    Soumyadeep

    Somyadip

    Ajay

    Ajoy

    In the above example

    • We are getting potential duplicates and unique records in MATCH_STATUS.
    • If the record is exactly matched then it is showing as 100% matched in SET_LEVEL_MATCH_SCORE but if the record is partially matched then it is not showing as how much % matched.

    Can you explain Comparison rules in match Transform

    Can you please help me that what exact match score it will get with an example of screen shots.

    Thanks & Regards

    Venkat

    (0) 
    1. Venkata Ramana Paidi

      Hi Venkat,

      Based the match score you will get the match status,match type etc.

      if you are doing duplicate check on single field and you are getting only 100% match records means  you have to change the match score limit.  If you are using the field similarity it will check the character by character match  and calculate the similarity score . If similarity score is greater than or equal to the match score  then it give the duplicate records with the similarity score else it will go the unique records. you have to change the below settings for this.

      Please share with me about your match transform settings screenshot like above for the your sample input data. I will check and if possible I will correct it.

      If you need more information please contact me through mail.

      Thanks & Regards,

      Ramana.

      (0) 
  3. Venkat Chowdary

    Hi

    In the above example how we can get exact (match score ,no match score and total contribution score)

    Can you please explain with screen shot or detail view.

    Regards,

    Venkat

    (0) 
  4. Venkat Chowdary

    Hi Venkat Ramana,


    I have gone through the screen shot which you shared with me but SET1_LEVEL_MATCH_SCORE is showing as 100% and i am not able to find exact match score.

    Can you please share the screen shot of which you are getting exact match score.

    Regards

    Venkat

    (0) 
    1. Venkata Ramana Paidi

      Hi Venkat,

      Try to change the match score to 65 from 95  .  no match score always should be less than  match sore . Remaining two match scores equals to no match score.

      Change the match score and run the job. I thought tour data similarity score is less than 95.

      Thanks & Regards,

      Ramana.

      (0) 
  5. Venkat Chowdary

    Hi Ramana,

    This is the sample source file.

    Source file:

    Name

    Soumyadeep Mitra

    Saumya Meetra

    Soumya

    Venkat Chowdhury

    Venkata Choudhri

    Venkat

    Ajay Gupta

    Ajoy Guptu

    Ajay

    Manish Sing

    Maneesh Singhas

    Regards

    Venkat

    (0) 
    1. Venkata Ramana Paidi

      Hi Venkat ,

      I didn’t notice your screenshots properly.

      There was a mistake in the match properties .

      You have given the break group  for input as shown below.

      /wp-content/uploads/2015/07/match_2_753495.png

      You have only one field  but you gave the group with length 11.

      This is almost equal to the your field values max length.

      break group creates the group based on length and check the duplicates with in the group .   In your case as you given the length 11 , each record is separate group until it has the exact match( 100%). With in the group finding the duplicates means here each group will contain only one record then it gives as unique record .

      Try to remove the break group .  For your scenario no need of any break group.

      Please find the match transform settings and results .

      for the field similarity.

      Result :

      In the above result , you missed few names like Ajay, Soumya because it has partial names .  In this case you can use word similarity it will compare word by word instead of character by character( Filed Similarity).

      Result :

      I hope you can understand .

      If you need any clarification feel free to contact me .

      Thanks & Regards,

      Ramana.

      (0) 
  6. Venkat Chowdary

    Hi Ramana,

    If suppose we are having fields like

    Name                    Address           Email

    Venkat                   hyd               abc@ibm.com

    VenkatRao             hyderabad     abc@cts.com

    soumyadeep          mumbai         def@wipro.com

    Soumyadip            delhi              pqr@cgi.com

    soumya                 kolkata          xyz@pwc.com

    Ruthvick               bang               gdh@cts.com

    Sathvick               chennai          aaa@bbb.com

    – what is the exact match score.

    – Can you please share the screen shot of MATCHING LEVEL    in Match Transform

    Capture0.PNG

    (0) 
    1. Venkata Ramana Paidi

      I am not understanding .  What you require .

      What is the meaning of the question ?

      – what is the exact match score.

      – Can you please share the screen shot of MATCHING LEVEL    in Match Transform



      (0) 
  7. Venkat Chowdary

    Hi Ramana,

    I want to find out the match score in the above 3 field names for finding the duplicates.

    I  have done some examples but it is not showing the exact match score.

    Lets Showing some outputs which i was facing.Capture1.PNG

    In the above output

    – “Beach” ,”Jasch” and “Zacch” are different names but it is generating as single group       in set 1_Level_Group_Number. (2)

    – “JPMOR”  and “MORGA” are different names but it is generating as single group       in set 1_Level_Group_Number. (6)

    – “Power” and “Moder” are different names but it is generating as single group       in set 1_Level_Group_Number. (7).



    Can you please Share the MATCHING LEVEL options in in Match Transform

    Regards

    Venkat

    (0) 
  8. Venkat Chowdary

    Hi Ramana,

    Can you please share the Match Transform Screen Shots what ever the modification you have done for finding the duplicates by using Multiple Field Names.

    Regards

    Venkat

    (0) 
    1. Venkata Ramana Paidi

      Hi Venkat ,

      You were giving two contradictory things .

      1. First You have mentioned that you need to find the duplicates on three fields on the above data.  As per your

      Name                    Address           Email

      Venkat                   hyd               abc@ibm.com

      VenkatRao             hyderabad     abc@cts.com

      soumyadeep          mumbai         def@wipro.com

      Soumyadip            delhi              pqr@cgi.com

      soumya                 kolkata          xyz@pwc.com

      Ruthvick               bang               gdh@cts.com

      Sathvick               chennai          aaa@bbb.com

      2. In the below you gave some other output and asking for improper result.

      But it is more confusing me .

      Finding duplicates on the single filed is different. If you want do match transform on multiple fields you can merge the fields and find the duplicates .  If you want to use weighted score you can find the duplicates on multiple fields. Normally it will give duplicates for the first criteria only.

      The other question you were asking why we are getting wrong results means you are using the match score is like that you were getting .

      If you are using field similarity it will check character by character and calculate the similarity score. For your match score may be greater than or equal to similarity score.

      “JPMOR”  and “MORGA” are different names but it is generating as single group       in set 1_Level_Group_Number. (6)


      In this case 3 characters matching and similarity score around 60% .


      Match transform is not straight forward transform unlike other integrator transforms or platform transforms. You have to study your requirements and you have to do .


      Other thing you were asking my screenshots. I didn’t understand that one also . Why you need screenshots for my match.  Match options completely depend on requirements. That day you gave different things  and I worked on that and shared the screenshots. Those screenshots will not workout for other as requirements are completely different. 


      last post requirement is same as what I gave it you before .  Input is different . requirements may vary . According to that we need to change the options . But you should understand the match options first before doing development.First go through technical manual with examples .


      Thanks & Regards,

      Ramana.



      (0) 
      1. Venkat Chowdary

        Hi,

        We have created a real time scenario where we are comparing data from web service (nested XML) and CRM tables data and generating Unique and duplicate Records using Base_Match transform of Data Quality.

        Job is validated successfully , but when we are executing the job with sample data , we are getting below error:


        <Table> for column <fieldname> doesn’t occur in the FROM clause, for a top level query all columns must belong to some table in the from clause


        Please help us in sorting this error and let us know if any other information is required from our end

          Attached documents    

        1.Full job1.PNG



        2.Data from XML file

        2.PNG

        3.Data from CRM table

        3.PNG

        Joining condition as :Query.nameSur=Merge.nameSur and Query.ElectronicAddress.eAddress=Merge.eAddress


        Thanks,

        Venkat


        (0) 

Leave a Reply