Skip to Content
Author's profile photo Former Member

Matching Methods in Data Services Match Transform

Matching Methods:

There are three matching methods are available in Data Services match transform for setting up the match criteria.

1.     Rule Based method

2.     Weighted Scoring method

3.     Combination method

Match Method

Description

Rule Based

Allows controlling which criteria determines the match.

Weighted Scoring

Allows assigning importance or weight to any criteria.

Combination Method

Combines the rule-based and weighted scoring methods of matching.

1.    Rule Based Method:

With rule-based matching, we rely only on our match and no-match scores to determine matches within the criteria.

The following example shows how to set up this method in the Match transform.

Criteria

Record A

Record B

No Match Score

Match Score

Similarity Score

First Name

Mary

Mary

82

101

100

Last Name

Smith

Smitt

74

101

80

E mail

msmith@abc.com

mary.smith@abc.com

79

80

91

By entering a value of 101 in the Match score for every criterion except the last, the First Name and Last Name criteria never determine a match

By setting the Match and No-Match score for the E-mail criteria with no gap, any comparison that reaches the last criteria must either be a match or a no-match.

A match score of 101 ensures that the criterion does not cause the records to be a match, because two fields cannot be more than 100 percent alike.

In rule-based match scenarios, we should avoid gaps between the Match Score and No Match Score.

In the output file under “Match Type” we can find values D and R.

2.    Weighted Scoring Method:

In a rule-based matching method, the application gives all of the criteria the same amount of importance (or weight). That is, if any criterion fails to meet the specified Match-Score, the application determines that the records do not match. When we use the weighted scoring method, we are relying on the total contribution score for determining matches, as opposed to using match and no match scores on their own.

Contribution Values:

           

Contribution values are our way of assigning weight to individual criteria. The higher the value, the more weight that criterion carries in determining matches. In general, criteria that might carry more weight than others include account numbers, Social Security numbers, customer numbers, Postcode1, and addresses.

Total 100. All contribution values for all criteria that have them must total 100. We do not need to have a contribution value for all of our criteria.

We can define a criteria’s contribution value in the Contribution-To-Weighted-Score option in the Criteria-Definition option group.

Contribution and total contribution score

The Match transform generates the contribution score for each criteria by multiplying the contribution value we assign with the similarity score (the percentage alike). These individual contribution scores are then added to get the Total contribution score.

Weighted Match Score

In the weighted scoring method, matches are determined only by comparing the total contribution score with the weighted match score. If the total contribution score is equal to or greater than the weighted match score, the records are considered a match. If the total weighted score is less than the weighted match score, the records are considered a no-match.

We can set the weighted match score in the Weighted Match Score option of the Criteria Match Spec option group.

Weighted scoring example

The following table is an example of how to set up weighted scoring. Notice the various types of scores that we have discussed. Also notice the following

·         When setting up weighted scoring, the No Match Score must be set to -1, and the Match Score must be set to 101. These values ensure that neither a match nor a no-match can be found by using these scores

·         We have assigned a contribution value to the E-mail criteria that give it the most importance.

Criteria

Record A

Record B

No Match
Score

Match
Score

Similarity
Score

Contribution Value

Contribution Score
actual similarity x contribution value)

First Name

Mary

Mary

-1

101

100

25

25

Last Name

Smith

Smitt

-1

101

80

25

20

Email

msmith@abc.com

mary.smith@abc.com

-1

101

91

50

46

Total Contribution
Score

91

In this example, the total contribution score is 91. If the weighted match score is 90 or less, the records are considered a no-match.

In the output file under “Match Type” we can find values D and W.

3.    Combination Method:

This strategy combines the rule-based and weighted scoring methods of matching.

·         A no-match can be determined by the similarity score of any criteria not equaling or exceeding the no-match score. However, a match cannot be determined by the match score (we must have a match score of 101).

·         A match can be determined only by comparing a total contribution score with the weighted match score.

Criteria

Record A

Record B

No Match
Score

Match
Score

Similarity
Score

Contribution Value

Contribution Score
actual similarity x contribution value)

First Name

Mary

Mary

59

101

100

25

25

Last Name

Smith

Hope

59

101

22

N/A
(No Match)

N/A

Email

msmith@abc.com

mary.hope@abc.com

49

101

N/A

N/A

N/A

Total Contribution
Score

N/A

Assigned Tags

      22 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Former Member
      Former Member

      Hello Durai,

      I just read your article about 'Matching Methods in Data Services Match Transform' and tested them.

      The question that I have is: How do I add/ get the 'total contribution score' within the weighted scoring method? (I already defined the contribution scores)

      Thanks in advance.

      Best regards,

      Mourice

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Mourice,

      You have to assign contribution score for each criteria  and should not be greater than 100%.  You have to assign the weighted score on the level. If you click on the level  you will see the weighted score option.

      Thanks & Regards,

      Ramana.

      Author's profile photo Former Member
      Former Member

      Hello Durai,

      I did not get Similarity Score concept . Could you please explore more about this.

      Regards,

      Shazia

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Shazia,

      Similarity score calculated by match fuzzy logic based on match transform options.   Match score and no match score are the threshold values to considering the duplicates.

      Thanks & Regards,

      Ramana 

      Author's profile photo Former Member
      Former Member

      Hi Venkat Ramana,

      I am trying to find the duplicates by using the match transform.

      EX:Source

      Name

      Soumyadeep

      Soumyadeep

      Somyadip

      Ajay

      Ajoy

      In the above example

      • We are getting potential duplicates and unique records in MATCH_STATUS.
      • If the record is exactly matched then it is showing as 100% matched in SET_LEVEL_MATCH_SCORE but if the record is partially matched then it is not showing as how much % matched.

      Can you explain Comparison rules in match Transform

      Can you please help me that what exact match score it will get with an example of screen shots.

      Thanks & Regards

      Venkat

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Venkat,

      Based the match score you will get the match status,match type etc.

      if you are doing duplicate check on single field and you are getting only 100% match records means  you have to change the match score limit.  If you are using the field similarity it will check the character by character match  and calculate the similarity score . If similarity score is greater than or equal to the match score  then it give the duplicate records with the similarity score else it will go the unique records. you have to change the below settings for this.

      Please share with me about your match transform settings screenshot like above for the your sample input data. I will check and if possible I will correct it.

      If you need more information please contact me through mail.

      Thanks & Regards,

      Ramana.

      Author's profile photo Former Member
      Former Member

      Hi

      In the above example how we can get exact (match score ,no match score and total contribution score)

      Can you please explain with screen shot or detail view.

      Regards,

      Venkat

      Author's profile photo Former Member
      Former Member

      Hi Venkat Ramana,


      I have gone through the screen shot which you shared with me but SET1_LEVEL_MATCH_SCORE is showing as 100% and i am not able to find exact match score.

      Can you please share the screen shot of which you are getting exact match score.

      Regards

      Venkat

      Author's profile photo Former Member
      Former Member

      /wp-content/uploads/2015/07/match_1_752070.png/wp-content/uploads/2015/07/match_2_752071.png/wp-content/uploads/2015/07/match_3_752106.png/wp-content/uploads/2015/07/match_4_752107.png/wp-content/uploads/2015/07/match_5_752109.png

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Venkat,

      Try to change the match score to 65 from 95  .  no match score always should be less than  match sore . Remaining two match scores equals to no match score.

      Change the match score and run the job. I thought tour data similarity score is less than 95.

      Thanks & Regards,

      Ramana.

      Author's profile photo Former Member
      Former Member

      Hi Ramana,

      I have tried with match score 65 and no match score 30 but it is still not getting the exact match score./wp-content/uploads/2015/07/match_6_752348.png

      Regards

      Venkat

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Venkat,

      Please provide me your sample data. I will try  and let you know .

      Thanks & Regards,

      Ramana

      Author's profile photo Former Member
      Former Member

      Hi Ramana,

      This is the sample source file.

      Source file:

      Name

      Soumyadeep Mitra

      Saumya Meetra

      Soumya

      Venkat Chowdhury

      Venkata Choudhri

      Venkat

      Ajay Gupta

      Ajoy Guptu

      Ajay

      Manish Sing

      Maneesh Singhas

      Regards

      Venkat

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Venkat ,

      I didn't notice your screenshots properly.

      There was a mistake in the match properties .

      You have given the break group  for input as shown below.

      /wp-content/uploads/2015/07/match_2_753495.png

      You have only one field  but you gave the group with length 11.

      This is almost equal to the your field values max length.

      break group creates the group based on length and check the duplicates with in the group .   In your case as you given the length 11 , each record is separate group until it has the exact match( 100%). With in the group finding the duplicates means here each group will contain only one record then it gives as unique record .

      Try to remove the break group .  For your scenario no need of any break group.

      Please find the match transform settings and results .

      for the field similarity.

      Result :

      In the above result , you missed few names like Ajay, Soumya because it has partial names .  In this case you can use word similarity it will compare word by word instead of character by character( Filed Similarity).

      Result :

      I hope you can understand .

      If you need any clarification feel free to contact me .

      Thanks & Regards,

      Ramana.

      Author's profile photo Former Member
      Former Member

      Hi Ramana,

      Now i am getting the Match_Score.

      Thanks for your support.

      Regards

      Venkat

      Author's profile photo Former Member
      Former Member

      Hi Ramana,

      If suppose we are having fields like

      Name                    Address           Email

      Venkat                   hyd               abc@ibm.com

      VenkatRao             hyderabad     abc@cts.com

      soumyadeep          mumbai         def@wipro.com

      Soumyadip            delhi              pqr@cgi.com

      soumya                 kolkata          xyz@pwc.com

      Ruthvick               bang               gdh@cts.com

      Sathvick               chennai          aaa@bbb.com

      - what is the exact match score.

      - Can you please share the screen shot of MATCHING LEVEL    in Match Transform

      Capture0.PNG

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      I am not understanding .  What you require .

      What is the meaning of the question ?

      - what is the exact match score.

      - Can you please share the screen shot of MATCHING LEVEL    in Match Transform



      Author's profile photo Former Member
      Former Member

      Hi Ramana,

      I want to find out the match score in the above 3 field names for finding the duplicates.

      I  have done some examples but it is not showing the exact match score.

      Lets Showing some outputs which i was facing.Capture1.PNG

      In the above output

      - "Beach" ,"Jasch" and "Zacch" are different names but it is generating as single group       in set 1_Level_Group_Number. (2)

      - "JPMOR"  and "MORGA" are different names but it is generating as single group       in set 1_Level_Group_Number. (6)

      - "Power" and "Moder" are different names but it is generating as single group       in set 1_Level_Group_Number. (7).



      Can you please Share the MATCHING LEVEL options in in Match Transform

      Regards

      Venkat

      Author's profile photo Former Member
      Former Member

      Hi Ramana,

      Can you please share the Match Transform Screen Shots what ever the modification you have done for finding the duplicates by using Multiple Field Names.

      Regards

      Venkat

      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Venkat ,

      You were giving two contradictory things .

      1. First You have mentioned that you need to find the duplicates on three fields on the above data.  As per your

      Name                    Address           Email

      Venkat                   hyd               abc@ibm.com

      VenkatRao             hyderabad     abc@cts.com

      soumyadeep          mumbai         def@wipro.com

      Soumyadip            delhi              pqr@cgi.com

      soumya                 kolkata          xyz@pwc.com

      Ruthvick               bang               gdh@cts.com

      Sathvick               chennai          aaa@bbb.com

      2. In the below you gave some other output and asking for improper result.

      But it is more confusing me .

      Finding duplicates on the single filed is different. If you want do match transform on multiple fields you can merge the fields and find the duplicates .  If you want to use weighted score you can find the duplicates on multiple fields. Normally it will give duplicates for the first criteria only.

      The other question you were asking why we are getting wrong results means you are using the match score is like that you were getting .

      If you are using field similarity it will check character by character and calculate the similarity score. For your match score may be greater than or equal to similarity score.

      "JPMOR"  and "MORGA" are different names but it is generating as single group       in set 1_Level_Group_Number. (6)


      In this case 3 characters matching and similarity score around 60% .


      Match transform is not straight forward transform unlike other integrator transforms or platform transforms. You have to study your requirements and you have to do .


      Other thing you were asking my screenshots. I didn't understand that one also . Why you need screenshots for my match.  Match options completely depend on requirements. That day you gave different things  and I worked on that and shared the screenshots. Those screenshots will not workout for other as requirements are completely different. 


      last post requirement is same as what I gave it you before .  Input is different . requirements may vary . According to that we need to change the options . But you should understand the match options first before doing development.First go through technical manual with examples .


      Thanks & Regards,

      Ramana.



      Author's profile photo Former Member
      Former Member

      Hi,

      We have created a real time scenario where we are comparing data from web service (nested XML) and CRM tables data and generating Unique and duplicate Records using Base_Match transform of Data Quality.

      Job is validated successfully , but when we are executing the job with sample data , we are getting below error:


      <Table> for column <fieldname> doesn't occur in the FROM clause, for a top level query all columns must belong to some table in the from clause


      Please help us in sorting this error and let us know if any other information is required from our end

        Attached documents    

      1.Full job1.PNG



      2.Data from XML file

      2.PNG

      3.Data from CRM table

      3.PNG

      Joining condition as :Query.nameSur=Merge.nameSur and Query.ElectronicAddress.eAddress=Merge.eAddress


      Thanks,

      Venkat


      Author's profile photo Venkata Ramana Paidi
      Venkata Ramana Paidi

      Hi Venkat,

      Check the from class of the query_11 transform and check merge and query schema is there or not?

      Thanks & Regards,

      Ramana.