SAP Duplicate Detection – Data Services (BODS) Match vs. Syniti Match
Today I would like to compare how two ETL tools handle one data quality task, finding duplicate data:
- SAP Data Services (BODS)
- Syniti ADM
Note: these insights are based on my own experience with both tools. This could be a debate, so let's have a discussion 😉
Let's start with the BODS Match technology.
In BODS, you can find the Base Match transform under Data Quality.
You connect it to the source, apply a match key in the Match transform, and get the duplicates…
Syniti Match covers this traditional way of finding duplicates as well, and it also handles more advanced cases. For example, if Anthony is recorded under the nickname Tony, Syniti tries to recognize that both records refer to the same person and reports them.
Compared with conventional solutions, Syniti's matching technology delivers better results more quickly:
| Conventional BODS Matching | Syniti Match |
|---|---|
| Requires match-ready data, which needs significant preprocessing, such as standardizing data to consistent schemas. | No preprocessing is needed. Bring your data as it is. |
| Requires significant manual effort from SMEs to assess and remediate data quality problems. | Automates manual interpretation, recognizing patterns, non-Latin characters, and spelling differences. |
| Users must understand the nuances of off-the-shelf algorithms. | Uses a proprietary phonetic algorithm built specifically for contact and business data. |
| Requires data coders or data scientists to implement matching. | Lets you create custom matching in a friendly drag-and-drop interface. |
| Processing is slow. | Processing takes minutes instead of hours or days. |
| Match keys are the basis for comparison, so any errors in the data carry over into the keys. | Contextual scoring mirrors human-like perception and is much more accurate. |
For example, the following three records do NOT produce a match in conventional tools: the comparison is based on the match key, and the three keys differ. Syniti Match produces a match for all three records.
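Match key: First_Name (3) + Last_Name (3) + Street_Number (4) + ZIP (5), where each number is the count of leading characters taken from that field.

| Match key | Name | Street address | City | State | ZIP |
|---|---|---|---|---|---|
| TAMMAY350078746 | Tamas Mayer | 3500 N Capital of Texas Hwy #230 | AUSTIN | TX | 78746 |
| TOMMOO350078746 | Tom Moore | 3500 N Capital of Texas Hwy #502 | AUSTIN | TX | 78746 |
| TMOO350078746 | Mr. T R Moore | 3500 N Capital of Texas Hwy #502 | AUSTIN | TX | 78746 |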
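The exact-key comparison described above can be sketched in a few lines of Python. This is not how BODS works internally; it is a minimal illustration that assumes the names are already split into first and last name (so "Mr. T R Moore" becomes first name "T", last name "Moore") and that the street number has been extracted. The record layout and the names `match_key` and `duplicate_groups` are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-parsed records: (first_name, last_name, street_number, zip_code).
RECORDS = [
    ("Tamas", "Mayer", "3500", "78746"),
    ("Tom", "Moore", "3500", "78746"),
    ("T", "Moore", "3500", "78746"),  # "Mr. T R Moore" after removing the prefix
]


def match_key(first: str, last: str, street_number: str, zip_code: str) -> str:
    """First_Name (3) + Last_Name (3) + Street_Number (4) + ZIP (5), upper-cased."""
    return (first[:3] + last[:3] + street_number[:4] + zip_code[:5]).upper()


def duplicate_groups(records):
    """Group records by exact match key; only groups with 2+ records count as duplicates."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(*record)].append(record)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    for record in RECORDS:
        print(match_key(*record), record)
    print("duplicate groups:", duplicate_groups(RECORDS))
```

Running it prints the three keys followed by `duplicate groups: []`: no two records share a key, so a purely key-based comparison reports no duplicates, even though two of the records are clearly the same person.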
Syniti’s Matching Technology
Match uses the following technologies as it processes your data to find and score records that are possible matches.
1. Parsing
When data enters the matching engine, the first step is breaking it into separate fields. To do this, Match:
- Splits up the name.
- Pulls the company name out of the address.
- Parses concatenated addresses, and so on.
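A toy version of the name-splitting step might look like the sketch below. It covers only personal names (not company or address parsing), and the prefix and suffix lists are hypothetical; Syniti's parser is not public, so treat this as an illustration of the idea rather than its implementation.

```python
# Hypothetical prefix/suffix lists; a production parser would use far larger
# reference tables and handle many more name formats.
PREFIXES = {"MR", "MRS", "MS", "DR"}
SUFFIXES = {"JR", "SR", "II", "III"}


def parse_name(raw: str) -> dict:
    """Split a free-text personal name into prefix, first, middle, last, and suffix."""
    tokens = [t.strip(".,").upper() for t in raw.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were only punctuation
    prefix = tokens.pop(0) if tokens and tokens[0] in PREFIXES else ""
    suffix = tokens.pop() if tokens and tokens[-1] in SUFFIXES else ""
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])  # everything between first and last name
    return {"prefix": prefix, "first": first, "middle": middle,
            "last": last, "suffix": suffix}
```

For example, `parse_name("Mr. T R Moore")` returns `{'prefix': 'MR', 'first': 'T', 'middle': 'R', 'last': 'MOORE', 'suffix': ''}`, which is what makes the last name available for the later steps.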
2. Pattern Recognition
Match recognizes patterns and context in the data, such as:
- Prefixes and suffixes, such as DR or JR.
- Business words, such as INC, LLC, or DBA.
- Context words, such as street, suite, flat, and so forth.
- Abbreviations, such as Mfg for Manufacturing or ACCT for Accounting.
- Nicknames, such as Tony for Anthony.
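Pattern recognition of this kind is commonly driven by reference tables. The sketch below assumes three tiny hypothetical tables (nicknames, abbreviations, business words): it expands nicknames and abbreviations and sets business words aside so they do not disturb the comparison. Unlike a real engine, it applies every table to every token regardless of which field the token came from.

```python
# Hypothetical reference tables; the few entries here only illustrate the idea.
NICKNAMES = {"TONY": "ANTHONY", "BOB": "ROBERT", "BILL": "WILLIAM"}
ABBREVIATIONS = {"MFG": "MANUFACTURING", "ACCT": "ACCOUNTING", "HWY": "HIGHWAY"}
BUSINESS_WORDS = {"INC", "LLC", "DBA", "CORP"}


def recognize(text: str) -> tuple[list[str], list[str]]:
    """Return (core tokens, business words) with nicknames and abbreviations expanded."""
    core, business = [], []
    for raw in text.split():
        token = raw.strip(".,").upper()
        if not token:
            continue
        if token in BUSINESS_WORDS:
            business.append(token)  # kept aside rather than compared as a name token
        else:
            # Expand a known nickname or abbreviation, otherwise keep the token as-is.
            core.append(NICKNAMES.get(token) or ABBREVIATIONS.get(token) or token)
    return core, business
```

With these tables, `recognize("Acme Mfg, Inc.")` and `recognize("Acme Manufacturing")` yield the same core tokens `['ACME', 'MANUFACTURING']`, and `recognize("Tony Moore")` yields `['ANTHONY', 'MOORE']`.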
3. Transliteration
Match transliterates global Unicode characters, such as Chinese, into Latin characters.
In this example, the Chinese character 昌 means prosperous and is pronounced chang, and the Chinese character 李 means plum and is pronounced li.
[Diagram: example of transliteration]
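A rough sketch of transliteration in Python is shown below, assuming a hypothetical two-entry lookup table built from the example above. Accents on Latin letters are stripped with the standard `unicodedata` module. Real Chinese transliteration needs a full pinyin dictionary and must handle characters with several readings, which this sketch does not attempt.

```python
import unicodedata

# Hypothetical lookup table containing only the two characters from the example.
PINYIN = {"昌": "CHANG", "李": "LI"}


def transliterate(text: str) -> str:
    """Map known Chinese characters to pinyin and strip accents from Latin letters."""
    out = []
    # NFKD splits an accented letter such as "ü" into a base letter plus a combining mark.
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop the combining accent, keep the base letter
        out.append(f" {PINYIN[ch]} " if ch in PINYIN else ch)
    # Collapse the extra spaces added around each transliterated syllable.
    return " ".join("".join(out).split()).upper()
```

Here `transliterate("李昌")` returns `'LI CHANG'` and `transliterate("Müller")` returns `'MULLER'`; characters missing from the table pass through unchanged.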
4. Phonetic Algorithm
Now that Match has isolated values into separate fields, such as first name, last name, company, street, and city, you can generate phonetic codes for these fields so that spelling and typing errors do not prevent a match.
For example, Naugton could be a misspelling or typo of Naughton, so a record with the name Naugton is likely the same as a record with the name Naughton.
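Syniti's phonetic algorithm is proprietary, so the sketch below swaps in the classic American Soundex algorithm purely to illustrate how a phonetic code makes the two spellings comparable. Soundex keeps the first letter and encodes the following consonants as digits, so names that sound alike tend to reduce to the same four-character code.

```python
# American Soundex digit for each consonant; vowels, H, W, and Y have no digit.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Classic American Soundex, used here as a stand-in for a phonetic algorithm."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result, prev = letters[0], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel (or Y) resets prev, so a repeated digit after it is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros
```

Both `soundex("Naughton")` and `soundex("Naugton")` return `'N235'`, so the two records can be compared even though the spellings differ.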
5. Grouping
Many business databases contain a massive number of records. To work at this scale, Match makes a pass over the data, identifies similar records, and places them in Candidate Groups.
Similarity at this stage is based on multiple data points.
Match is not finding matches at this point but is simply identifying good candidates for further comparison. Match can then use these groups to locate records that match but have nothing exactly in common.
For example, Match could look for records with:
- Last names that match phonetically and the same ZIP code, or
- Last names and street names that match phonetically.
[Diagram: example of grouping]
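Candidate grouping works much like the technique usually called blocking. The sketch below assumes records are already parsed into last name, street name, and ZIP code, and implements the two example rules with Soundex (again a stand-in for Syniti's proprietary phonetics). Any two records that share a group under either rule become a candidate pair for scoring; the sample records, rule list, and field names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# American Soundex digit for each consonant; vowels, H, W, and Y have no digit.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Classic American Soundex, used here as a stand-in phonetic code."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result, prev = letters[0], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


# Hypothetical parsed records (indexes 0 to 3).
RECORDS = [
    {"last": "Mayer", "street": "N Capital of Texas Hwy", "zip": "78746"},
    {"last": "Moore", "street": "N Capital of Texas Hwy", "zip": "78746"},
    {"last": "Moore", "street": "N Capital of Texas Hwy", "zip": "78746"},
    {"last": "Smith", "street": "Broadway", "zip": "10001"},
]

# The two example rules; each maps a record to a group key.
GROUPING_RULES = [
    lambda r: (soundex(r["last"]), r["zip"]),
    lambda r: (soundex(r["last"]), soundex(r["street"])),
]


def candidate_pairs(records, rules):
    """Return index pairs (i, j), i < j, that share a group under any rule."""
    pairs = set()
    for rule in rules:
        groups = defaultdict(list)
        for index, record in enumerate(records):
            groups[rule(record)].append(index)
        for members in groups.values():
            pairs.update(combinations(members, 2))
    return pairs
```

With the sample data, `candidate_pairs(RECORDS, GROUPING_RULES)` returns `{(0, 1), (0, 2), (1, 2)}`: the three Austin records become candidates (Mayer and Moore share the Soundex code `M600`), while the Smith record is never compared with them. Tamas Mayer and Tom Moore are paired only as candidates; whether they actually match is left to scoring.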
6. Contextual Scoring
Once Match has organized the data into Candidate Groups, it scores the records: it compares two records at a time and grades how similar they are.
- It compares and scores multiple fields individually, such as name, company, address, ZIP, phone, email, and so on.
- It establishes an overall similarity score for the pair of records.
- The higher the score, the more confident the system is that the two records match.
- You specify the score threshold, and Match presents any pair that scores above the threshold as a match.
[Diagram: example of contextual scoring]
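The scoring steps above can be sketched as a weighted average of per-field similarities compared against a threshold. Syniti's contextual scoring is proprietary; this sketch swaps in Python's `difflib.SequenceMatcher` ratio as the field-level similarity, and the field weights and the 0.8 threshold are hypothetical values chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and threshold chosen for illustration.
WEIGHTS = {"name": 0.4, "address": 0.4, "zip": 0.2}
THRESHOLD = 0.8


def field_score(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1]; an empty value scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def record_score(r1: dict, r2: dict, weights: dict = WEIGHTS) -> float:
    """Overall similarity: weighted average of the per-field scores."""
    total = sum(weights.values())
    return sum(w * field_score(r1[f], r2[f]) for f, w in weights.items()) / total


def is_match(r1: dict, r2: dict, threshold: float = THRESHOLD) -> bool:
    """Report a match when the overall score is above the threshold."""
    return record_score(r1, r2) > threshold


tom = {"name": "Tom Moore", "address": "3500 N Capital of Texas Hwy #502", "zip": "78746"}
t_r = {"name": "T R Moore", "address": "3500 N Capital of Texas Hwy #502", "zip": "78746"}
print(round(record_score(tom, t_r), 3), is_match(tom, t_r))
```

Weighting lets strong agreement on address and ZIP compensate for a partial name match such as Tom Moore vs. T R Moore; raising the threshold makes the matcher stricter, and lowering it surfaces more borderline pairs for review.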
Considering advanced features such as contextual scoring, candidate grouping, and transliteration, I give a few more marks to Syniti Match.
In upcoming blog posts, we can discuss the processing in more detail.
That's all for this blog post.
Thanks for reading, and please share your feedback.
Happy Learning, see you in my next blog 🙂