I created a POC to demonstrate the concatenated-address scenario using the Match transform; the details are given below.
The address columns were concatenated (Address1 + Address2 + City + Region + PostCode) and passed to the Match transform. The fuzzy-logic settings (remove punctuation, convert to uppercase, check for transposed letters) were configured for this comparison.
Sample Data:
FirmName | Address1 | Address2 | City | Region | PostCode | Country |
SAMSUNG | 250 2-GA TAEPYONG-RO JUNG-GU | | SEOUL | | 100-742 | KR |
SAMSUNG Corp | 250 2-GA TAEPYONG-RO JUNG-GU | | SEOUL | | 100-742 | KR |
SAMSUNG India | 250 2-GA TAEPYONG-RO JUNG-GU | | SEOUL | | 1000-742 | KR |
SAMSUNG India Pvt Ltd | 250 2-GA TAEPYONG-RO dummy | | s | | 100-742 | US |
SAMSUNG Corporation | 250 2-GA TAEPYONG-RO test | | SOUL | | 100-742 | UK |
Concatenated Address:
(DQ_Firm_Modified.Address1||' '||DQ_Firm_Modified.Address2||' '||DQ_Firm_Modified.City||' '||DQ_Firm_Modified.Region||' '||DQ_Firm_Modified.PostCode)
Concatenated Address |
250 2-GA TAEPYONG-RO JUNG-GU SEOUL 1000-742 |
250 2-GA TAEPYONG-RO JUNG-GU SEOUL 100-742 |
250 2-GA TAEPYONG-RO dummy s 100-742 |
250 2-GA TAEPYONG-RO test SOUL 100-742 |
250 2-GA TAEPYONG-RO JUNG-GU SEOUL 100-742 |
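The concatenation and two of the fuzzy-logic settings (remove punctuation, convert to uppercase) can be approximated outside Data Services with a minimal Python sketch. The function names `concat_address` and `normalize` are hypothetical, and replacing punctuation with a space is an assumption; the Match transform's internal handling may differ.

```python
import string

# Columns concatenated in the POC, in the same order as the expression above.
ADDRESS_FIELDS = ("Address1", "Address2", "City", "Region", "PostCode")

def concat_address(record: dict) -> str:
    """Join the non-empty address columns with single spaces."""
    parts = [(record.get(field) or "").strip() for field in ADDRESS_FIELDS]
    return " ".join(part for part in parts if part)

def normalize(text: str) -> str:
    """Uppercase the text and replace punctuation with spaces (assumed behaviour)."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(text.translate(table).upper().split())

record = {
    "Address1": "250 2-GA TAEPYONG-RO JUNG-GU",
    "Address2": "",
    "City": "SEOUL",
    "Region": "",
    "PostCode": "100-742",
}
address = concat_address(record)
print(address)
print(normalize(address))
```

Skipping empty columns avoids the doubled spaces that a plain `||' '||` concatenation would leave when Address2 or Region is blank.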
Match Criteria:
Match Transform Results:
ADDRESS_PRIMARY_NAME | ADDR_ADDRESS_GROUP_NUMBER | ADDR_ADDRESS_MATCH_STATUS | ADDR_ADDRESS_MATCH_LEVEL | ADDR_ADDRESS_MATCH_CRITERION | ADDR_ADDRESS_MATCH_SCORE | ADDR_ADDRESS_MATCH_TYPE | ADDR_ADDRESS_ADDRGROUPSTATS_GROUP_COUNT | ADDR_ADDRESS_ADDRGROUPSTATS_GROUP_ORDER | ADDR_ADDRESS_ADDRGROUPSTATS_GROUP_RANK |
250 2-GA TAEPYONG-RO JUNG-GU SEOUL 1000-742 | 1 | D | Address | | | D | 5 | 1 | M |
250 2-GA TAEPYONG-RO JUNG-GU SEOUL 100-742 | 1 | P | Address | | 100 | W | 5 | 2 | S |
250 2-GA TAEPYONG-RO dummy s 100-742 | 1 | P | Address | | 100 | W | 5 | 3 | S |
250 2-GA TAEPYONG-RO test SOUL 100-742 | 1 | P | Address | | 100 | W | 5 | 4 | S |
250 2-GA TAEPYONG-RO JUNG-GU SEOUL 100-742 | 1 | P | Address | | 100 | W | 5 | 5 | S |
The Match Score and Match Group Number show that the fuzzy logic works well in this scenario: all five records fall into group 1, and each subordinate record scores 100. Note that Firm Name was not used in any comparison criterion.
However, when the data is heavily corrupted, as in the sample below, the transform is unable to find the match. This calls for caution: only cleansed data should be given as input to this transform.
FirmName | Address1 | Address2 | City | Region | PostCode | Country |
SAMSUNG Corp | 2510 2-GA TAEdddPYONG-RO JUNGrr-GU | | SEooOUL | | 1010-7420 | KRA |
SAMSUNG India | 2550 2-11GA TAEaaaPYONG-RO JUsNGssd-G2U | | SEOUuuLa | | 10100-742 | KR |
SAMSUNG India Pvt Ltd | 2501 2-GA TAEPYONG-RO dummy | | s | | 10001-742 | US |
SAMSUNG Corporation | 2505 2-GA TAEPYONG-RO test | | SOUL | | 100-742 | UK |
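To see why heavily corrupted input falls out of the match group, the effect can be illustrated with a generic string-similarity measure. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in; it is not the Match transform's scoring algorithm, and the `similarity` helper and the threshold of 95 are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case (stand-in for a match score)."""
    return 100.0 * SequenceMatcher(None, a.upper(), b.upper()).ratio()

# Concatenated addresses taken from the sample data in this post.
reference = "250 2-GA TAEPYONG-RO JUNG-GU SEOUL 100-742"
near_duplicate = "250 2-GA TAEPYONG-RO JUNG-GU SEOUL 1000-742"
corrupted = "2510 2-GA TAEdddPYONG-RO JUNGrr-GU SEooOUL 1010-7420"

THRESHOLD = 95.0  # hypothetical cut-off, not a Data Services default

for candidate in (near_duplicate, corrupted):
    score = similarity(reference, candidate)
    verdict = "match" if score >= THRESHOLD else "no match"
    print(f"{score:6.1f}  {verdict:8}  {candidate}")
```

A single extra digit barely changes the score, while noise scattered across every column pushes it below the cut-off, which is consistent with the advice to cleanse the data before matching.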