In my previous project, we had to run a matching strategy to find duplicate Material master records. The matching rule was fairly simple and based on Material Description, but the repository held around 500K records with about 250 attributes.
To get better matching results, we had applied many transformations to the Material Description field to normalize and standardize the data. The duplicate matching strategy combined two rules:
1. Rule 1: Equals on the transformed Material Description field
2. Rule 2: Token Equals on the Material Description field
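To make the difference between the two rules concrete, here is a minimal Python sketch. It is not how the MDM matching engine is implemented. The `transform` function is a hypothetical stand-in for our transformation chain, and treating Token Equals as "the two descriptions share at least one token" is a simplifying assumption.

```python
import re


def transform(description: str) -> str:
    """Hypothetical stand-in for a transformation chain:
    upper-case, drop special characters, collapse blanks."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", description.upper())
    return " ".join(cleaned.split())


def rule_equals(a: str, b: str) -> bool:
    """Rule 1 (sketch): the transformed descriptions must be identical."""
    return transform(a) == transform(b)


def rule_token_equals(a: str, b: str) -> bool:
    """Rule 2 (sketch, assumed semantics): the untransformed
    descriptions share at least one whitespace-separated token."""
    return bool(set(a.upper().split()) & set(b.upper().split()))
```

For example, `rule_equals("Hex Bolt, M8", "hex  bolt m8")` holds because both sides normalize to the same string, while `rule_token_equals("Bolt M8", "Hex Bolt")` holds only because the token BOLT appears in both.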
On average, a description had 1.5 tokens per record. Performance was very poor because we combined Token Equals with a long list of transformations (for example replacing the string XXX with AAA, deleting blanks, and removing other special characters). Even after restricting the number of records considered for matching (we used Material Group for clustering), a matching run took 20 to 40 minutes.
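The effect of clustering by Material Group can be sketched as follows. This is only an illustration of the idea of comparing records within the same group, not MDM functionality. The record layout and the field names `id` and `material_group` are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records):
    """Yield pairs of record ids that share a material group.

    Records in different groups are never compared, which is what
    keeps the number of comparisons down.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["material_group"]].append(rec["id"])
    for ids in groups.values():
        yield from combinations(ids, 2)


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "material_group": "A"},
    {"id": 2, "material_group": "A"},
    {"id": 3, "material_group": "A"},
    {"id": 4, "material_group": "B"},
]
pairs = list(candidate_pairs(records))
```

With these four sample records, comparing everything against everything would need 6 pairs, while the grouped version produces only the 3 pairs inside group A. Each remaining pair still pays the full cost of every transformation, which is why cutting the transformations mattered as well.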
Solution: We improved data quality by re-importing the same set of records used for the initial load from ECC, this time using MDM Import Manager capabilities to reduce the number of transformations needed.
How: After mapping the Material Description field, apply a value conversion filter to the mapped field. Most of the familiar Excel-style functions are available, such as Replace, Append, and Prepend, and several conversion rules can be applied to the same mapped field.
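The idea of chaining several conversions on one mapped field, so that the value is cleaned once at import time instead of on every matching run, can be sketched like this. The helper names are hypothetical and only mirror the filter names mentioned above. They are not an Import Manager API.

```python
from functools import reduce


def replace(old: str, new: str):
    """Return a conversion that replaces every occurrence of old with new."""
    return lambda value: value.replace(old, new)


def prepend(prefix: str):
    """Return a conversion that puts prefix in front of the value."""
    return lambda value: prefix + value


def append(suffix: str):
    """Return a conversion that adds suffix to the end of the value."""
    return lambda value: value + suffix


def apply_conversions(value: str, conversions) -> str:
    """Apply the conversions to value one after another, in list order."""
    return reduce(lambda current, convert: convert(current), conversions, value)


# Illustrative chain: the XXX -> AAA replacement plus blank removal.
description_rules = [replace("XXX", "AAA"), replace(" ", "")]
cleaned = apply_conversions("XXX PUMP 10", description_rules)
```

Because the cleaned value is what gets stored in the repository, the matching strategy no longer needs a transformation for each of these steps.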
We also set the Keyword property of the Material Description field to Normal, which optimized token-based comparisons.
Together, these changes reduced the number of transformations from 22 to 5 and improved matching strategy performance.