De-duplication of Master Data during large SAP Implementation Projects

Abstract

Large SAP implementation projects often consolidate one or more SAP and non-SAP systems. In such projects there is a high risk that the same material master or vendor master record reappears in the target system under different names and with different details. Incorrect master data leads to various problems, such as incorrect reporting. This whitepaper addresses the issue of duplicate records and describes how to eliminate them. It covers the advantages of de-duplication, explains the process steps and the fields that should be used to identify duplicate records, suggests a team structure for managing de-duplication projects, and lists the common pitfalls of such projects together with their mitigation plans.

Introduction

Cleansing master data in the source systems (SAP and non-SAP) and at the intermediary stage, before the final data upload into the target system, is a major activity in SAP implementation and rollout projects. It is a key activity when one or more ERP systems are consolidated. The primary activity in master data cleansing is de-duplication. De-duplication of master data is the process of finding identical master data records within or across the source systems and eliminating the duplicates before migration to the target system. Master data here typically refers to material master data and vendor master data.

There are several tools used in de-duplication projects. The capabilities of such tools are not discussed in this whitepaper.

The aim of this whitepaper is to provide key information about the de-duplication process that can be followed in SAP rollout and implementation projects.

Overview of de-duplication process

The following figure depicts the de-duplication process:

Figure 1: Overview of de-duplication process

The de-duplication process comprises the following steps:

  1. Initial cleansing of source data
  2. Source data comparison based on de-duplication logic specific to master data and preparation of de-duplication reports
  3. Nomination of leading master data
  4. New master data creation in the target system
  5. Mapping of non-leading master data to leading master data

Advantages of de-duplication process

The advantages of executing the de-duplication process during rollout/implementation projects are as follows:

  1. Eliminating duplicate material and vendor master records prevents incorrect and inconsistent reporting
  2. Correct consumption and availability information for each material leads to more accurate material planning
  3. Following the de-duplication process from the start of the rollout/implementation project reduces the time and cost spent on manually identifying duplicates later
  4. De-duplication improves the overall reliability of material and vendor reports and analysis
  5. Removing duplicate vendor master records helps to maintain effective and consistent communication with vendors
  6. Consolidated, consistent, harmonized and cleansed master data is a prerequisite for innovation and growth

Execution of de-duplication process

The de-duplication process is executed in the following steps:

Initial cleansing of source data

The scope of master data de-duplication comprises all master data in the source systems, except records that meet one or more of the following criteria:

a.       Master data that is deleted or blocked at the highest level of the organisational structure. Master data that is blocked only at a lower organisational level may still be active and relevant at another organisational level, so it should remain in scope for the de-duplication exercise.

b.       Vendor master data that is not created at company code level but only at purchasing organisation level for other purposes.

c.       Master data that cannot be migrated to the target system because mapping values are not available for important fields, such as the unit of measure in the material master.
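The exclusion criteria above can be sketched as a simple scope filter. This is only an illustration: the `SourceRecord` class, its flag names and the sample record numbers are assumptions made for this example and do not correspond to actual SAP table fields.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """Hypothetical, simplified view of one source master record."""
    number: str
    blocked_at_top_level: bool = False    # criterion a: deleted/blocked at highest org level
    has_company_code_data: bool = True    # criterion b: vendor exists at company code level
    mapping_values_complete: bool = True  # criterion c: e.g. unit of measure can be mapped


def in_scope(record: SourceRecord) -> bool:
    """A record stays in scope only if none of the exclusion criteria apply."""
    return (
        not record.blocked_at_top_level
        and record.has_company_code_data
        and record.mapping_values_complete
    )


# Invented sample records, one per exclusion criterion plus one clean record
records = [
    SourceRecord("V-100"),
    SourceRecord("V-200", blocked_at_top_level=True),
    SourceRecord("V-300", has_company_code_data=False),
    SourceRecord("M-400", mapping_values_complete=False),
]

scope = [r.number for r in records if in_scope(r)]
print(scope)  # ['V-100']
```

Records blocked only at a lower organisational level would keep `blocked_at_top_level=False` and therefore remain in scope, in line with criterion a.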

The initial cleansing of source master data is important because it significantly reduces the number of duplicate groups of master records. Initial cleansing includes both enriching key master data and correcting errors in the master data. It is performed mainly on the data within the source system.

A few examples of initial cleansing of source master data are as follows:

a.       Correct material master records that contain dummy text

b.       Correct material master records of material type HERS (manufacturer parts) that duplicate the material of type HIBE (operating supplies) to which they are assigned

c.       Fill in key details that are used in the de-duplication logic but are missing in the material and vendor master data

d.       Check and correct redundant partner functions created in the vendor master

The important fields to focus on during initial cleansing or enrichment of material master and vendor master data in the source system are as follows:

Material Master:

a.       Material description

b.       Unit of measure

c.       Manufacturer details

d.       UNSPSC code

e.       Vendor part number

Vendor Master:

a.       Vendor name

b.       Address

c.       VAT registration number

d.       DUNS

e.       Bank account number

Source master data comparison

This step is the core of the de-duplication process. After the data has been cleansed and enriched, the records are compared against each other. The result of this comparison is a grouping of similar master data. The source of the master data can be a single ERP system or multiple ERP systems. Tools are available for comparing the master data and creating the groups of similar master data.

The de-duplication tool applies the de-duplication logic to identify similar master data and produces the master data de-duplication table.

Rules used in de-duplication logic

The criteria used to determine the groups of similar master data depend on many factors, such as the availability of data, the level of initial cleansing done and the scope of enrichment performed on the source data.

Some of the rules used in material master de-duplication logic in order to identify the similar group of material masters are as follows:

a.       Same manufacturer and same base unit of measure

b.       Same UNSPSC code

c.       Same manufacturer and same vendor part number

d.       Similar description

In most cases the relevant field values are concatenated and compared as text to arrive at the groups of similar master data.
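A minimal sketch of this concatenate-and-compare idea is shown below. It is not the logic of any particular de-duplication tool: the helper names, the sample values and the similarity threshold of 0.8 are assumptions for illustration, and Python's standard `difflib.SequenceMatcher` stands in for whatever text comparison a real tool uses.

```python
from difflib import SequenceMatcher


def concatenated_key(*fields: str) -> str:
    """Join the chosen fields into one upper-case comparison string."""
    return "|".join(f.strip().upper() for f in fields)


def similar_description(a: str, b: str, threshold: float = 0.8) -> bool:
    """Rule d: treat two descriptions as similar above an assumed threshold."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


# Rule a: same manufacturer and same base unit of measure
key_1 = concatenated_key("Mnfr 1", "EA")
key_2 = concatenated_key("mnfr 1 ", "ea")
print(key_1 == key_2)  # True

# Rule d: similar description
print(similar_description("Hydraulic bolt", "Hydraulic bolts"))  # True
print(similar_description("Bolt", "Xyz"))  # False
```

The threshold controls how many candidate groups reach the manual nomination step: a lower value finds more potential duplicates but also produces more false matches to review.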

Likewise, some of the rules used in vendor master de-duplication logic would be as follows:

a.       Same DUNS number

b.       Same bank details

c.       Same tax code

d.       Same address details, such as name, street number, PO box and postal code
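The vendor rules above can be sketched as follows. The example assumes that a match on any single rule is enough to link two vendors and that links are transitive (if V1 matches V2 and V2 matches V3, all three land in one group); the source does not prescribe this, so treat it as one possible design. The vendor numbers and field values are invented, and the address rule is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical vendor records; an empty string means the value is missing
vendors = {
    "V1": {"duns": "123456789", "bank": "DE001", "tax": "T1"},
    "V2": {"duns": "123456789", "bank": "DE002", "tax": "T2"},
    "V3": {"duns": "", "bank": "DE002", "tax": "T3"},
    "V4": {"duns": "987654321", "bank": "FR009", "tax": "T4"},
}

# Union-find structure: each vendor starts as its own group
parent = {vendor_id: vendor_id for vendor_id in vendors}


def find(vendor_id: str) -> str:
    """Return the representative of the group that contains vendor_id."""
    while parent[vendor_id] != vendor_id:
        parent[vendor_id] = parent[parent[vendor_id]]  # path compression
        vendor_id = parent[vendor_id]
    return vendor_id


def union(a: str, b: str) -> None:
    """Merge the groups of vendors a and b."""
    parent[find(a)] = find(b)


# Apply one rule per field: equal non-empty values link two vendors
for field in ("duns", "bank", "tax"):
    first_seen = {}
    for vendor_id, data in vendors.items():
        value = data[field]
        if not value:
            continue  # missing values must never create a match
        if value in first_seen:
            union(vendor_id, first_seen[value])
        else:
            first_seen[value] = vendor_id

groups = defaultdict(list)
for vendor_id in vendors:
    groups[find(vendor_id)].append(vendor_id)

duplicate_groups = sorted(sorted(g) for g in groups.values() if len(g) > 1)
print(duplicate_groups)  # [['V1', 'V2', 'V3']]
```

Skipping empty values matters in practice: without that check, every vendor with a missing DUNS number would be grouped together, which is one reason the initial enrichment of key fields pays off.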

Master data de-duplication table

The master data de-duplication table is the result of the initial data cleansing activity and the application of the de-duplication logic to the pre-cleansed source data. Based on this logic, the de-duplication tool can identify groups of master records within a single ERP system or across multiple ERP systems.

A simple example of a master data de-duplication table is shown below:

Table 1 Master Data De-Duplication Table

| Group number | Source System Name | Material | Material Description | Manufacturer | Vendor part number |
|---|---|---|---|---|---|
| 1 | SAP System 1 | Material A | Bolt Hydraulic | Mnfr 1 | 9N-4524 |
| 1 | SAP System 1 | Material B | Bolt | Mnfr 1 | 9N4524 |
| 1 | Non-SAP system 2 | Material C | Bolt, long Oil hydr | Mnfr 1 | 9N-45-24 |

In the above example, the de-duplication logic has been applied to the source system data and has grouped these three materials, which are similar in nature.

Nomination of leading master data

Once the similar master data has been grouped, the material that should be migrated into the target system needs to be selected from each group.

Table 2 Master Data De-Duplication Table Appended with Nomination Columns


| Group number | Source System Name | Material | Material Description | Manufacturer | Vendor part number | Leading Material | Non-leading Material |
|---|---|---|---|---|---|---|---|
| 1 | SAP System 1 | Material A | Bolt Hydraulic | Mnfr 1 | 9N-4524 | Yes | |
| 1 | SAP System 1 | Material B | Bolt | Mnfr 1 | 9N4524 | | Yes |
| 1 | Non-SAP system 2 | Material C | Bolt, long Oil hydr | Mnfr 1 | 9N-45-24 | | Yes |

In the above example, ‘Material A’ is identified as the leading material, which is migrated into the target system. The other two, ‘Material B’ and ‘Material C’, are identified as non-leading materials, that is, duplicates. The non-leading materials are not migrated to the target system. The leading material is also referred to as the parent material, and a non-leading material as a child material.

In the same example, refer to the column ‘Vendor part number’. The same part number from the same manufacturer was created in two different systems in three different ways. Hence text normalization (for example, removing special characters to obtain the actual text) should be applied before comparison to determine the duplicates.
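The normalization described above can be sketched in a few lines. The `normalize` helper is a hypothetical name, and stripping every non-alphanumeric character is one simple choice of normalization rule; the rows are taken from the example table.

```python
import re
from collections import defaultdict


def normalize(part_number: str) -> str:
    """Remove all characters except letters and digits, then upper-case."""
    return re.sub(r"[^A-Za-z0-9]", "", part_number).upper()


# Rows from the example: (source system, material, manufacturer, vendor part number)
rows = [
    ("SAP System 1", "Material A", "Mnfr 1", "9N-4524"),
    ("SAP System 1", "Material B", "Mnfr 1", "9N4524"),
    ("Non-SAP system 2", "Material C", "Mnfr 1", "9N-45-24"),
]

# Group by manufacturer plus normalized vendor part number
groups = defaultdict(list)
for system, material, manufacturer, part_number in rows:
    groups[(manufacturer, normalize(part_number))].append(material)

print(dict(groups))
# {('Mnfr 1', '9N4524'): ['Material A', 'Material B', 'Material C']}
```

All three spellings collapse to the same key, so the three materials fall into one group even though a plain string comparison of the part numbers would find no match.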

The selection of leading and non-leading materials is a manual activity, which should be guided by a few principles:

If a group contains master data from two different systems, a conflict arises over which system's master data gets first preference as the leading material. The usual rule of thumb is to give first preference to the master data from the oldest system, provided its information is up to date. An alternative approach is to select the master data with the most transactional data. The selection becomes more complex when the different systems are owned by different internal organizations. The de-duplication process should therefore normally be run centrally, with a central co-ordinator who mitigates the conflicts that arise from selecting the leading material in each group.
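Although the nomination itself is manual, the two principles above could be used to pre-sort a group and propose a candidate for the reviewer. The sketch below is only an assumption about how such a proposal might work: the `Candidate` fields, the go-live years and the transaction counts are invented, and the final decision stays with the de-duplication co-ordinators.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical attributes a reviewer might look at for one group member."""
    material: str
    system_go_live_year: int   # assumed proxy for the age of the source system
    is_up_to_date: bool        # whether the record carries current information
    transaction_count: int     # amount of transactional data on the record


def propose_leading(group: list[Candidate]) -> Candidate:
    """Prefer up-to-date records from the oldest system, then most transactions."""
    eligible = [c for c in group if c.is_up_to_date] or group
    return min(eligible, key=lambda c: (c.system_go_live_year, -c.transaction_count))


group = [
    Candidate("Material A", 2005, True, 120),
    Candidate("Material B", 2005, True, 15),
    Candidate("Material C", 2012, True, 300),
]

print(propose_leading(group).material)  # Material A
```

Swapping the order of the two sort criteria would implement the alternative approach, where the record with the most transactional data wins regardless of system age.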

New master data creation in the target system

By this process step, all manual nominations of leading and non-leading master data should be complete. This makes it possible to segregate the leading materials, which are migrated to the target system. Each migrated leading material receives a new material number that follows the material numbering convention of the target system.

Mapping of non-leading master data to leading master data

As the final step of the de-duplication process, once the new material numbers in the target system are known, every old source system material number (leading and non-leading) is mapped to the new material number of its leading master data in the target system. For the example used in the nomination step above, the mapping appears as follows:

Table 3 Master Data Mapping Table


| Old source system material number | New material number in the target system |
|---|---|
| Material A | New Material A |
| Material B | New Material A |
| Material C | New Material A |

The ‘Old source system material number’ column contains both the leading (parent) and the non-leading (child) master data numbers.

The ‘New material number in the target system’ column contains the material number that is created in the target system.
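The mapping table can be derived mechanically from the nominations and the new target numbers. The sketch below assumes two hypothetical inputs, `nominations` (source number to leading source number) and `new_numbers` (leading source number to target number), filled with the values of the example.

```python
# Source material -> leading (parent) material of its group; a leading maps to itself
nominations = {
    "Material A": "Material A",
    "Material B": "Material A",
    "Material C": "Material A",
}

# Leading source material -> new material number created in the target system
new_numbers = {"Material A": "New Material A"}

# Every nominated leading material must already have a target number
missing = set(nominations.values()) - set(new_numbers)
if missing:
    raise ValueError(f"No target number yet for: {sorted(missing)}")

mapping = {old: new_numbers[leading] for old, leading in nominations.items()}

for old, new in mapping.items():
    print(f"{old} -> {new}")
# Material A -> New Material A
# Material B -> New Material A
# Material C -> New Material A
```

The explicit check for missing target numbers reflects the ordering of the steps: the mapping can only be completed after the leading materials have been created in the target system.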

Non-leading (child) materials and vendors inherit the master data of their leading (parent) material or vendor. A child vendor inherits the general data of the parent vendor, while certain data of the child vendor, such as bank details, is consolidated into the parent vendor.

All company code, purchasing organisation and plant assignments of the non-leading (child) vendors are extended to the parent vendor in the target system.

Impact on the transactional data migration

The non-leading master data is not migrated to the target system. Transactional data that belongs to non-leading master data is created in the target system against the mapped leading master data.
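A minimal sketch of this remapping during transactional data migration is shown below. The purchase order numbers, quantities and dictionary layout are invented for illustration; only the mapping values come from the example above.

```python
# Mapping table from the de-duplication process (old number -> new number)
mapping = {
    "Material A": "New Material A",
    "Material B": "New Material A",
    "Material C": "New Material A",
}

# Hypothetical open purchase order items extracted from the source systems
open_items = [
    {"po": "4500000001", "material": "Material A", "quantity": 10},
    {"po": "4500000002", "material": "Material B", "quantity": 4},
    {"po": "7000000123", "material": "Material C", "quantity": 25},
]


def remap(item: dict) -> dict:
    """Return a copy of the item that points at the target material number."""
    old = item["material"]
    if old not in mapping:
        raise KeyError(f"Material {old!r} has no entry in the mapping table")
    return {**item, "material": mapping[old]}


migrated = [remap(item) for item in open_items]
print({item["material"] for item in migrated})  # {'New Material A'}
```

Failing loudly on an unmapped material is a deliberate choice in this sketch, since a silently skipped item would mean lost transactional data in the target system.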

Project team structure for the de-duplication projects

A de-duplication project involves a lot of coordination between the different owners of the source systems. In projects spread across geographies, there is usually a separate team responsible for the master data of each company code. In such scenarios there should be a de-duplication co-ordinator in each location, who liaises with the other local co-ordinators and with the central de-duplication co-ordinator. The master data organisation should provide the high-level governance. It is beneficial to position the central de-duplication co-ordinator centrally across geographies.

Table 4 Project Team Structure


Role: Central de-duplication co-ordinator

Major responsibilities:

  1. Provide technical guidance for doing the leading/non-leading master data nominations
  2. Co-ordinate leading/non-leading master data nominations
  3. Issue and scope management
  4. Leadership activities
  5. Arrange recurring meetings to track progress
  6. Solve conflicts

Role: De-duplication co-ordinator in every company code/system/geography

Major responsibilities:

  1. Perform the leading/non-leading master data nominations
  2. Participate in recurring meetings
  3. Ensure data quality in the source system

Role: Master data organisation

Major responsibilities:

  1. Governance on master data
  2. Provide clarification on the master data design
  3. Review data quality of source system and implement required structural changes

Pitfalls and the mitigation plan in de-duplication projects

The common pitfalls and the mitigation plans in de-duplication projects are as follows:

Table 5 Pitfalls and Mitigation Plans


| Pitfalls | Mitigation plan |
|---|---|
| During the initial review stages, the resources needed to review the items and nominate the leading/non-leading master data are likely to be underestimated | As a general guideline, base the resource estimate on 100 master data records per person per week. This figure comes from the author's experience in a de-duplication project and assumes complete analysis, including investigation of purchasing history |
| With multiple system owners, reaching consensus on the leading master data takes a very long time | Give the central co-ordinator a stronger role and more authority so that conflicts can be settled easily |
| Incorrect nominations lead to complexities when a group contains master data from several systems | Staff the project with resources who have detailed knowledge of master data and of the de-duplication process |
| Incomplete nominations lead to complexities when a group contains master data from several systems | Set up a tracking mechanism that shows which master data nominations are pending with which team |
| High-risk and/or high-value items | Approach high-risk and/or high-value items with caution |
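The resource guideline in Table 5 translates into a simple duration estimate. In the sketch below only the rate of 100 records per person per week comes from the table; the record counts and team size are hypothetical.

```python
import math

RECORDS_PER_PERSON_PER_WEEK = 100  # guideline from Table 5


def weeks_needed(records_to_review: int, reviewers: int) -> int:
    """Estimate the review duration in whole weeks for a given team size."""
    weekly_capacity = RECORDS_PER_PERSON_PER_WEEK * reviewers
    return math.ceil(records_to_review / weekly_capacity)


# Hypothetical project: 12,000 records in duplicate groups, 6 reviewers
print(weeks_needed(12_000, 6))  # 20
print(weeks_needed(12_050, 6))  # 21
```

Running the estimate early, on the record count that comes out of the comparison step, helps to avoid the underestimation described in the first pitfall.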

Conclusion

This whitepaper discusses the de-duplication process during the initial stages of rollout/implementation projects. Once the parent master data has been identified and the new master data has been created without the child duplicates, it is equally important to have a defined approach that prevents new duplicates in the target system. One option is a single source for master data creation and changes, combined with effective rules and processes that prevent duplicates at the source.
