
MDM Data Modeling – Common Mistakes and How to Avoid Them

There are some well-known mistakes that data architects often commit while designing their MDM data model. This blog highlights the top 11 among them and details the impact each fault can have. By understanding these mistakes, one can effectively avoid typical problems before they arise. All listed caveats are valid for SAP NW MDM 7.1 and are derived from real-life observations of MDM implementations.

This blog also covers the impact details for some of the mistakes highlighted under section 5.1 of the guide How to Avoid Problems with your Data Model in SAP NetWeaver MDM – Do’s and Don’ts.

Common mistakes at Repository level

The following points relate to an MDM repository as a whole: the overall data model, the data types involved (e.g. transactional/operational data), and the languages used.

1: Storing target system specific master data attributes (special characteristics) in an MDM main table

Often, in order to support master data creation through a guided process, a local MDM store is defined. In such a case, the data model construct is designed to temporarily keep all local attributes of the master data owned by a specific target system to which MDM has to syndicate the data.

This approach does not follow SAP best practice for central master data management in SAP MDM. The design implies quite intensive data maintenance – a large number of syndications, imports, data updates, deletions, workflows, record check-ins and check-outs, matching, and search operations – which forms a significant workload on the MDM server.

Caveat: By design, the MDM server places an exclusive repository lock each time it processes a data access operation that changes any data in the repository. While an exclusive data lock is set, the complete repository is blocked and all other requests to this repository are put on hold until the operation that initiated the lock completes and the lock is released. With a significant number of concurrent processes trying to access data in such a main table, there is a high risk that these processes will block each other and prevent smooth business operations. This is especially critical when a background process blocks a dialog process, as a user waiting for online feedback from the system may get a timeout instead.

What to do: First of all, proper performance tests are required on such models. If the results do not meet the business requirements with respect to the performance of the corresponding processes, the solution and the data model need to be re-designed.
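The blocking behavior described above can be illustrated with a short simulation. This is not MDM code – it is a generic sketch that models the exclusive repository lock as a single mutex, where a long-running background import blocks a dialog request until the user's timeout expires:

```python
import threading
import time

# Hypothetical model of an MDM-style exclusive repository lock:
# one mutex guards the whole repository.
repository_lock = threading.Lock()

def background_import(duration):
    """Simulates a batch update that holds the exclusive lock."""
    with repository_lock:
        time.sleep(duration)

def dialog_request(timeout):
    """Simulates an online user request with a client-side timeout.
    Returns True if the lock was obtained in time, False on timeout."""
    acquired = repository_lock.acquire(timeout=timeout)
    if acquired:
        repository_lock.release()
    return acquired

# A 0.5 s background import blocks a dialog request that can only wait 0.1 s.
blocker = threading.Thread(target=background_import, args=(0.5,))
blocker.start()
time.sleep(0.05)                    # ensure the import already holds the lock
print(dialog_request(timeout=0.1))  # False: the user would see a timeout
blocker.join()
print(dialog_request(timeout=0.1))  # True: the lock has been released
```

The sketch shows why concurrency on a single repository must be tested: any process holding the exclusive lock longer than a dialog timeout produces exactly the user-visible failure described in the caveat.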

2: Main table containing no primary master data

Typically, a main table should not be established to support any external processes, for example a BPM workflow process. The object contained in the data model may be directly related to the processing of the master data, but by nature it does not contain the primary master data. Such a main table, as well as all the referenced objects (tuples and lookup tables) related to it, will obviously be updated during data maintenance processes in the repository and will therefore create an additional set of concurrent accesses to the repository. In fact, this is similar to the mistake above, but in this case MDM is used to keep additional information required by some external process, and this is not recommended by SAP.

Caveat: The impact is the same as described in the issue above.

What to do: Consider keeping this information either in a dedicated repository or outside the MDM server.


3: A significantly oversized data model

In general, the size and complexity of the data model are mostly determined by a combination of two factors: the main business requirements of a given MDM implementation and the handling of special characteristics of master data attributes in MDM.

For example, if we take an SAP ERP system, main attributes comprise all common mandatory attributes which define an object independently of its usage across the enterprise’s subsidiaries, organizations, and locations, while special attributes comprise all other specific characteristics of master data. The main idea behind this division is that MDM keeps only those attributes and characteristics which are common to more than one receiving system to which MDM data are distributed. All other (special) attributes and characteristics are maintained locally in each receiving system, following its dedicated requirements regarding data content, validation, and business needs.

Caveat: The approach of keeping only global data attributes in the MDM repository is generally recommended by SAP. It keeps the MDM data model compact, well-performing, simple to maintain, and cost-effective to implement, because the data structure does not need to be overly complex and it is relatively easy to ensure a smooth and reliable integration with other systems (data maintenance and distribution).

If this approach is not followed, the corresponding risks apply and should be mitigated on a project basis. In general, the larger the data model is and the more data it holds, the slower the MDM Server will be with regard to certain operations at the repository level. Examples of such operations are: loading of the repository, updating of the repository indexes, and archiving.

In addition, a technical issue arises when both the repository schema and the total number of lookup values are large. When an MDM client needs to send a request to the MDM Server (MDS), communication between the client and the MDS must be established. Depending on the client architecture/implementation, initializing this communication may involve the following main technical activities:

  1. establishing a connection session,
  2. retrieving information about the schema, and
  3. retrieving the indexes of lookup tables.

Activities 2 and 3 can be quite time-consuming, depending on the size of the schema and the volume of lookup entries. Therefore, the size of the data model may negatively influence them.

What to do: Avoid storing target system specific special attributes in MDM.


Common mistakes at Table level

The following points relate to the tables in a given MDM repository: table definitions (types and structure), identification of redundant or unreferenced tables, and table property settings (e.g. key mapping, display fields, unique fields).

4: Hidden Tables

Typically, unused tables and tables inherited from a past MDM implementation are marked as Hidden because they are no longer used.

Caveat: The same as point 3 above.

What to do: Remove all unused tables from the repository.

5: Non-referenced tables

In many instances, non-referenced tables (tables not referencing any other table in the repository) can be found in a repository. Non-referenced tables are stand-alone tables with respect to the integrated structure of all repository objects.

Furthermore, there are cases where such tables contain no records at all. In a finalized data model this usually means that these tables are either obsolete and need to be removed, contain additional “technical” information and therefore cannot be directly addressed in the business data model, or are involved in other processes not directly related to the main objectives of the existing data model. Of course, there might be other reasons.

Caveat: The overall performance of the MDM system may be negatively influenced by additional data maintenance that is unnecessary for the main data flow processing of a particular repository. CPU and memory consumption is higher than expected because of the additional load from non-referenced data.

What to do: Review the tables and make a decision for each one. If a table is not required, delete it. If it is required and belongs to the same business object, consider integrating it better into the existing model. In general, stand-alone tables are possible, but not recommended for the reasons given above.

6: Tables without unique fields

It has often been observed that many repository tables have no fields defined as Unique Fields (UFs). Tables with defined UFs, or combinations of them (a combined UF key), benefit from data uniqueness enforced through the uniqueness of the corresponding field values or value combinations. Data which violate the uniqueness rules are rejected at creation time. Naturally, the definition of Unique Fields or their unique combinations is driven by the business requirements and the types of data handled.

Caveat: Data in tables without defined UFs do not benefit from the value added by this feature.

What to do: Review all the listed tables to ensure that at least one UF is defined per table where it makes sense.
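Such a uniqueness rule can also be checked on the source data before import, outside MDM. The sketch below is a generic pre-import check (the field names are hypothetical); it finds the combined-key values that a combined UF definition would cause MDM to reject:

```python
from collections import Counter

def find_duplicate_keys(records, key_fields):
    """Return the combined-key values that occur more than once,
    i.e. the records a combined Unique Field definition would reject."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical source data with a combined key (Material, Plant).
records = [
    {"Material": "M-100", "Plant": "1000", "Desc": "Bolt"},
    {"Material": "M-100", "Plant": "2000", "Desc": "Bolt"},
    {"Material": "M-100", "Plant": "1000", "Desc": "Bolt, duplicate"},
]
print(find_duplicate_keys(records, ["Material", "Plant"]))
# → {('M-100', '1000'): 2}
```

Running such a check before loading keeps rejected records from surfacing only at import time.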

7: Key Mapping activation for many tables

Enabling Key Mapping unnecessarily is yet another common mistake that data modelers make. Often, key mapping is enabled not just for the main tables but also for lookup tables and hierarchies.

Caveat: The usage of Key Mapping (with maintained key values) increases the loading time of the repository and may negatively influence the performance of update operations as well as data syndication. Key Mapping can be thought of as an additional text field on each object for which key values are maintained.

What to do: Below are some use cases in which Key Mapping is useful for data maintenance:

Existing key mappings are always preserved, even for destination records set to be replaced with the Replace action, as follows:

  1. if a source field is mapped to the [Remote Key] field, MDM will reject the source record rather than perform the replace; and
  2. if a source field is not mapped to the [Remote Key] field, MDM will perform the replace after first transferring the [Remote Key] values of the target replaced records to the new record.

In each case, the Replace action will not result in the loss of [Remote Key] values.

Each key in the source data is represented by a value in the list of source values. When a Source Field is mapped to a Destination Field that has key mapping enabled, the MDM Import Manager loads value mapping from the table of remote keys rather than from the map, automatically mapping each Source Value to a Destination Value if the source/destination value pair exists in the key mapping (similar to the way value mappings are restored when a map file is loaded).

[Remote Key] can be used as a record matching field. Each source record will match one destination record if a key mapping exists for the source remote key, and each remote key is associated with a single destination record during the matching step. If no matching record is found and the Create option is selected, the remote key will be associated with the newly created record. All details regarding the Key Mapping functionality can be found in the MDM Reference Guides.

Also, verify the actual need for the Key Mapping feature for each repository object and disable it where it is not required. Key Mapping can be activated at any later stage of the solution development.


Common mistakes at Field level

The following points relate to table fields and their property settings (e.g. keyword search, sort indexing, search tab), the identification of unnecessary fields (using fill rate analysis), and the verification of field type usage.

8: Change Tracking activation for many fields

In most MDM implementations, the Change Tracking functionality is enabled for a significant number of fields in order to quickly make the MDM repository auditable. While there may be an obvious business requirement to have it in place, the number of enabled fields should be kept as small as possible because of the potential negative impact of change tracking on MDM Server performance.

Caveat: Enabling a significant number of fields for Change Tracking will negatively impact the performance of all operations that update data in the repository whenever such updates trigger many change tracking activities.

What to do: In general, it is recommended to enable as few fields for Change Tracking as possible, to reduce the negative impact on the performance of all data update operations (import and maintenance).

Unfortunately, there is no out-of-the-box alternative if a significant number of fields need to be enabled for Change Tracking. Therefore, the following action plan can be adopted to mitigate the risk related to this issue:

  • Verify whether the number of fields enabled for Change Tracking can be reduced; for instance, fields of non-referenced tables that are enabled for change tracking can be considered candidates for deactivation.
  • Test the performance of all essential update operations using representative data volumes; verify whether the system performance meets the business requirements.
  • If the performance is not satisfactory, run similar tests to check what level of performance improvement can be achieved with the Change Tracking functionality deactivated; if the results are satisfactory, an alternative project-based solution should be developed.

9: Enabling Sort Index for many fields

It is always good to understand whether there is a definite requirement to be able to sort values in all these fields in the MDM Data Manager, or whether there are other reasons why this attribute has to be switched on for the fields.

Caveat: If the Sort Index attribute is activated for a field, the system needs to maintain an additional dedicated index. This increases CPU and memory consumption on the MDM server side and slows down operations such as repository load and all operations which imply a data update (change), because the corresponding indexes have to be updated as well. At the same time, there is also a positive influence on the performance of searches executed on fields with sorted indexes – a side effect of this attribute.

What to do: Double-check whether all the fields require the activation of this attribute and disable it for each field where it is not really required. This attribute is an additional technical definition of a field and can be enabled or disabled at any time. This means that if business users require this function, the attribute can be activated in the productive environment, with the only limitation that the repository has to be re-loaded with the indices-update option.

It is also worth noting that all Display fields are already sorted, so there is usually no need to additionally enable the Sort Index attribute for them. Weigh the following capabilities of the Sort Index attribute when making your decision about its activation:

  • the Sort Index field property makes a field sortable in the Records grid of the MDM Data Manager and through the API;
  • it accelerates free-form search for the equals and starts with Text operators, and for the =, <, <=, >, and >= numeric operators;
  • it also greatly improves matching speed on Equals matches in the Data Manager’s Matching mode.

Your decision should finally be based on all pros and cons from the usage of this property.

10: Suspicious field definitions

Sample 1: Flat table with a field of datatype Text (255)

Sample 2: Flat table with a field of datatype Text Large

In all such cases it is worth checking whether another field type, or the same type with a shorter length, could be chosen, or whether the field could be made multi-valued.

Impact: To ensure maximum performance of the MDM server and minimum memory and CPU utilization, the data model should be as compact as possible. If this approach is not followed, the corresponding risks apply.

Recommendations: Verify whether there are real business and technical requirements behind such a definition for each field, and in each case select the minimum required size (and field type).
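Before fixing the field definitions, it can help to profile the actual source data and compare the longest observed values against the declared field lengths. A minimal sketch with hypothetical data:

```python
def max_field_lengths(records):
    """Report the longest value observed per field, to compare against
    declared field sizes such as Text(255) or Text Large."""
    lengths = {}
    for record in records:
        for field, value in record.items():
            lengths[field] = max(lengths.get(field, 0), len(str(value)))
    return lengths

# Hypothetical exported source records.
records = [
    {"MaterialNumber": "M-100", "ShortText": "Hex bolt M8"},
    {"MaterialNumber": "M-20041", "ShortText": "Washer, zinc plated"},
]
print(max_field_lengths(records))
# → {'MaterialNumber': 7, 'ShortText': 19}
```

If a Text(255) field never holds more than a few dozen characters in representative data, a shorter declared length is usually the safer choice.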

11: Low cardinality values for fields

In this context, cardinality means the percentage of distinct values a field contains. Where a field’s cardinality is very low (far below the pre-defined threshold of 50%), the field could be changed from a plain text type to a lookup.

Impact: Many factors determine how well the MDM server performs for a repository it manages; one of them is the size of the repository. Usually, the optimal approach is to define a few broadly used values in a lookup table instead of repeating those values thousands of times in a certain table field. Furthermore, arranging frequently repeated values in a lookup table generally improves the performance of related read-type operations (searches, mapping, etc.) and provides additional opportunities to ensure better data quality.

Recommendations: Examine the fields with low cardinality and convert them from the plain text type to lookups.
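The cardinality analysis can be scripted against exported source data. The sketch below assumes the 50% threshold mentioned above and flags low-cardinality fields as lookup candidates:

```python
def cardinality(values):
    """Ratio of distinct values to total values (0..1)."""
    return len(set(values)) / len(values) if values else 0.0

def lookup_candidates(records, threshold=0.5):
    """Flag fields whose cardinality falls below the threshold;
    such fields are candidates for conversion to lookup tables."""
    fields = records[0].keys()
    return [
        f for f in fields
        if cardinality([r[f] for r in records]) < threshold
    ]

# Hypothetical exported source records.
records = [
    {"Id": "1", "Country": "DE", "Uom": "EA"},
    {"Id": "2", "Country": "DE", "Uom": "EA"},
    {"Id": "3", "Country": "US", "Uom": "EA"},
    {"Id": "4", "Country": "DE", "Uom": "KG"},
    {"Id": "5", "Country": "DE", "Uom": "EA"},
]
print(lookup_candidates(records))
# → ['Country', 'Uom']  (Id stays at 100% cardinality)
```

The threshold is a starting point, not a rule: whether a flagged field actually becomes a lookup should still be decided per field, based on the business meaning of its values.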

As said earlier, this list is by no means complete, but it tries to highlight the most common mistakes.

  • Hi,
         A question on point 1, “Storing target system specific master data attributes (special characteristics) in a MDM main table”.

    For example, there is a field called payment terms and it needs to be value-mapped individually to 3 other remote systems (say A, B, C) as:

    MDM  A   B   C
    10   EE  DD  FF

    In this scenario, we can use a lookup table for the MDM values and define 3 keys on it for A, B, and C.

    Do you think this violates the first point? Or how should we do it?

    • Hi Premjit,

      Yes, if the need is to have the data syndicated across several target systems, then obviously the data needs to reside in MDM in the way you have described.

      The emphasis of point 1 is rather on the case where the data model construct has been designed to temporarily keep all local attributes of the master data owned by a specific target system to which MDM has to syndicate the data.

      Best regards,