
Using MDM 5.5 for Data Quality Analysis


SAP MDM 5.5 is great for master data management, but it can also be a handy tool for quickly analyzing information related to master data. For example, some time ago I got a request to help with a data quality analysis using MDM SP4. The customer's team wanted to find out whether they sometimes pay different prices for the same item from the same supplier. At its core this is a master data problem: if they had a single, unified list of suppliers with master data management processes to govern it, this would be less likely to happen. But it did happen, and my job was to find out for which products and which suppliers.

I received an Access database with two similar tables containing a combined 350,000 records of order lines extracted from R/3. The reason I got this in the first place was that the customer wanted to use SP4's Matching Mode to solve the problem. Looking at the data set I realized that Data Manager's Matching Mode was not the way to go here, but that I could definitely use MDM to find the information they were looking for, even though order lines are not master data.
We sat down and decided to break the work into the following steps:
   1.    Data analysis – understanding what we're looking at, which fields are relevant and which we can ignore.
   2.    Repository design and import – defining what the repository will look like and importing the data into a structure that will help us find answers.
   3.    Finding the answers – finding out which items were ordered at different prices.
   4.    Syndication – exporting the relevant information for processing.

Data Analysis

Our first step was to analyze the data to understand the meaning of the different fields. Knowing that we would need to find matching records to figure out if and when products were bought at different prices, our main goal was to identify the field we could match on. Usually people point at some kind of key or ID field, like "product id"; in our case we saw a field named "material code". After talking to the customer, though, we found that it would be best to match on the "description" field rather than the material code: it turned out that material codes were not guaranteed to be unique for each material. We also decided which fields to import based on the information we were asked to produce at the end of the process. Eventually we settled on the following fields: Description, Unit Price, PO Number, Quantity, Plant, PO Date, Material Code, Supplier Name, Group Number and Group Description.
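This kind of match-field check is easy to script outside MDM as well. Here is a minimal Python sketch of the test we effectively ran: for each candidate key, see whether any of its values maps to more than one product. The field and product names are invented for illustration; they are not from the customer's data set.

```python
from collections import defaultdict

def key_quality(rows, key_field, value_field):
    """Collect the distinct value_field entries seen for each key_field
    value; a safe match field maps every key to exactly one value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_field]].add(row[value_field])
    # Return only the keys that map to more than one distinct value.
    return {k: v for k, v in seen.items() if len(v) > 1}

# Hypothetical order lines: one material code covers two different
# descriptions, so "material_code" is not a safe match field here.
rows = [
    {"material_code": "M100", "description": "Bolt M8x40"},
    {"material_code": "M100", "description": "Bolt M8x60"},
    {"material_code": "M200", "description": "Washer M8"},
]
print(key_quality(rows, "material_code", "description"))
```

An empty result for a candidate field means it is usable as the match key; in our case "description" passed this test and "material code" did not.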

Repository Design & Import

The second step, repository creation and data import, was the trickiest and most important one. I sat down with my colleague Alon Weinstein to design the model, and we decided that although the database contains order lines, the repository would be based on products, meaning our main table would store products.
Our first idea was to import all the records and then create a matching strategy using the new SP4 Matching Mode to merge duplicate occurrences of products. The problem was that this would make it hard to see which products were bought at different prices, which is exactly what we were asked to find out.
So we came up with a different solution: import the product-related information into the main table and store all the order-related information in a qualified table. The idea was to end up with a single record in the main table for each type of product bought, with all the order numbers and prices held in the qualified lookup table.

In the main table we put the fields that were unique for each product in the data set we got, such as Description, Group Number and Group Description. We then created a qualified table containing the fields that might hold non-unique data for each product in an order line: Unit Price, PO Number, Quantity, Plant, PO Date, Material Code and Supplier Name. All of these fields were qualifiers, because they would differ for each order line. In this case we could have done without a non-qualifier field, but since a qualified lookup table requires one, we created a non-qualifier for Currency. Here it was just a dummy field, but it would make sense if we had prices in different currencies and wanted to match only prices in the same currency.
If we had stopped at this point we would have had a list of all the products, each with a list of the orders placed for it. But we were still missing something: we had no way to easily see all the distinct prices a product was bought at. To get that, we added another qualified lookup table holding only the distinct prices the customer paid for each product, and we used Import Manager to populate this table properly.
To make sure we ended up with a single record for each product, we used Description as the matching field in Import Manager (remember that in our data set Description was in fact the unique identifier for each product). We configured the import of the qualifiers by setting the qualified update method to "Update" (Figure 1), adding all the qualified fields as Matching Qualifiers, and setting "New Links" to "Create" and "Existing Links" to "Update (All Mapped Qualifiers)" (Figure 2). This ensured that all the different orders were imported into the qualified lookup table for each product.
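Conceptually, this import configuration is an upsert keyed on Description, with every order line appended as a qualified link. The following Python sketch mimics that behavior with plain dictionaries; it is an illustration of the logic, not of any MDM API, and the field names are invented.

```python
def import_order_line(repository, line):
    """Mimic the Import Manager settings described above: match the main
    table on description, create the product record if it is new, and
    append each order line as a link in the qualified table."""
    product = repository.setdefault(line["description"], {"orders": []})
    # "New Links: Create" -- every order line becomes a qualified link.
    product["orders"].append({
        "po_number": line["po_number"],
        "unit_price": line["unit_price"],
        "supplier": line["supplier"],
    })
    return product

repository = {}
import_order_line(repository, {"description": "Bolt M8x40",
                               "po_number": "PO1", "unit_price": 1.20,
                               "supplier": "ACME"})
import_order_line(repository, {"description": "Bolt M8x40",
                               "po_number": "PO2", "unit_price": 1.35,
                               "supplier": "ACME"})
# One main-table record for the product, two qualified order links.
print(len(repository), len(repository["Bolt M8x40"]["orders"]))
```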

Figure 1

Figure 2

To import only the distinct prices into the second qualified table, we used the price as the matching qualifier and told Import Manager to create new links but skip existing ones (Figure 3).
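The "create new links, skip existing ones" setting amounts to deduplication on import. A small Python sketch of that idea, again with invented names and a set standing in for the qualified link table:

```python
def import_distinct_price(repository, description, unit_price):
    """'New Links: Create' plus 'Existing Links: Skip' -- link a price to
    the product only if that exact price is not linked already."""
    product = repository.setdefault(description, {"prices": set()})
    product["prices"].add(unit_price)  # a set silently skips duplicates

repository = {}
for price in [1.20, 1.35, 1.20]:       # 1.20 appears on two order lines
    import_distinct_price(repository, "Bolt M8x40", price)
print(sorted(repository["Bolt M8x40"]["prices"]))  # [1.2, 1.35]
```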

Figure 3

At this point we started the import and populated the repository.

Finding the Answers

The next step was identifying which products were bought at more than one price. Thanks to SP4's new aggregate functions in the expression editor, we could easily search for all the records that have more than one value in the qualified table holding distinct prices, using the COUNT function. Another option would be to use the same expression in a calculated field. Thanks to the repository design, the import process and the COUNT function, identifying these products became as easy (and as fast) as searching for records in MDM.
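In effect, the COUNT-based search filters on the size of each product's distinct-price table. A minimal Python equivalent of that expression, using the same toy data shape as before (names invented for illustration):

```python
# A toy repository: each product carries its distinct-price table.
repository = {
    "Bolt M8x40": {"prices": {1.20, 1.35}},
    "Washer M8": {"prices": {0.05}},
}

# Equivalent of the COUNT-based search expression: keep only the
# products whose distinct-price table has more than one entry.
suspects = [name for name, rec in repository.items()
            if len(rec["prices"]) > 1]
print(suspects)  # ['Bolt M8x40']
```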


Syndication

Syndication was a breeze: we simply searched using the expression (or would have searched on the calculated field, had we used one) and syndicated only the results.
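The search-and-syndicate step boils down to filtering the repository and exporting the matches. A hedged Python sketch of that final step, writing the suspect products and their distinct prices to CSV (the column names and data are illustrative, not the customer's actual syndication map):

```python
import csv
import io

repository = {
    "Bolt M8x40": {"prices": [1.20, 1.35]},
    "Washer M8": {"prices": [0.05]},
}

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["description", "distinct_prices"])
for name, record in sorted(repository.items()):
    if len(record["prices"]) > 1:   # same filter as the search expression
        writer.writerow([name, ";".join(str(p) for p in record["prices"])])
print(out.getvalue())
```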


To sum things up, it was very nice to see that SAP MDM 5.5 is flexible enough to be used as a tool to quickly and easily solve data-related problems. The whole process took only a few hours from the moment we got the data set to the moment we had the syndicated results, including the time we spent contemplating how to design the solution. As it turns out, the customer had tried to solve this with several other tools and wasn't able to.

      Former Member

      In the MDM Console, for a few fields in the Products table it seems easy to use; however, if you have 30-plus fields, what would be the quickest way to build a table in the repository?

      Especially if you already have these tables built in MS Access: instead of typing it all in again in the MDM Console, is there an easy way to import the MS Access table into MDM?



      Former Member
      SELECT ProductDescription, COUNT(DISTINCT UnitPrice) FROM OrderLines GROUP BY ProductDescription HAVING COUNT(DISTINCT UnitPrice) > 1

      Something like this would have done the job. So why MDM?

      Furthermore - why did the customer export the data from R/3 to an Access table? That doesn't say much for the integration R/3 -> Access -> MDM...