Using MDG Rule Mining to Improve Data Quality
Defining data quality rules can be a time-consuming task. It often requires multiple emails, meetings, and phone calls between your business teams and master data teams.
Since SAP Master Data Governance on S/4HANA 1909 and on S/4HANA Cloud 2008, Master Data Rule Mining supports business users and master data experts in analyzing their master data for new data quality rules. Machine learning technology finds patterns in the master data and proposes new rules based on these patterns. You can review and accept these rules, and then integrate them automatically into your master data process.
The rule mining solution helps you to significantly reduce the cost to set up rules, because the proposed rules are based on the “as-it-is state” and existing facts within your master data. The integration with the rule repository and the assistance during the transfer to an active rule diminishes the risk of setting up incorrect rules, reduces the time to implementation, and requires less technical expertise. Also, the data correction effort required by implementing these rules is measurable, because the data evaluation of the proposed rules is visible before you accept them as master data quality rules.
This blog explains when and where you can use the rule mining tool.
When and where to use MDG Rule Mining
To better understand when and where the rule mining tool can be used, let’s look at a couple of typical customer stories.
Story 1: We have a Business Problem
Ralph is the MRP Controller for ASAP Inc. He is sick of constantly recurring issues in material availability. These issues are caused by MRP parameters that are not maintained as they should be. Up to now, only “paper-based” rules exist. They are outdated, extended with post-its, and not monitored or enforced by the planning system.
Ralph calls Lisa, the master data steward, and invites her to a meeting to see how they can work together to improve the situation in the system.
In the meeting, Lisa suggests setting up business rules in the system to make sure the data is correctly maintained. Before defining any rule, Lisa suggests using Master Data Rule Mining to find new proposed rules to serve as a basis for their new data quality initiative. They open the Master Data Rule Mining tool together…
Story 2: Data Quality Workshop
Following a reorganization, ASAP Inc. has a new Enterprise Data Management organization focused on assisting the company in its digital transformation. The Enterprise Data Management organization knows that in order to make digital transformation successful, data quality is a top priority, and decides to invest in their master data.
The managers organize an event to bring different experts from across the company (line of business, IT, data experts) together to define steps for improving their master data quality. Because ASAP Inc. doesn’t have a process or rules to maintain its product master data and also lacks the ability to monitor its data health, they choose product master data as the focus for their workshop.
How to use MDG Rule Mining
MDG rule mining can play a big role in the above scenarios. This chapter describes the rule mining concepts and processes.
A complete rule mining process consists of 4 steps:
- Create a Mining Run
- Start a Mining Run
- Find and Accept Mined Rules
- Implement Accepted Rules (automated) in the Rule Repository
To learn how to use the tool, let’s look at the scenario described in Story 1 above. In this case, we’re working out how to define an MRP group for finished goods. The rules may involve some basic product data such as Material Group, Base Unit of Measure, and MRP Type.
Step 1: Create Mining Run
You trigger the rule mining process by executing a mining run. A mining run tells the system the data you want to focus on when proposing new data quality rules. To create a mining run, open the Manage Rule Mining Run for Products app and choose the + button.
Here is an explanation of the fields you will see on the user interface:
Description: A sentence or key words outlining what you want this mining run to do. This can be useful when identifying a mining run later.
Goal: A detailed explanation of this mining run’s purpose, or the rules you expect from the data.
Tables: A list of tables to be mined at the same time. Under each table, you define the focus area and fields you want to use for the mining run.
Focus Area: The data set you want to use for mining. For example, in this case we want to perform product master rule mining and choose Product Type = Finished Goods (FERT). These areas are carried to the mined rules later.
Fields: A further drilldown of the selected focus areas on field level. The system examines the values of the selected fields to find potential rules.
Checked by Rule: Check this flag if you want this field to be checked as part of a rule. Mined rules are formatted as IF/THEN statements. Selecting this flag means that this field will be in the THEN part of the rule.
Condition of rule: Check this flag if you want the field to be a condition of a rule. Mined rules are formatted as IF/THEN statements. Selecting this flag means that this field will be in the IF part of the rule.
Maximum number of rules: The maximum number of rules you want to get from this mining run. The default is 100 rules.
Example: Defining a mining run to discover rules for the MRP Group field
- In the Tables section, choose +. Select the tables Basic Data (MARA) and Plant Data (MARC) from the popup.
- The details page for the table Basic Data (MARA) displays on the right panel of your screen.
- In the Focus Areas section, under Filters, choose + and select Material Type (MTART) on the popup. After the popup closes, use the value help to select the value FERT.
- In the Fields section, choose + and select Material Group and Base Unit of Measure. After the popup closes, the flag Conditions of Rule is checked by default.
- Go back to the Tables section and choose Plant Data (MARC). The details page for the table Plant Data (MARC) displays on the right panel of your screen.
- In the Focus Area section, a default entry, Plant (WERKS), has already been generated. Use the value help to select a plant, for example, 0001.
- In the Fields section, choose + and select the MRP Type. The flag Conditions of Rule is checked by default.
- In the Fields section, choose + and select the MRP Group. Deselect Conditions of Rule and select Checked by Rule.
- Review your data and Save.
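To make the shape of this definition easier to see, here is a minimal Python sketch of the same mining run as plain data. It is only an illustration of the concepts on the screen, not an SAP API: the class names (MiningField, MiningTable, MiningRun) and the validate() check are hypothetical and exist only in this sketch.

```python
from dataclasses import dataclass


@dataclass
class MiningField:
    """One selected field and its role in the mined IF/THEN rules (hypothetical class)."""
    name: str
    condition_of_rule: bool = True   # IF part; checked by default in the steps above
    checked_by_rule: bool = False    # THEN part


@dataclass
class MiningTable:
    """A table to be mined, with its focus area filter and selected fields."""
    name: str
    focus_area: dict[str, str]
    fields: list[MiningField]


@dataclass
class MiningRun:
    description: str
    goal: str
    tables: list[MiningTable]
    max_rules: int = 100  # default stated in the field list above

    def validate(self) -> list[str]:
        """Sanity check invented for this sketch: an IF/THEN rule needs both parts."""
        problems = []
        all_fields = [f for t in self.tables for f in t.fields]
        if not any(f.checked_by_rule for f in all_fields):
            problems.append("no field is marked Checked by Rule")
        if not any(f.condition_of_rule for f in all_fields):
            problems.append("no field is marked Condition of Rule")
        return problems


# The example from the steps above: find rules for MRP Group on finished goods.
run = MiningRun(
    description="MRP Group for finished goods",
    goal="Find rules that determine the MRP Group",
    tables=[
        MiningTable(
            name="MARA",
            focus_area={"MTART": "FERT"},
            fields=[MiningField("Material Group"), MiningField("Base Unit of Measure")],
        ),
        MiningTable(
            name="MARC",
            focus_area={"WERKS": "0001"},
            fields=[
                MiningField("MRP Type"),
                MiningField("MRP Group", condition_of_rule=False, checked_by_rule=True),
            ],
        ),
    ],
)
print(run.validate())
```

Written this way, it is easy to see that three fields feed the IF part and only MRP Group can appear in the THEN part.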
Step 2: Start Mining Run
Once the mining run is saved successfully, you should get a numeric mining run ID, which is visible in the mining run list.
Choose the Start button at the bottom of your saved mining run. A confirmation popup tells you how many records are selected by your mining run definition. The mining is triggered once you confirm on the popup.
The mining run starts. While it is still running, you can stop it and make changes to your settings.
You can choose Refresh to check if the mining run is finished. This updates the mining run status and shows the total number of proposed rules in the mining run header.
Step 3: Find and Accept Mined Rules
To open the mined rules, click the Total Rules value in the mining run header. The system displays a list of proposed data quality rules. Note that these are only proposals: review them (ideally together with other experts in your organization) and accept them only if they make sense for the business process. To use them in your master data process, you then need to create data quality rules from them. Here is a summary of the fields and their meanings:
ID: Numeric identifier of each mined rule, generated when the mining run is completed.
Description: Readable text explaining the rule, displayed in the pattern IF … AND … THEN …
Example: IF Base Unit of Measure = Days AND Material Type = Service THEN MRP Type = Time-phased Planning
Technical Description: The same rule expressed in technical terms, displayed in the pattern IF … AND … THEN …
Example: IF MARA-MEINS = DAY AND MARA-MTART = DIEN THEN MARC-DISMM = R1
Focus Area: The data set to which this mined rule applies, inherited from the focus area you chose when creating the mining run.
Technical Focus Area: The data set to which this mined rule applies, in technical terms.
Data Evaluation: The result of evaluating the mined rule against the data in the selected focus area.
Complies with Rule / Complies with Rule (%): Number/percentage of records from the mining run’s focus area that comply with this rule.
Violates Rule / Violates Rule (%): Number/percentage of records from the mining run’s focus area that violate this rule.
Not Relevant / Not Relevant (%): Number/percentage of records from the mining run’s focus area that are not relevant to this rule, meaning they don’t meet the rule conditions.
Checked field: The name of the field that the mined rule checks. It is passed into the final data quality rule when the mined rule is linked.
Status: An indicator of the decision made regarding this proposed rule. Newly proposed rules have a status of Initial, and you can change them to Approved, Rejected, or In Review.
Linked data quality rule: The data quality rule linked to this mined rule, either newly created from the proposed rule or an existing rule that you linked manually.
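The three Data Evaluation figures described above can be illustrated with a small Python sketch. It applies the technical example rule (IF MARA-MEINS = DAY AND MARA-MTART = DIEN THEN MARC-DISMM = R1) to a handful of records. The function name evaluate_rule and the sample records are made up for illustration, and the sketch assumes that percentages are taken over all records in the focus area; it does not show how the tool computes them internally.

```python
def evaluate_rule(records, conditions, checked_field, expected_value):
    """Classify each record as complying, violating, or not relevant (illustrative only)."""
    counts = {"complies": 0, "violates": 0, "not_relevant": 0}
    for record in records:
        # A record that misses any IF condition is not relevant to the rule.
        if any(record.get(name) != value for name, value in conditions.items()):
            counts["not_relevant"] += 1
        elif record.get(checked_field) == expected_value:
            counts["complies"] += 1
        else:
            counts["violates"] += 1
    total = len(records)
    percentages = {
        key: (100.0 * number / total if total else 0.0)
        for key, number in counts.items()
    }
    return counts, percentages


# Invented sample data standing in for the records of a focus area.
records = [
    {"MEINS": "DAY", "MTART": "DIEN", "DISMM": "R1"},
    {"MEINS": "DAY", "MTART": "DIEN", "DISMM": "R1"},
    {"MEINS": "DAY", "MTART": "DIEN", "DISMM": "PD"},
    {"MEINS": "EA", "MTART": "FERT", "DISMM": "PD"},
]

counts, percentages = evaluate_rule(
    records,
    conditions={"MEINS": "DAY", "MTART": "DIEN"},
    checked_field="DISMM",
    expected_value="R1",
)
print(counts)
print(percentages)
```

In this sample, the single violating record is the data correction effort you would take on by accepting the rule, which is why it helps to look at these figures before you decide.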
When you find meaningful rules, accept the rules first, and then you can choose the Link dropdown button to link the mined rule to either an existing data quality rule or a new data quality rule. The linked rule is shown in the Linked Data Quality Rule column.
You can also put several mined rules together to create one data quality rule.
Step 4: Implement Accepted Rules in the Rule Repository (automated)
Once you have created new data quality rules, go to the data quality rule by clicking on the linked data quality rule. You see the selected proposed rules are listed there with status Not implemented. In the related image we have three proposed rules for one data quality rule.
Data quality rules are implemented in Business Rule Framework plus (BRF+). When you create a new data quality rule manually, you need to create all BRF+ objects, expressions, and rules in the BRF+ workbench. With the rule mining approach, however, the entire BRF+ implementation is generated automatically when you choose the Prepare button on the Usage screen.
After you choose Prepare, the status of all proposed rules changes to Implemented, and the Scope and Conditions decision table is created in BRF+. You can choose the link to open the generated BRF+ implementation.
Before you can use the rule in the data quality evaluation process or other master data processes, you must approve the data quality rule and enable the usage.
Rule Mining finds patterns in master data by looking at the available combinations of attributes and values. The system outputs combinations of attributes and values that fit certain criteria as proposed rules. Based on business know-how and data evaluation of these combinations, end users decide if these proposed rules qualify as real business rules. Afterwards, the accepted rules can be automatically implemented as data quality rules, which can be used in your master data process.
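As a rough intuition for what finding patterns in combinations of attributes and values can look like, here is a toy Python sketch. It is not the algorithm used by MDG: the function propose_rules, the support and confidence thresholds, and the sample values are all assumptions made for this illustration. It only looks at the full combination of condition fields, not at subsets of them.

```python
from collections import Counter, defaultdict


def propose_rules(records, condition_fields, checked_field,
                  min_support=2, min_confidence=0.8):
    """Propose IF/THEN rules from frequent value combinations (toy approach)."""
    # For every combination of condition values, count the checked field's values.
    by_condition = defaultdict(Counter)
    for record in records:
        key = tuple(record[name] for name in condition_fields)
        by_condition[key][record[checked_field]] += 1

    proposals = []
    for key, value_counts in by_condition.items():
        matching = sum(value_counts.values())
        value, hits = value_counts.most_common(1)[0]
        confidence = hits / matching
        # Keep only combinations that occur often enough and are consistent enough.
        if matching >= min_support and confidence >= min_confidence:
            conditions = " AND ".join(
                f"{name} = {val}" for name, val in zip(condition_fields, key)
            )
            proposals.append(
                (f"IF {conditions} THEN {checked_field} = {value}", confidence)
            )
    return proposals


# Invented sample records: four consistent, one deviating, one rare combination.
records = (
    [{"Material Group": "A01", "MRP Type": "PD", "MRP Group": "0010"}] * 4
    + [{"Material Group": "A01", "MRP Type": "PD", "MRP Group": "0020"}]
    + [{"Material Group": "B02", "MRP Type": "VB", "MRP Group": "0030"}]
)

proposals = propose_rules(records, ["Material Group", "MRP Type"], "MRP Group")
for text, confidence in proposals:
    print(text, confidence)
```

The deviating record does not block the proposal; it would show up as a violation in the data evaluation, and the business expert still decides whether the pattern is a real rule.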
Next for You