Level 1 – Easy ; 20 minute read
Audience: Project managers, business analysts, subject matter experts
Author: Mark Muir | SAP BTS, S/4HANA RIG Americas
Before discussing the concept of modeling, it's important to understand where modeling fits into the cycle of a data mining (DM) project and how we measure success when taking one on. Whether you are creating a custom report in SAP ERP or designing a complex model to solve unique challenges in your business, success and acceptance of the end result mean different things to different people and organizations.
- The Business seeks a successful or useful outcome to the project from the business point of view.
- e.g. the model meets the business goals and objectives, establishes trust in the data etc…
- Data Science desires a successful outcome to the project in technical terms.
- e.g. comprehension of the business question, choosing the right algorithm, model selection, accuracy and robustness, visualization, ease of maintenance etc…
We shall cover managing your project in our next blog. At a high level, multiple CRISP-DM phases coexist in a framework that helps define the why, what and how:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment, Monitoring and Continuous Improvement
CRISP-DM is not defined phase by phase in the same detail as SAP ACTIVATE's cloud and on-premise methodologies. Even so, it is still recommended to adopt project management when exploring data mining projects, simply because initiating, planning, executing, monitoring and controlling, and closing are still relevant. Agile methodologies and SCRUM are options, but other frameworks and tools exist.
As we read above, three sequential phases (business understanding, data understanding, and data preparation) are required as input to our modeling phase.
- Modeling Basics
- Recommendations for good practice in modeling
- Improving Your Model
- SAP Ready-to-use and Re-trainable Services
- Data Analysis Tools
Selecting a Modeling Technique
Predictive modeling uses statistics to predict outcomes. It requires simulating the original process with historical data and agreed requirements, so that the outcome is acceptable to the business.
In a real-world scenario, one or more statistical techniques can be tried to identify the algorithm that best predicts new data points for the problem at hand.
Predictive modeling draws on a range of algorithms, or ‘techniques’. The core skills and techniques are commonly practiced in open source programming languages:
- Python (general approach to data science)
- R (statistical analysis)
Generate Test Design
Business problems are unique to the customer, and the same applies to determining how you test a model’s quality and validity, i.e. garbage in, garbage out.
Selecting the right technique(s) influences the output of the test design, which describes the plan for training, testing, and evaluating the models.
The primary components of the plan are decisions on how to:
- Select the right training dataset for analysis, i.e. quality (target variable and values) and volume of data.
- Account for the type of data, which can vary by source (e.g. S/4HANA) and by form, structured or unstructured (e.g. images, text, video etc…).
- Determine whether the data needs to be converted before applying predictive methods.
- Divide the available training dataset into 1. estimation data, 2. validation data, and 3. test data.
- Keep sight of the business requirement.
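To make the three-way division concrete, here is a minimal Python sketch of splitting a prepared dataset into estimation, validation, and test sets. The function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions for illustration only; the right proportions depend on your data volume and business requirement.

```python
import random


def split_dataset(rows, estimation=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into estimation, validation, and test sets.

    The default proportions and seed are illustrative assumptions.
    Whatever is left after estimation and validation becomes the test set.
    """
    if estimation <= 0 or validation <= 0 or estimation + validation >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    n_est = round(len(shuffled) * estimation)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_est],
        shuffled[n_est:n_est + n_val],
        shuffled[n_est + n_val:],
    )


records = list(range(100))  # stand-in for 100 prepared records
est, val, test = split_dataset(records)
print(len(est), len(val), len(test))
```

Shuffling before splitting avoids accidental ordering effects (for example, records sorted by posting date), and the seed means the same split can be reproduced when the test design is reviewed.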
Build Model
- Determine which analysis tool you will use to transform data into useful results. Example: Build a Predictive Model using R
- Run the analysis tool (e.g. R) on the prepared dataset. This will create one or more models.
- The output will include information about the parameter settings, the models themselves, and the model description.
- Document parameter values and adjustments along with rationale for the choice of parameter settings.
- Ready to Use Services: Leverage pre-trained ML services via simple Web APIs allowing immediate usage
- Bring Your Own Model: Deploy, publish and run your own ML Model as a service
- Customize Model: Re-train and tailor image classification services based on your own data
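The advice above to document parameter values and the rationale behind them can be as lightweight as one structured record per run. The sketch below is one possible shape in Python, not a prescribed format; the class `ModelRun`, the model name, and the parameter names are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ModelRun:
    """One build-model run: what was run, with which settings, and why."""
    model_name: str
    tool: str
    parameters: dict
    rationale: str


def to_log_entry(run):
    """Serialize a run as JSON so it can be kept with the project documentation."""
    return json.dumps(asdict(run), sort_keys=True)


run = ModelRun(
    model_name="churn_tree_v1",                        # hypothetical model name
    tool="R",
    parameters={"max_depth": 4, "min_leaf_size": 50},  # hypothetical settings
    rationale="Shallow tree so decision-makers can read the rules during the pilot.",
)
entry = to_log_entry(run)
print(entry)
```

Keeping one entry per run makes it easier to explain later why a setting changed between iterations.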
Assess Model
Interpret the model(s) according to your area of expertise, the defined success criteria and the desired test design. Various skills may be required to perform a successful evaluation.
- For example, data scientist, data analyst, line of business, subject matter experts by application e.g. Finance and Costing etc…
- The results of the model assessment may require revising the model parameter settings and further ‘tuning’ them for the next run of the build model task. The process is iterative.
Next, rank the models according to the evaluation success criteria, remembering to consider the business objectives and business success criteria.
A model assessment:
- Summarizes the results of this task.
- Lists the qualities of all of the generated models (for example, in terms of accuracy).
- Ranks their quality in relation to each other.
- Reflects consensus with the business that the best model has been found.
- Feeds the documentation of results, including all revisions and assessments.
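As a rough illustration of the ranking step, the Python sketch below sorts candidate models by a single metric and drops any that fall below a minimum agreed with the business. The function `rank_models`, the model names, the accuracy figures, and the 0.75 threshold are all made up for the example; a real assessment would normally weigh several criteria.

```python
def rank_models(results, metric="accuracy", minimum=0.0):
    """Return (name, score) pairs sorted best-first, dropping models below the minimum."""
    eligible = [
        (name, scores[metric])
        for name, scores in results.items()
        if scores[metric] >= minimum
    ]
    return sorted(eligible, key=lambda pair: pair[1], reverse=True)


# Made-up evaluation results, for illustration only.
results = {
    "logistic_regression": {"accuracy": 0.81},
    "decision_tree": {"accuracy": 0.77},
    "gradient_boosting": {"accuracy": 0.84},
    "baseline": {"accuracy": 0.62},
}

# 0.75 stands in for a business success criterion agreed up front.
ranking = rank_models(results, minimum=0.75)
for position, (name, score) in enumerate(ranking, start=1):
    print(position, name, score)
```

The threshold is what ties the technical ranking back to the business success criteria: a model can rank first technically and still not be good enough to deploy.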
The output from a model is the set of probabilities/scores created when you apply the model.
Quality Indicator: Predictive Power
The predictive power of a model is the quality indicator of models generated using the application. For example, a model with a predictive power of:
- “0.79” is capable of explaining 79% of the information contained in the target variable using the explanatory variables contained in the dataset analyzed.
- “1” is a hypothetical perfect model, capable of explaining 100% of the target variable.
- “0” is a purely random model.
For more information about SAP Predictive Analytics please read Automated Analytics User Guide and Scenarios (June 2018).
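To build intuition for an indicator that runs from 0 (random) to 1 (perfect), here is a small Python sketch. It does not reproduce SAP's exact calculation, which is described in the user guide; as a stand-in it uses a common ranking measure, 2 × AUC − 1, which shares the same two endpoints for a binary target. The function names and the example data are assumptions.

```python
def auc(labels, scores):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def normalized_gain(labels, scores):
    """Rescale AUC so an uninformative ranking gives 0 and a perfect ranking gives 1."""
    return 2.0 * auc(labels, scores) - 1.0


labels = [1, 1, 0, 1, 0, 0]                   # made-up binary target
perfect = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]      # every positive outranks every negative
uninformative = [0.5] * 6                     # same score for everyone
partial = [0.9, 0.4, 0.6, 0.7, 0.1, 0.3]      # one positive is ranked too low

print(normalized_gain(labels, perfect))
print(normalized_gain(labels, uninformative))
print(normalized_gain(labels, partial))
```

The partial model lands between the two extremes, which is how most real models behave: the closer to 1, the more of the target the explanatory variables account for.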
Example: Training Your Model
Recommendations for good practice in modeling
- Understand the purpose of the model; this will impact model type, complexity, data and output.
- Justify the model: what are you trying to achieve, e.g. an alternative source of information, demonstration of a weakness etc…
- Keep the model as simple as possible to aid understanding by decision-makers, at least in pilot and adoption phase of using models.
- The presentation of results should be visually pleasing and transparent to all reviewers e.g. business, analysts involved.
- The quality of all the data used in the model should be stated clearly and in detail, leaving no confusion or doubt.
- The model should be validated against the results of other models and/or the results of intervention studies
Improving Your Model
Obtaining a better model is achieved by:
- Improving the prediction confidence of the model, or
- Improving the predictive power of the model, or
- Improving both the predictive power and the prediction confidence of the model.
SAP Ready-to-use and Re-trainable Services: Roadmap
- Functional Services provide readily consumable pre-trained models that can be used as a web service by calling simple REST APIs.
- Explore the functional services such as image classification, product image classification, topic detection, time series change point detection.
Thank you for your interest
Further reading in this Machine Learning blog series