Level 1 – Easy ; 30 minute read
Audience: Project managers, business analysts, subject matter experts
Author: Mark Muir, SAP BTS, S/4HANA RIG Americas
Welcome back! In my last blog we covered an introduction to modeling and introduced CRISP-DM (Cross-Industry Standard Process for Data Mining), a widely used cross-industry standard for data mining. Before we get to a project methodology, let us first evaluate the prerequisites of the business understanding phase of CRISP-DM.
Starting a machine learning project takes much more than an epiphany. The following pointers are worth considering.
- Business transformation requires organizational change management, which creates structure and conviction among management, business and IT so that your project can succeed.
- Organizational change management includes training and addresses the people who have to change their ways of working because of a transformation project: their expectations, needs, motivation, concerns and resistance.
- It enables business and IT to engage management and employees through appropriate interactions, interventions and coaching, and through both broad and personal communication.
- It offers business and IT appropriate mechanisms, tools and techniques to transition from the current to the future state in an effective, efficient and sustainable way.
- Design thinking helps with human problem solving; there is an industry built around solution-focused designs for customers. Putting time and effort into a trained model without considering how a business user will interact with and consume its output is futile, so think about user experience.
- Here at SAP, the SAP Leonardo Centers are an organization supporting innovation; reach out to learn more.
- Identifying possible use cases takes time. Prioritize potential opportunities based on the following dimensions:
- Technical feasibility (e.g. data availability, data access for SAP)
- Business potential (e.g. saved effort or generated revenue)
- Incorporate project management into your project. Standard practices to initiate, plan, execute, monitor, control and close your project will be required. Agile and Scrum are options, but other frameworks and tools exist.
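The use-case prioritization mentioned above, along the two dimensions of technical feasibility and business potential, can be sketched as a simple scoring exercise. This is only a minimal illustration: the 1-to-5 rating scale, the equal weighting, the `UseCase` class and the example use cases are all assumptions made for this sketch, not part of CRISP-DM or any SAP tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    name: str
    technical_feasibility: int  # assumed scale: 1 (hard) to 5 (easy)
    business_potential: int     # assumed scale: 1 (low) to 5 (high)


def priority_score(use_case: UseCase, feasibility_weight: float = 0.5) -> float:
    """Weighted average of the two dimensions; the weight is an assumption."""
    potential_weight = 1.0 - feasibility_weight
    return (feasibility_weight * use_case.technical_feasibility
            + potential_weight * use_case.business_potential)


def prioritize(use_cases: List[UseCase], feasibility_weight: float = 0.5) -> List[UseCase]:
    """Return the use cases sorted from highest to lowest priority score."""
    return sorted(
        use_cases,
        key=lambda uc: priority_score(uc, feasibility_weight),
        reverse=True,
    )


# Illustrative candidates only; the names and ratings are made up.
candidates = [
    UseCase("Invoice matching", technical_feasibility=4, business_potential=5),
    UseCase("Demand forecasting", technical_feasibility=2, business_potential=5),
    UseCase("Ticket routing", technical_feasibility=5, business_potential=2),
]

ranked = prioritize(candidates)
for uc in ranked:
    print(f"{uc.name}: {priority_score(uc):.1f}")
```

In a workshop, the ratings would come from the business and IT stakeholders, and the weight can be shifted if one dimension matters more for your organization.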
Back to CRISP-DM….
Benefits of CRISP-DM
- Non-proprietary, data-centric project methodology
- Application-neutral, industry-neutral and tool-neutral, and focused on business issues as well as technical issues
- Provides a framework for recording the process
- Allows for iterative processing to come closer to a desired result
- Supports project planning and management
- Ease of adoption across all skillsets
The six phases of CRISP-DM
- Business Understanding. Objective: Focus on understanding the project objectives and requirements from a business perspective, then convert this knowledge into a data science problem definition and a preliminary plan designed to achieve the objectives.
- Data Understanding. Objective: Start with an initial data collection, then proceed with activities to get familiar with the data, identify data quality problems, discover first insights into the data, or detect interesting subsets to form hypotheses about hidden information.
- Data Preparation. Objective: Support all activities to construct the final dataset from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Tasks include table, record, and attribute selection, as well as transformation and cleaning of the data for the chosen algorithms.
- Modeling. Objective: Select and apply various modeling techniques and calibrate their parameters to optimal values. Some techniques have specific requirements for the form of the data.
- Evaluation. Objective: Evaluate the model and review the model construction to be certain it properly achieves the business objectives. A key objective is to determine whether some important business issue has not been sufficiently considered. Note: At the end of this phase, a decision on the use of these data science results should be reached.
- Deployment. Objective: Organize the results and deliver them to the business or organization. Depending on the business understanding and requirements, the deployment phase can be as simple as a report or as complex as implementing a repeatable data mining process across the enterprise.
CRISP-DM: PHASE – Business and Data Understanding
| Step | Task | Description |
|---|---|---|
| 1.1 | Determine Business Objectives | • The first objective of the data analyst is to thoroughly understand, from a business perspective, what the client really wants to accomplish. |
| 1.2 | Assess Situation | • In the previous task, your objective is to quickly get to the crux of the situation. Here, you want to flesh out the details. |
| 1.3 | Determine Data Science Goals | • A business goal states objectives in business terminology. • A data science goal states project objectives in technical terms. |
| 1.4 | Produce Project Plan | • Describe the intended plan for achieving the data mining goals and thereby achieving the business goals. |
| 2.1 | Collect Initial Data | • Acquire the data (or access to the data) listed in the project resources. • This initial collection includes data loading into the data exploration tool and data integration if multiple data sources are acquired. |
| 2.2 | Describe Data | • Examine the “gross” or “surface” properties of the acquired data and report on the results. |
| 2.3 | Explore Data | • This task tackles the data mining questions, which can be addressed using querying, visualization, and reporting. |
| 2.4 | Verify Data Quality | Examine the quality of the data, addressing questions such as: • Is the data complete? • Is it correct or does it contain errors? • Are there missing values in the data? |
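To make tasks 2.2 (Describe Data) and 2.4 (Verify Data Quality) more tangible, here is a minimal sketch in plain Python that counts missing values per column and flags values outside an expected range. The function names, the sample `orders` records and the valid range for `amount` are all hypothetical; a real project would typically use a data exploration tool or a data frame library instead.

```python
def describe_columns(records):
    """Report row count, missing-value count and distinct values per column."""
    columns = sorted({key for row in records for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat None and empty strings as missing (an assumption of this sketch).
        present = [v for v in values if v is not None and v != ""]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


def out_of_range(records, column, low, high):
    """Return rows whose value in `column` lies outside [low, high]."""
    return [
        row for row in records
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


# Made-up sample data for illustration.
orders = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "", "amount": 75.5},
    {"order_id": 3, "region": "APJ", "amount": None},
    {"order_id": 4, "region": "EMEA", "amount": -40.0},
]

report = describe_columns(orders)
suspicious = out_of_range(orders, "amount", 0.0, 10_000.0)

for column, stats in report.items():
    print(column, stats)
print("Rows with suspicious amounts:", [row["order_id"] for row in suspicious])
```

The missing-value counts answer the completeness question, while the range check is one simple way to look for incorrect values; which checks matter depends on the business context.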
CRISP-DM: PHASE – Data Preparation and Modeling
Business Understanding and Data Preparation
| Step | Task | Description |
|---|---|---|
| 3.1 | Select Data | • Decide on the data to be used for analysis. • Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types. • Note that data selection covers selection of attributes (columns) as well as selection of records (rows) in a table. |
| 3.2 | Clean Data | • Raise the data quality to the level required by the selected analysis techniques. • This may involve selection of clean subsets of the data, the insertion of suitable defaults, or more ambitious techniques such as the estimation of missing data by modeling. |
| 3.3 | Construct Data | • This task includes constructive data preparation operations such as the production of derived attributes, entire new records, or transformed values for existing attributes. |
| 3.4 | Integrate Data | • These are methods whereby information is combined from multiple tables or records to create new records or values. |
| 3.5 | Format Data | • Formatting transformations refer to primarily syntactic modifications made to the data that do not change its meaning, but might be required by the modeling tool. |
| 4.1 | Select Modeling Technique | • Select the actual modeling technique that is to be used. |
| 4.2 | Generate Test Design | • Before we actually build a model, we need to generate a procedure or mechanism to test the model’s quality and validity. |
| 4.3 | Build Model | • Run the modeling tool on the prepared dataset to create one or more models. |
| 4.4 | Assess Model | • Interpret the models according to domain knowledge, the data mining success criteria, and the desired test design. |
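One common way to approach task 4.2 (Generate Test Design) is to hold back part of the data before building the model and to measure quality only on that held-back part. The sketch below illustrates the idea with a deliberately trivial "model" that always predicts the most frequent label, which gives a baseline any real model should beat. The function names, the 25% test fraction, the fixed seed and the made-up churn data are assumptions for illustration only.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_baseline(train_rows, label_key):
    """Build the simplest possible model: always predict the most common label."""
    labels = Counter(row[label_key] for row in train_rows)
    most_common_label, _ = labels.most_common(1)[0]
    return lambda row: most_common_label


def accuracy(model, rows, label_key):
    """Share of rows for which the model's prediction matches the label."""
    correct = sum(1 for row in rows if model(row) == row[label_key])
    return correct / len(rows)


# Made-up data: every fourth customer churned.
customers = [{"customer_id": i, "churned": i % 4 == 0} for i in range(20)]

train_rows, test_rows = train_test_split(customers)
baseline = fit_majority_baseline(train_rows, "churned")
baseline_accuracy = accuracy(baseline, test_rows, "churned")

print(f"Train rows: {len(train_rows)}, test rows: {len(test_rows)}")
print(f"Baseline accuracy on held-out data: {baseline_accuracy:.2f}")
```

Agreeing on the test design and a baseline up front gives the team an objective yardstick when the models are assessed in task 4.4.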
CRISP-DM: PHASE – Evaluation and Deployment
| Step | Task | Description |
|---|---|---|
| 5.1 | Evaluate Results | • Assess the degree to which the model meets the business objectives. • Test the model(s) on test applications if time and budget constraints permit. |
| 5.2 | Review Process | • Conduct a more thorough review of the data mining engagement to determine if there is any important factor or task that has somehow been overlooked. • Identify any quality assurance issues. |
| 5.3 | Determine Next Steps | • Assess how to proceed with the project. |
| 6.1 | Plan Deployment | • Use the evaluation results to determine a strategy for deployment. |
| 6.2 | Plan Monitoring & Maintenance | • Monitoring and maintenance of models and analysis are critical issues. • A careful preparation of a maintenance strategy is essential. |
| 6.3 | Produce Final Report | • At the end of the project, the project leader and team write up a final report. |
| 6.4 | Review Project | • Assess what went right and what went wrong, what was done well and what needs to be improved. |
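As one possible illustration of task 6.2 (Plan Monitoring & Maintenance), a maintenance strategy might include a simple rule that flags a deployed model for review when its recent accuracy drops noticeably below the level measured at evaluation time. The function name, the three-period window, the 0.05 tolerance and the weekly figures below are hypothetical choices for this sketch; a real strategy would be agreed with the business.

```python
def needs_review(baseline_accuracy, recent_accuracies, tolerance=0.05, window=3):
    """Flag the model when the average accuracy over the last `window`
    periods falls more than `tolerance` below the baseline."""
    if len(recent_accuracies) < window:
        return False  # not enough observations yet to judge
    recent = recent_accuracies[-window:]
    average = sum(recent) / window
    return average < baseline_accuracy - tolerance


# Made-up weekly accuracy figures for a deployed model.
accuracy_at_evaluation = 0.90
weekly_accuracy = [0.91, 0.89, 0.86, 0.83, 0.82]

for week in range(1, len(weekly_accuracy) + 1):
    flagged = needs_review(accuracy_at_evaluation, weekly_accuracy[:week])
    print(f"Week {week}: review needed = {flagged}")
```

Averaging over several periods is a design choice that avoids reacting to a single bad week; the trade-off is that a genuine drop is detected slightly later.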
Thank you for your interest
Further reading in this Machine Learning blog series