Recently at SAP TechEd in Las Vegas, I presented a session on the “Top 5 Things to Get Started with Predictive Analytics.” As you know, predictive analytics spans the continuum from asking “What happened?” to asking “What will happen?” and “What is the best that could happen?” The quest to find out what will happen, and what the best outcome could be, rests on five basic principles that every data scientist needs to start with.
- Define the business problem (Use Cases)
- Harvest the right data for the use case
- Assess the skills in your team
- Pick the right tool
- Accelerate productivity and fail fast
1)Define the Use Case
Everything has to start with the use case. Broadly, predictive analytics use cases fall into two categories: 1) you know which problem or opportunity you want to address, and 2) you do not know which problem or opportunity you want to address.
In the first category (supervised learning), you have to define the target variable, which in turn dictates what related data you need to prepare. More importantly, the target variable holds the information you are trying to predict.
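As a rough illustration of what defining a target variable looks like in practice, the sketch below takes a few made-up customer records for a churn use case and separates the target from the candidate input columns. The column names (`churned`, `tenure_months`, `support_calls`) and the records themselves are hypothetical, chosen only for illustration.

```python
# Hypothetical customer records for a churn use case (made-up data).
records = [
    {"customer_id": 1, "tenure_months": 3, "support_calls": 5, "churned": 1},
    {"customer_id": 2, "tenure_months": 40, "support_calls": 0, "churned": 0},
    {"customer_id": 3, "tenure_months": 7, "support_calls": 4, "churned": 1},
    {"customer_id": 4, "tenure_months": 25, "support_calls": 1, "churned": 0},
]

TARGET = "churned"            # the outcome we want to predict
ID_COLUMNS = {"customer_id"}  # assumed here to be a plain identifier, not an input


def split_features_and_target(rows, target, id_columns):
    """Return (features, targets): one feature dict and one target value per row."""
    features = [
        {k: v for k, v in row.items() if k != target and k not in id_columns}
        for row in rows
    ]
    targets = [row[target] for row in rows]
    return features, targets


X, y = split_features_and_target(records, TARGET, ID_COLUMNS)
churn_rate = sum(y) / len(y)  # base rate of the target, a first sanity check
print(X[0], y[0], churn_rate)
```

Once the target is fixed, every other column is judged by one question: could it plausibly help predict that target, and would it be available at prediction time?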
In the second category, when you do not know what you are looking for (unsupervised learning), the most common approach is segmentation (clustering): for example, letting the data separate humans from monkeys without labeling either group in advance.
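To make the segmentation idea concrete, here is a minimal sketch of k-means, one common clustering technique, written in plain Python. The measurements are made up (think of them as height in cm and weight in kg for two unlabeled groups), and using the first two points as starting centroids is an arbitrary choice for this sketch, not a recommended practice.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Made-up (height_cm, weight_kg) measurements; no labels are supplied.
points = [(170, 70), (180, 85), (165, 60), (60, 8), (55, 7), (70, 10)]
assignments, centroids = kmeans(points, centroids=points[:2])
print(assignments)
```

On this toy data the first three points end up in one cluster and the last three in the other, even though the algorithm was never told which point belongs to which group. Real data usually needs feature scaling and a more careful choice of the number of clusters.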
Either way, as a first step you have to define the use case in the business context, like Churn Reduction, Customer Segmentation, Next Best Offer for call center interactions, and so on.
2)Harvest the Right Data for the Use Case
Identifying, locating, and collecting relevant data for the use case is the second step. In the process of identifying and locating data sources, you will learn many things, such as the organizational barriers around the data, which department owns it, the conditions for accessing it, and more. These lessons often carry over and speed up the process for subsequent use cases.
3)Assess Skills in Your Team
Many research reports suggest that there is a huge shortage of analytical skills. Data science is a team sport: you need many personas, including data scientists, business analysts, data analysts, application developers, and storytellers. Assessing the skills available in your team is another critical step. Understanding the strengths and the gaps in skills tells you what type of tools the team needs to do the job.
4)Pick the Right Tool
When it comes to picking the right tools for predictive analytics, you need to look into:
- The tool's ability to bridge the gaps in your team's skills
- Collaboration capabilities to smooth the process
- Automation capabilities to increase the productivity
- Big data support such as native modeling and wide dataset support
- More importantly, a governance framework that covers your team's processes, predictive models, retraining, and scoring
5)Accelerate Productivity and Fail Fast
Data science has two phases: lab and production. In the lab, you explore, innovate, test your hypotheses, prototype, and gain confidence in your predictive models. In production, you deploy, manage, govern, and embed models into applications and business processes.
You cannot skip the first phase, and the best way to reach the second is to move faster through the cycle of data preparation, exploration, model building, validation, and testing and, more importantly, to fail fast, so that you can iterate quickly and gain confidence in your predictive model.
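One way to picture a fast lab iteration is a quick holdout check against a trivial baseline: if a candidate model cannot beat "always predict the most common outcome", drop it and move on. The sketch below is only an illustration under stated assumptions. The data is made up, and the helper names (`holdout_split`, `majority_baseline`, `threshold_model`, `quick_check`) are hypothetical, not part of any particular tool.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, rows):
    """Fraction of (x, y) rows where predict(x) equals y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def majority_baseline(train):
    """Baseline: always predict the most frequent label seen in training."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def threshold_model(train):
    """One-feature rule: predict 1 when x is above the midpoint of the class means.

    Assumes class 1 tends to have larger x values than class 0.
    """
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > cut else 0


def quick_check(build_model, train, test, baseline_score):
    """Return (score, keep); keep is False when the candidate fails to beat the baseline."""
    score = accuracy(build_model(train), test)
    return score, score > baseline_score


# Made-up data: (feature, label) pairs, 10 rows per class.
rows = [(x, 0) for x in range(1, 11)] + [(x, 1) for x in range(21, 31)]
train, test = holdout_split(rows)
baseline_score = accuracy(majority_baseline(train), test)
score, keep = quick_check(threshold_model, train, test, baseline_score)
print(baseline_score, score, keep)
```

The point of the design is that each iteration costs seconds, so a weak idea is rejected early and the time goes to the candidates that survive the baseline.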
To summarize: Use Case, Data, Team Skills, Tool for the Job, Fail Fast.
For more information, read the rest of the blogs in our Predictive Thursdays series.