How To Get Started With Predictive Analytics
Predictive analytics has recently seen a spike of excitement among many different business departments, such as marketing or human resources, that seek to better understand their customers or want to look at how employees behave in their organization and improve the services offered to their clients. Unfortunately, only very few business departments have access to Data Scientists, so they often have little experience in developing predictive models. This presents a real challenge, since predictive analytics is fundamentally different from traditional reporting, and without Data Science support you might find it hard to get started and feel confident in the results of your analyses. Luckily, SAP InfiniteInsight addresses this challenge directly and can be easily used by analysts since it greatly reduces the complexity of data preparation and model estimation through a very high level of automation. This way you can focus on the business questions that matter and spend less time dealing with complicated IT solutions. This blog is geared towards analysts who want to understand how to get the most out of their data using SAP InfiniteInsight, so here’s how you would get started with your predictive modeling initiative:
Step 0: Understand the predictive analytics process
Before actually getting started, you should familiarize yourself with the general idea behind predictive analytics and how it differs from traditional business intelligence (the folks over at Data Science Central have a nice summary). In short, when using predictive analytics we want to forecast the probability of a future event based on patterns that we find in historical data for said event: For example, to predict turnover (your target) we will need historical data on turnover along with a bunch of attributes that we can use to find relationships and patterns between the attribute and target variables. Once we have derived the historical relationship and built a valid model, we will use this model on new data to forecast turnover. The forecasted results can then be used to make various business decisions. Now, the actual flow may involve a few side steps (e.g. transforming your data so that it can be used) but in essence this is the high-level process that will be described here.
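To make this flow a bit more tangible, here is a minimal sketch in plain Python. It is not how SAP InfiniteInsight works internally; it only illustrates the two phases described above: learn a pattern from historical data, then apply it to new data. The data, the column names (department, left) and the helper functions fit and predict are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical historical data: one row per employee, with the known outcome
# (left = 1 means the employee left the company).
history = [
    {"department": "sales", "left": 1},
    {"department": "sales", "left": 1},
    {"department": "sales", "left": 0},
    {"department": "finance", "left": 0},
    {"department": "finance", "left": 0},
    {"department": "finance", "left": 1},
    {"department": "finance", "left": 0},
]


def fit(rows, attribute, target):
    """Learn the historical rate of the target event per attribute value."""
    counts = defaultdict(lambda: [0, 0])  # attribute value -> [events, total]
    for row in rows:
        counts[row[attribute]][0] += row[target]
        counts[row[attribute]][1] += 1
    overall = sum(row[target] for row in rows) / len(rows)
    rates = {value: events / total for value, (events, total) in counts.items()}
    return {"attribute": attribute, "rates": rates, "default": overall}


def predict(model, row):
    """Estimate the probability of the target event for one new record.

    Attribute values never seen in the history fall back to the overall rate.
    """
    return model["rates"].get(row[model["attribute"]], model["default"])


# Phase 1: derive the relationship from historical data.
model = fit(history, "department", "left")

# Phase 2: apply the model to new data where the outcome is not yet known.
new_employees = [{"department": "sales"}, {"department": "legal"}]
scores = [predict(model, employee) for employee in new_employees]
```

A real model would of course use many attributes at once and would be validated before use, but the split into a "fit" step on history and a "predict" step on new data stays the same.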
Step 1: Define your business objective
Whether it’s wanting to predict which customer will buy your newly launched product or which employee might leave your company – you need to define what your business objective is and clarify how you want to measure it. This sounds trivial but can pose a real challenge: you need historical data for your target outcome that is accurate enough to derive a statistical model in a later step, not to speak of having your target variable available in the first place.
While it’s certainly possible to “just play around” and see what happens (sometimes referred to as exploratory analysis), you will gain better results if you focus your efforts on a single business question from the very beginning. You will also find it easier to gain end-user acceptance if you know what challenge your users are facing and how your analysis can help them solve it.
Step 2: Find & connect to the data
Depending on your business objective, you will now need to find the data to base your model on. You don’t need to have a sophisticated concept in mind but you’ll need a general idea of what kind of data you are looking for – with SAP InfiniteInsight there is one simple rule: The more variables you have, the better, since SAP InfiniteInsight will determine automatically which variables should be removed and which variables add value to the model. Getting the data from an operational system like SuccessFactors Employee Central or SAP CRM can be slightly more difficult than from a Business Warehouse (BW), but the granularity of data available in a BW may not be sufficient for modeling. With operational systems the data usually has the right granularity but is frequently distributed across many different tables, and companies often restrict direct table access to users from IT. Therefore you may face some challenges when trying to get the data from the tables directly. BW, on the other hand, often has a wealth of data, nicely packaged and preprocessed, but you may find that while the data has all the attributes you’re looking for, it is too aggregated to be used.
The rule of thumb for data granularity is: You need historical data at the same granularity as the concept you want to predict, i.e. if you want to forecast turnover at the employee level you need to have the historical data at the employee level as well. The good news is that you can always fall back on using a simple flat file with your data in SAP InfiniteInsight, so if push comes to shove you can simply ask your IT department to download some data as CSV in the needed format.
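If you receive such a CSV extract, a quick sanity check of its granularity can save you some time. The following sketch in plain Python assumes hypothetical column names (employee_id, left) and a made-up helper has_entity_granularity; it simply checks that the file has one row per employee and contains the target column, which an aggregated extract would not.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV extracts; in practice you would open the file from disk.
employee_level_csv = """employee_id,department,tenure_years,left
1001,sales,2,1
1002,sales,5,0
1003,finance,7,0
"""

aggregated_csv = """department,headcount,leavers
sales,2,1
finance,1,0
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def has_entity_granularity(rows, key, target):
    """Return True if the key and target columns exist and each key appears once."""
    if not rows or key not in rows[0] or target not in rows[0]:
        return False
    counts = Counter(row[key] for row in rows)
    return all(n == 1 for n in counts.values())


employee_ok = has_entity_granularity(load_rows(employee_level_csv), "employee_id", "left")
aggregated_ok = has_entity_granularity(load_rows(aggregated_csv), "employee_id", "left")
```

Here the employee-level extract passes the check, while the department-level extract fails because it has neither an employee identifier nor an employee-level target.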
Step 3: Derive & interpret the model
Once you have the data, you want to find the model with the best tradeoff between describing your training data and predicting new, unknown data: SAP InfiniteInsight can automatically test hundreds of different models at the same time and choose the one that works best for your data and purpose. Hidden in the background, SAP InfiniteInsight also automatically performs many of the tasks that Data Scientists usually carry out with traditional tools to improve the quality of your data and the model performance, such as missing value handling, binning, data re-encoding, model cross-validation, etc. This way you can simply point SAP InfiniteInsight to your bucket of data, define which variable to predict and ask the tool to work its magic. All you need to do then is interpret the results (see this blog post to see how you can interpret a model based on an example from HR).
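To get a feeling for what "testing models and picking the one that predicts unknown data best" means, here is a small generic sketch of binning and cross-validation in plain Python. It does not reproduce SAP InfiniteInsight's actual algorithms; the data, the two candidate models and all function names are assumptions chosen for illustration.

```python
# Hypothetical employee-level history: tenure in years and whether the person left.
rows = [
    {"tenure_years": t, "left": y}
    for t, y in [(1, 1), (6, 0), (1, 1), (7, 0), (0, 1), (5, 0), (8, 0), (1, 1), (4, 0)]
]


def tenure_bin(row):
    """Binning: turn the continuous tenure into two coarse categories."""
    return "short" if row["tenure_years"] < 2 else "long"


def fit_majority(train):
    """Candidate 1: always predict the majority outcome (ties predict 0)."""
    majority = 1 if 2 * sum(r["left"] for r in train) > len(train) else 0
    return lambda row: majority


def fit_tenure_bin(train):
    """Candidate 2: predict the majority outcome within each tenure bin."""
    fallback = fit_majority(train)
    by_bin = {}
    for name in ("short", "long"):
        subset = [r for r in train if tenure_bin(r) == name]
        if subset:
            by_bin[name] = fit_majority(subset)
    return lambda row: by_bin.get(tenure_bin(row), fallback)(row)


def k_fold_splits(n_rows, k):
    """Yield (train_indices, validation_indices); each row is validated once."""
    indices = list(range(n_rows))
    for fold in range(k):
        validation = indices[fold::k]
        held_out = set(validation)
        yield [i for i in indices if i not in held_out], validation


def cross_validate(fit, data, k=3):
    """Average accuracy on held-out folds, i.e. on data the model has not seen."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(data), k):
        model = fit([data[i] for i in train_idx])
        validation = [data[i] for i in val_idx]
        scores.append(sum(model(r) == r["left"] for r in validation) / len(validation))
    return sum(scores) / len(scores)


candidates = {"majority": fit_majority, "tenure_bin": fit_tenure_bin}
cv_scores = {name: cross_validate(fit, rows) for name, fit in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)
```

The important point is that each candidate is judged on rows it was not trained on, so a model that merely memorizes the training data does not win by default.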
Step 4: Apply the model
Great – now you have a working model! Next you want to predict new stuff with your model – usually this “stuff” sits somewhere in a database. SAP InfiniteInsight can either directly apply the model to new data (e.g. data that sits somewhere in a table or a flat file) or it can export the model to a database to allow real-time scoring. The first option is more for ad-hoc scoring or further model validation purposes while the second option can be used to continuously score new data as it comes into the database – this way one could include the scored results in some other application or make the information available to other users. However, in the case of in-database scoring you will probably need some involvement from your IT department.
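The idea behind the second option can be sketched with Python's built-in sqlite3 module. This is only an illustration of in-database scoring in general, not the format SAP InfiniteInsight exports: the table names, the view, the probabilities and the fallback value of 0.3 are all hypothetical. The "model" is stored as a lookup table, and a view scores whatever rows are currently in the employees table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT);
    CREATE TABLE model_rates (department TEXT PRIMARY KEY, turnover_probability REAL);

    -- The view applies the exported model to every row currently in the table.
    -- 0.3 is a hypothetical fallback for departments the model has never seen.
    CREATE VIEW employee_scores AS
        SELECT e.employee_id,
               COALESCE(m.turnover_probability, 0.3) AS turnover_probability
        FROM employees AS e
        LEFT JOIN model_rates AS m ON m.department = e.department;
""")

# "Export" a previously derived model into the database (made-up numbers).
model_rates = {"sales": 0.6, "finance": 0.2}
conn.executemany("INSERT INTO model_rates VALUES (?, ?)", model_rates.items())

conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "sales"), (2, "finance")])


def current_scores():
    """Read the scores for all employees currently stored in the database."""
    query = ("SELECT employee_id, turnover_probability "
             "FROM employee_scores ORDER BY employee_id")
    return dict(conn.execute(query))


scores_before = current_scores()

# A new record arrives; it is scored by the view without re-running anything.
conn.execute("INSERT INTO employees VALUES (3, 'legal')")
scores_after = current_scores()
```

Because the scoring logic lives in the database, any other application that can query the view sees up-to-date scores, which is why this option usually needs some help from IT to set up.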
Step 5: Execute on your insights
One of the most important questions of any statistical analysis is: What do you do with the results? How can you reap the benefits of “knowing the future”? Having an idea about what is likely to happen is not enough – now your organization needs to adapt its behavior to either avoid the unpleasant outcomes or gain the positive ones as predicted by the analysis. How this can be done depends heavily on your organization and the analysis context – possible next steps include
- making the results/model available to a larger audience (e.g. HR Business Partners, marketing managers, etc.) by exporting it to a database to enable real-time application of the model,
- including the scoring algorithm in a business application (e.g. an SAP system like SAP CRM),
- developing a one-time action plan based on the results, or
- designing a larger process to use the analysis results in each cycle of the business process to which it belongs.
Remember to include those employees who are crucial for a successful execution (usually your business end-users) early in the process and make sure they understand the results and how to leverage the insights. To be accepted, your analysis must be concise, clear, and trustworthy. Try to understand where your stakeholders (e.g. managers, business users, etc.) are coming from and how to communicate the results of the analysis effectively in their business language. A great analysis with great predictive power is only half the battle – whether your business will be able to profit from this will depend on your organization’s ability to close the loop to its operations.
At this point you may feel slightly overwhelmed at the sight of the different aspects that play a role when setting up a predictive analytics initiative. It is true – these things can get really complex, but when using SAP InfiniteInsight they become much simpler compared to traditional tools due to the high level of automation. However, to get started quickly and get a feeling for the technology you don’t need to boil the ocean – you can easily take data that is already available to you and see what kind of relationships you can uncover (a trial for SAP InfiniteInsight is available here). You can use this blog post to see an example of how SAP InfiniteInsight can be used with HR data, but the example and the steps described translate well to other business areas too. Please feel free to leave any questions or comments!