In the previous post we talked about the concept of Exploratory Analytics.
As a quick reminder, exploratory analytics is not a product. It is an approach to data analysis, and a set of functionalities, in which you let mathematical algorithms and computer automation work on your dataset to automatically surface interesting results (correlations in the data, outliers, groups of similar items, etc.).
You, as a business savvy person, can look at those results, see what their business value is and take strategic decisions based on them.
Exploratory analytics are complementary to both classic analytics and advanced analytics as shown in the picture below:
In the classic analysis approach, you decide, step by step, what to do with your data. You create tables and filters, and you slice and dice, with the goal of surfacing some knowledge which you expect to see in the dataset. You usually work manually, with a specific goal in mind and a trial and error approach. With this approach you can easily answer questions such as “How many customers are buying my product?”
In advanced analysis you let mathematical algorithms work on the data to build a predictive model. You then use this model to take operational decisions. With advanced analytics you can, for example, build a model which answers (in real time if you want) a question such as “Is this prospect likely to buy my product?”.
Finally, in an exploratory analysis, you make use of the same algorithms as advanced analytics to obtain insights which help answer questions such as “Why are customers buying my product?”. Knowing the ‘why’ behind a decision can help you change your business to improve it.
What will you learn from this blog?
In this blog we show how SAP Predictive Analytics, with its Automated Analytics module, can provide you with the instruments you need to do exploratory analytics.
Practically speaking, we show that after performing a classification, you are automatically presented with various insights which can be used to drive your decisions.
Suppose that you are analyzing a dataset of customer characteristics and your target is a flag saying whether or not a customer has purchased a product. After running the classification you automatically get various insights:
“Key Influencers” are the variables which best explain the target (e.g. which customer characteristics are most related to the decision to purchase a product or not). You can get insight on specific values of key influencers but you also automatically get “groups” or “bands” of values with a similar influence.
The values are automatically “grouped” together when the variable is categorical (e.g. “customer’s country is France, USA or Italy”); they are automatically “banded” when the values are continuous (e.g. “age between 29 and 45”). Groups and bands greatly simplify the analysis, and the tool does a good job of proposing suitable ones automatically, without you having to worry about the best way to bin your data.
Finally, the tool quickly and automatically points out “segments” of interest. Those are sets of records having similar characteristics which have a strong influence on the target (e.g. the tool can show that customers “living in the USA and aged between 18 and 25” show the highest likelihood to purchase your product).
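To make grouping and banding more concrete, here is a minimal Python sketch. It is not how SAP Predictive Analytics computes its groups and bands (this post does not describe that algorithm). It simply cuts a continuous variable into fixed-width bands and merges categories whose purchase rates are close. The function names, the band width, the tolerance and the tiny customer list are all invented for illustration.

```python
from collections import defaultdict

def purchase_rate_by_key(records, key_func):
    """Return {key: share of records in that key with purchased == True}."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for rec in records:
        key = key_func(rec)
        totals[key] += 1
        if rec["purchased"]:
            buyers[key] += 1
    return {k: buyers[k] / totals[k] for k in totals}

def band_of(value, width):
    """Map a continuous value to a fixed-width band, e.g. 31 -> (30, 39)."""
    low = (value // width) * width
    return (low, low + width - 1)

def group_similar(rates, tolerance):
    """Merge categories whose rate is within `tolerance` of the first
    member of the current group (categories are visited by rising rate)."""
    groups = []
    for cat, rate in sorted(rates.items(), key=lambda kv: kv[1]):
        if groups and abs(rate - groups[-1]["rate"]) <= tolerance:
            groups[-1]["members"].append(cat)
        else:
            groups.append({"rate": rate, "members": [cat]})
    return [sorted(g["members"]) for g in groups]

# Invented sample data, for illustration only
customers = [
    {"country": "France", "age": 31, "purchased": True},
    {"country": "France", "age": 44, "purchased": False},
    {"country": "Italy", "age": 35, "purchased": True},
    {"country": "Italy", "age": 52, "purchased": False},
    {"country": "USA", "age": 23, "purchased": True},
    {"country": "USA", "age": 27, "purchased": True},
    {"country": "Japan", "age": 61, "purchased": False},
    {"country": "Japan", "age": 38, "purchased": False},
]

country_rates = purchase_rate_by_key(customers, lambda r: r["country"])
country_groups = group_similar(country_rates, tolerance=0.1)
age_rates = purchase_rate_by_key(customers, lambda r: band_of(r["age"], 10))

print(country_groups)
for band, rate in sorted(age_rates.items()):
    print(band, round(rate, 2))
```

With this toy data, France and Italy end up in the same group because their purchase rates are identical, which is the kind of simplification the tool gives you without any manual binning.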
It is time now to see some action and understand, with an example, how you can improve your business based on insights coming out of an exploratory analytics approach.
The whitest napkins you have ever seen!
Imagine that you work in a company specialized in cleaning tablecloths and napkins for restaurants. In the past few months you created a new offer called “Premium Service” which guarantees restaurants the whitest napkins in the whole country! You proposed the service to several of your existing customers; some of them purchased it, others did not.
You created a list of all the restaurants to which you proposed the service. In this list you put all the characteristics of your customers (e.g. how many seats the restaurants have, whether they are located downtown, in the suburbs or in the country, the average price of a meal, whether they have a valet, etc.). For each customer you marked whether or not they purchased the Premium Service.
The dataset might look like this:
You can use this list for two tasks: create a predictive model which can tell if a prospect is likely to accept the service (advanced analytics) and/or see if you can find some interesting patterns in the restaurant profiles which you can use to improve your business (exploratory analytics).
Typically a marketing manager focused on a short term marketing campaign (where the goal is to maximize the return and minimize the cost) would use the predictive model in an operational mode.
A business strategist who wants to improve the business globally on the long term would be more interested in the exploratory approach.
To accomplish both tasks you can use SAP Predictive Analytics and its Automated Analytics module.
The basic question you want to answer is whether or not a customer is likely to buy the service. This is a typical classification problem, so you apply the Classification module. You set the Premium Service flag as the target variable. (If you have never used SAP Predictive Analytics you can watch this video to see how to use Classification: http://scn.sap.com/docs/DOC-62236 )
All other variables (excluding IDs) are going to be analyzed to understand their influence in the purchase decision. The screen where you set the variables looks like the following:
Now you click a few Next buttons and, after the Classification completes its execution, the model has been created automatically for you.
First of all you need to check whether the quality of the model is good. To do that, look for the Predictive Power (also known as “Ki”) and Prediction Confidence (“Kr”) in the model summary.
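This post does not define how these two indicators are computed. As a rough intuition only, a predictive-power style indicator can be pictured as how far the model's gain curve sits between a random ordering (0) and a perfect ordering (1). The Python sketch below illustrates that intuition. It is an assumption made for explanation, not the product's formula, and the function names and the four sample rows are invented.

```python
def gain_curve(scores, labels):
    """Cumulative share of buyers captured when rows are sorted by score, best first."""
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    captured, curve = 0, [0.0]
    for i in order:
        captured += labels[i]
        curve.append(captured / total_pos)
    return curve

def area_under(curve):
    """Trapezoidal area under a curve sampled at evenly spaced points on [0, 1]."""
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) / 2 * step for a, b in zip(curve, curve[1:]))

def power_like_indicator(scores, labels):
    """(model - random) / (perfect - random), computed on gain-curve areas."""
    random_area = 0.5  # the diagonal: no ordering skill at all
    model_area = area_under(gain_curve(scores, labels))
    # Using the labels themselves as scores gives the perfect ordering
    perfect_area = area_under(gain_curve(labels, labels))
    return (model_area - random_area) / (perfect_area - random_area)

# Invented example: 1 = purchased, 0 = did not purchase
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
indicator = power_like_indicator(scores, labels)
print(round(indicator, 2))
```

A value close to 1 would mean the model ranks buyers almost perfectly, while a value close to 0 would mean it does no better than picking restaurants at random.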
If the model is good you can now use it in an operational mode to ask “Is this prospect likely to purchase the service?” or you can use it in an exploratory mode to ask “What are the typical profiles of customers who purchase the service?”. This second mode helps you take strategic decisions.
You should notice here that you are using information from the past (your list of customers who, as you already know, did or did not purchase your product). The Classification module is able to discover patterns in the past data. The tool can then apply the same patterns to new data (prospects) or help you analyze them to understand what is influencing a purchase decision.
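The idea of learning from past data and then either scoring prospects or inspecting the learned patterns can be sketched in a few lines of Python. This is a deliberately naive stand-in (a lookup table of purchase rates per profile), not the Classification module's algorithm, and the field names and rows are invented.

```python
from collections import defaultdict

def learn_rates(history, keys):
    """Learn the purchase rate for each combination of the given keys."""
    totals, buyers = defaultdict(int), defaultdict(int)
    for row in history:
        profile = tuple(row[k] for k in keys)
        totals[profile] += 1
        buyers[profile] += row["purchased"]
    return {p: buyers[p] / totals[p] for p in totals}

def score(prospect, rates, keys, default):
    """Look up the learned rate for a prospect's profile (default if unseen)."""
    return rates.get(tuple(prospect[k] for k in keys), default)

# Invented history: 1 = purchased the Premium Service, 0 = did not
history = [
    {"price": "expensive", "location": "downtown", "purchased": 1},
    {"price": "expensive", "location": "downtown", "purchased": 0},
    {"price": "expensive", "location": "countryside", "purchased": 0},
    {"price": "inexpensive", "location": "downtown", "purchased": 0},
    {"price": "inexpensive", "location": "countryside", "purchased": 0},
    {"price": "expensive", "location": "downtown", "purchased": 1},
]
keys = ("price", "location")
rates = learn_rates(history, keys)
overall = sum(row["purchased"] for row in history) / len(history)

# Operational use: score a new prospect
prospect = {"price": "expensive", "location": "downtown"}
prospect_score = score(prospect, rates, keys, default=overall)

# Exploratory use: look at the learned profiles, best first
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
for profile, rate in ranked:
    print(profile, round(rate, 2))
```

The same learned table serves both purposes: you look one profile up to score a prospect, or you read the whole ranked table to understand which profiles buy.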
For an operational usage you can immediately go to the Run or to the Save/Export sections. From there you can check in real time which new prospects are likely to purchase the service. Alternatively, if you are a developer or work with developers, you can export the model in various programming languages (like Java, C, C++, SQL, and many others) so that it can be embedded in an application suggesting to your sales team which restaurants to approach. SAP Predictive Analytics automatically provides you with the code in the language you need. You copy it and paste it into your application.
In this blog we are more interested in an exploratory analytics usage, so let’s see how to proceed with it.
For strategic decisions you can look at the various pieces of information generated with the model, see if they make business sense and decide how to use them.
You can start your exploration by looking at the key influencers.
In the Automated Analytics module you open the Contributions by Variables section under Display. You see a visualization similar to the one below:
This graphic tells you that the variables which are most related to the decision to purchase the service are, in order of priority, the Price Segment, the Location and the Number of Covers of a restaurant. Those are the key influencers of the Premium Service target.
If you double click on Price Segment you see the following visualization:
This is telling you that, according to your past data, a very expensive restaurant (80 USD or more for a dinner, on the left of the display and with a positive value) is more likely to purchase the service. On the contrary, inexpensive restaurants (19 USD for a dinner, on the right and with a negative value) are less likely to purchase the service.
While you are on this screen, you can also see that all other price segments were automatically grouped under the label “KXOther”. Those other price ranges are not really meaningful and the tool simplifies the visual analysis for you by grouping them together.
If you now open the second variable, Location, you see something like the following screen:
This screen tells you that Downtown restaurants are more likely to purchase your service while Countryside restaurants probably won’t purchase it. Here again you see that a new group has been created automatically with restaurants in Small Town or in Suburbs. They have the same (negligible) influence, so there is no need to make the analysis more complex by showing separate entries.
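One simple way to read screens like these is to compare the purchase rate of each category with the overall purchase rate: a positive difference pushes towards a purchase, a negative one away from it. The Python sketch below assumes that reading. The counts are invented, and the actual contribution measure used by the product is not described in this post.

```python
def category_influence(counts):
    """counts: {category: (buyers, total)} -> {category: rate minus overall rate}."""
    total_buyers = sum(b for b, _ in counts.values())
    total_rows = sum(t for _, t in counts.values())
    overall = total_buyers / total_rows
    return {cat: b / t - overall for cat, (b, t) in counts.items()}

# Invented counts, for illustration only: (buyers, restaurants contacted)
location_counts = {
    "Downtown": (30, 100),
    "Suburbs": (13, 100),
    "Small Town": (12, 100),
    "Countryside": (2, 100),
}

influence = category_influence(location_counts)
for cat, value in sorted(influence.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:12s} {value:+.3f}")
```

In this made-up example Suburbs and Small Town sit very close to the overall rate, which is why merging them into one group loses almost nothing.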
When opening the third variable, Covers, you have the following screen:
We won’t actually use this information for our analysis but you can see that you automatically obtained bands of values which have a similar influence. If the visualization had a bar for each value of “Covers” it would have been almost useless: too difficult to read and too detailed to be effective. With the automated banding you can immediately see that restaurants in the band of 76 to 106 seats are the most likely to purchase the service.
Let’s see how we can use the knowledge we already gained.
First of all you have now identified your most important variables in the list of key influencers. You could decide to simplify your analysis (even a classic analysis) by taking only those into account. In this example it might not seem very useful, but if you think of a scenario where you have thousands of sensors, you might be able to identify the few which are really important for your analysis and use only their data.
Restaurants which are more likely to purchase the Premium Service are expensive; they are probably luxury restaurants. You could propose to your marketing team to refresh your brand so as to make it look high-end. New expensive restaurant prospects might be attracted by this luxury aspect of the brand.
On the other hand you might take a completely different approach: reduce your pricing to be more attractive for inexpensive restaurants.
On the location front, you could decide to focus your business on the downtown areas of large cities. This could reduce your cost of transport while making sure your trucks are on site faster when an important customer calls for something urgent. This decision could even mean that you decide to completely disregard restaurants located in the countryside.
We can go even further in our exploratory analysis.
If you open the Decision Tree section of SAP Predictive Analytics you can look at the combined influences of multiple variables. The screen below shows the root and some leaves of the tree (you can actually choose the leaves you want to display or have SAP Predictive Analytics automatically open the most influencing leaves one after the other). The decision tree helps you identify segments of interest for your analysis.
Looking at the Decision Tree you see that the customers most likely to purchase the service are restaurants which are expensive AND located downtown (20.65% of them purchased your service). You could be tempted to create a specific marketing campaign for that kind of restaurant, but if you look at the absolute numbers you see that there are only 431 restaurants of that type out of a whole population of more than 8000 restaurants. This segment contains only about 5% of the restaurants. This should make you think: is it a good idea to target such a small population of restaurants? Shouldn’t you have two different marketing campaigns, one for expensive restaurants wherever they are, and one for downtown restaurants whatever their prices? You can talk about this with the marketing team and bring the numbers and visualizations with you to support the discussion.
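The arithmetic behind this reasoning is worth spelling out. The short Python sketch below uses the two figures quoted above (431 restaurants in the segment and its purchase rate) and assumes a population of exactly 8000. Since the text only says “more than 8000”, the computed share is an upper bound.

```python
segment_size = 431      # expensive AND downtown restaurants (from the decision tree)
population = 8000       # assumption: the text only says "more than 8000"
segment_rate = 0.2065   # share of the segment that purchased the service

# How much of the whole customer base does the segment represent?
segment_share = segment_size / population

# Roughly how many purchases came from this segment?
segment_buyers = round(segment_size * segment_rate)

print(f"Segment share of population: {segment_share:.1%}")
print(f"Approximate buyers in segment: {segment_buyers}")
```

Seen this way, a campaign aimed only at this segment addresses a little over one restaurant in twenty, which is the reason to consider the two broader campaigns instead.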
To summarize, you have seen that in a few clicks, using the Classification module and looking at the model debriefing, you were able to do exploratory analytics to take strategic decisions for your company. Those decisions were taken on real data based on a good model automatically provided by SAP Predictive Analytics. You didn’t have to think about how to manipulate the data, how to filter it, how to visualize it. The tool did all of that for you. You could then concentrate on deciding how the mathematically correct and interesting output could be used to improve your business.
Here we took the example of napkins but you can use the same concepts in many different situations.
Just think of your business and of some things you want to improve, of the data you already have in stock and you are likely to find good examples.
If you have any ideas or examples, just post them under this blog so that the whole community can benefit from them!
I hope this blog inspired you to try out SAP Predictive Analytics and do exploratory analytics with it. If you want, you can download a free trial version here: www.sap.com/trypredictive.
And if you have any feedback or idea on how to improve SAP Predictive Analytics you can post it here:
Have fun and happy explorations!