Custom R Components – Classification with the Naive Bayes Algorithm
The Naive Bayes algorithm is one of many methods of classification. For instance, you may want to learn from a past marketing campaign which prospects to focus on in your next marketing activity. The algorithm can identify patterns in the type of contacts who have already purchased a certain product (i.e. their age, gender, income, etc.). You can then use this information for your next campaign and focus on the people who are most likely to be interested, so that you spend your marketing budget where it is most effective.
SAP Predictive Analysis can use the Naive Bayes algorithm thanks to the ability to create Custom R Components. Within such a component an expert user can encapsulate R-Script in an end-user-friendly format. With thousands of different methods available in R, that concept is extremely powerful. This article explains how to implement and use Naive Bayes.
Let’s try the Naive Bayes algorithm on some data from the real world. The UC Irvine Machine Learning Repository kindly hosts a dataset with information taken from the 1994 US Census. The file called Adult contains anonymous information on over 32,000 people, listing their age, education, marital status and much more, including whether the person earned more than 50,000 US dollars in 1994. We will use this information to create a model that we can apply to future data to determine whether a person is likely to earn more or less than 50,000 USD.
You can follow the steps below if you download the above dataset. Before getting started, you may just have to add a first row with column names.
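Adding the header row can also be scripted. The following is a minimal Python sketch, assuming the raw file is comma-separated with a space after each comma. The column names are an assumption: Age, Occupation, HoursPerWeek and Income match the names used in this article, and the remaining names are illustrative labels written in the same style. The function name `add_header` is hypothetical.

```python
import csv
import io

# Assumed column names: Age, Occupation, HoursPerWeek and Income are used in
# this article; the others are illustrative labels in the same style.
COLUMN_NAMES = [
    "Age", "WorkClass", "Fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Sex",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income",
]


def add_header(raw_text, column_names=COLUMN_NAMES):
    """Return CSV text with a header row placed in front of the records."""
    # skipinitialspace drops the blank that follows each comma in the raw file
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(column_names)
    for row in reader:
        if row:  # skip empty lines
            writer.writerow(row)
    return out.getvalue()


# One sample record in the assumed layout, followed by an empty line
sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n\n")
print(add_header(sample))
```

To process a real file, read its content into a string, pass it to `add_header` and write the result to a new file, so that the original download stays untouched.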
Load the data into SAP Predictive Analysis. You will see some of the available columns. The ‘Income’ field on the right-hand side tells us whether the person was above or below the 50k threshold in that year. This column is called ‘TargetVariable’ in the screenshots below.
Now add the Naive Bayes Classifier component to the model. Further below you will find the details on adding this logic to your own SAP Predictive Analysis installation.
Configure the component. You need to specify:
– the Classifier Column: Income
– and the Predictor Column: Here you can pick Age, Occupation and HoursPerWeek to start.
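To illustrate what a classifier does with one target column and a set of predictor columns, here is a minimal Python sketch of Naive Bayes for categorical predictors. It is not the R script inside the component. The function names, the toy records and the use of add-one (Laplace) smoothing are assumptions made for this illustration, and numeric columns such as Age would first need to be grouped into categories for this simplified version.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, target, predictors):
    """Count class frequencies and per-class value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (predictor, class) -> value counts
    levels = defaultdict(set)            # predictor -> distinct values seen
    for row in rows:
        for p in predictors:
            value_counts[(p, row[target])][row[p]] += 1
            levels[p].add(row[p])
    return {"classes": class_counts, "values": value_counts,
            "levels": levels, "predictors": list(predictors), "n": len(rows)}


def predict(model, row):
    """Return the class with the highest log posterior score."""
    best_class, best_score = None, -math.inf
    for cls, cls_count in model["classes"].items():
        score = math.log(cls_count / model["n"])  # log prior of the class
        for p in model["predictors"]:
            seen = model["values"][(p, cls)][row[p]]
            # Add-one smoothing keeps unseen values from zeroing the score
            score += math.log((seen + 1) / (cls_count + len(model["levels"][p])))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Toy records, invented for illustration only (not taken from the dataset)
toy_rows = [
    {"Occupation": "Exec", "Hours": "high", "Income": ">50K"},
    {"Occupation": "Exec", "Hours": "high", "Income": ">50K"},
    {"Occupation": "Clerical", "Hours": "low", "Income": "<=50K"},
    {"Occupation": "Clerical", "Hours": "high", "Income": "<=50K"},
    {"Occupation": "Service", "Hours": "low", "Income": "<=50K"},
]
toy_model = train_naive_bayes(toy_rows, "Income", ["Occupation", "Hours"])
print(predict(toy_model, {"Occupation": "Exec", "Hours": "high"}))
```

The "naive" part is the assumption that the predictors are independent given the class, which is why the per-predictor terms can simply be added up in log space.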
Run the model, then go to the charts area. The table shows how many records were correctly and incorrectly classified: 24,263 people were correctly classified as earning less than 50,000 USD, and 556 people were correctly classified as high earners.
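A table of this kind can be derived by counting pairs of actual and predicted classes. The Python sketch below shows the idea with a handful of invented labels. The function names are hypothetical, and the numbers have nothing to do with the results reported above.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; matching pairs are correct."""
    return Counter(zip(actual, predicted))


def accuracy(counts):
    """Share of records whose predicted class equals the actual class."""
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return correct / sum(counts.values())


# Invented labels for illustration only
actual_labels = ["<=50K", "<=50K", "<=50K", ">50K", ">50K"]
predicted_labels = ["<=50K", "<=50K", ">50K", ">50K", "<=50K"]

table = confusion_counts(actual_labels, predicted_labels)
for (a, p), n in sorted(table.items()):
    print(f"actual {a:>5}  predicted {p:>5}  count {n}")
print("accuracy:", accuracy(table))
```

Looking at the counts per class, rather than only at the overall accuracy, shows whether the model mostly misses one of the two groups.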
You can also save the trained model to further test it on data that is already classified. Or you can apply the model on new data for which the classification is actually unknown.
Please make sure you have the R-libraries e1071 and gplots installed. The following document explains how to make new libraries available in SAP Predictive Analysis:
You may want to read the documentation of the Naive Bayes algorithm on:
How to Implement
The component can be downloaded as a .spar file from GitHub. Then deploy it as described here. You just need to import it through the option “Import/Model Component”, which you will find by clicking the plus sign at the bottom of the list of available algorithms.
Please note that this component is not an official release by SAP and that it is provided as-is without any guarantee or support. Please test the component to ensure it works for your purposes.