Skip to Content
Technical Articles
Author's profile photo harshavardhan kadam

CLUSTERING IN SAP ANALYTICS CLOUD

Context

  • Introduction
  • Data Preparation and Data Modeling
  • Application of Cluster Algorithm using R in SAP Analytics Cloud

 

Introduction

I am Harshavardhan Kadam, Data visualization analyst, I am writing this article on CLUSTERING in SAP Analytics Cloud , when I was build a Viz on SAP Analytics Cloud I wanted to create a cluster for customer based on RFM model, Unlike other Viz tool SAP Analytics Cloud gives us an upper hand in building cluster using R. This article I have written, How to create a cluster using K-Means(Details on KMeans explained in Cluster Section)

To Implement the complete Customer Clustering, We have used data from E-Commerce portal, The data consist of Customer ID, Invoice No, Country, Item No, Item Description, Invoice Date, Quantity ordered, Unit price and order status.

We will be pre-processing the data, then feed the data into the model and then build a cluster and RFM model on preprocessed data. To build a cluster model we must have a basic R programming knowledge.

 

Data Preparation and Data Modeling

 

Data Preparation

The data need to be analyzed before feeding into the model. Below are few of the   steps to be considered

  1. Check for Null values/Missing values.
  2. Check for Date format.
  3. Make sure our data do not have ZERO or negative value for ordered quantity and Unit price (Note: For any other domain we need to be sure that measure related to price must not be Zero or Negative).
  4. Check for any uncommon name for format in our data

Example: For few Item in our data had a description as Post,Bank Charges,Manual etc.which is not considered.

Note: These  cleaning and preprocessing activity  need to be performed at backend before feeding data to SAP Analytics Cloud model .Although SAP Analytics Cloud can perform these cleaning very efficiently but this might slow your  data load in model as unnecessary data will be loaded that we might need to be  delete at later stage.

Note: Data can be preprocessed using any ETL technique or using R

 

Data Modeling

When we have performed the preprocessing activity the data can be loaded to the Analytical model in SAP Analytics Cloud

  • Login to SAP Analytics Cloud and create a model.

     

  • Select the Option to import the data. SAP Analytics Cloud provided two option
  1. Import using file.
  2. Connect to Data Source.
  • Once Data is loaded perform data modeling
  1. Check if Dimension and measure are projected Correctly.

Example: Customer ID sometimes shown as measure, convert to Dimension.

  1. Check date format Convert all the date in one single format.
  2. Extract a day of the Week, Month, and Year from date.

Example to Extract Year ,like wise we can create Month and Day

  1. Created a required calculated column using Calculated Funcation (fx)

Example: Line Total= Quantity*Unit price

 

  1. Create a Geo location column on Country column.

  1. Once all the modeling is done, validate the data and build model.

 

Now when model is ready, we must start building our Story.

Clustering in SAP Analytics Cloud using R

 

Clustering Definition: The technique grouping of homogeneous objects which are similarity between them and dissimilarity to others group is called clustering. This is the Unsupervised model.

Types of Clustering

  1. K-Means         Hierarchical

Clustering is performed using Euclidian Distance.

In this blog I have created a cluster using K-Means

Why K-Means.

  1. Less impacted by Outliers
  2. Faster Computation
  3. Reading output visual is better when we have a huge data.

Building a Cluster in SAP Analytics Cloud

  1. To build a Cluster in SAP Analytics Cloud lets us create a story and select a R visualization widget.

 

  1. Select the required column in building the Click on input date and select  Dimension   and the  measure on which we need to build cluster.

For my model I used Customer, Country dimension,  Date, Invoice No, New LineTotal,Description

  1. Now Click on Edit Script and start build the code in Editor
  2. Create a Recency, Frequency and Monetary columns

Code:

CustDataFrame<- CustDateFrame %>%

                               group_by( Description)

                               summarise(Recency =as.numric(Sys.Date()-max(Date)),

                                                    Frequency =n_distinct(InvoiceNo)

                                                    Monetary =sum(NewLineTotal)

                                                   )

The above code groups customers based on the Recency of visit, Frequency of Purchase and Amount of Spend. This is my base table on which I will do clustering. I created these new columns  as              K-Means clustering use distance , the records with close distance are  grouped together.

 

  1. Now we will start our clustering process, before we start  we must load library .

Library(“Cluster”)

Library(“Magrittr”)

Library(“Factoextra”)

 

  1. Clustering
  • Scale the date
  • Calculate the count of required cluster ( you must use subject knowledge to in deciding the cluster count)
  • Build the cluster
  • Visualize the cluster
  • Profile the cluster

Code below

 library(“cluster”)

clusterData<- CustDataFrame

set.seed(123)

clusterData$CustomerID<- as.character(clusterData$CustomerID)

ScaledCluster <-scale(clusterData[,3:5])  #scale data

Library(NbClust)

Nc<-NbClust(clusterData[,-c(1)],min.nc=2,max.nc=4,method=”kmeans”) # cluster count

kmeans.clus=kmeans(x=ScaledCluster,centers=5,nstart=25)  #Kemans

kmeans.clus

library(fpc)

clusterData$Cluster<-kmeans.clus$cluster

library(ggplot2)

plt<-ggplot(data=clusterData, aes(FrequencyofVisit,Monetory,color=factor(Cluster),legend =”Cluster”,label =CustomerID))+

       geom_text()   # plot a Graph

agg<-aggregate(clusterData[,-c(1,2,6)],list(clusterData$Cluster),mean)

agg$Recency<-as.integer(agg$Recency)

agg$FrequencyofVisit<-as.integer(agg$FrequencyofVisit)

agg$Monetory<-as.integer(agg$Monetory)

names(agg)[names(agg)==”Group.1″]<-“Cluster”

library(gridExtra)

agg <-agg[,c(1,2,3,4)]

library(gtable)

tt2 <- ttheme_default(

  panel.border = element_blank(),

  ,

  panel.grid.major = element_blank(),

  panel.grid.minor = element_blank(),

  core=list(bg_params = list(fill = c(“#fafafa”, “white”), lwd=2, lty=1, col=”#cecece”),

            fg_params=list(hjust=1, x=0.95, cex=1.25)),

  colhead=list(bg_params = list(fill = “#4682b4”, col=NA, lwd=5),

               fg_params=list(col = “white”, fontface=”bold”, cex=1.25)),

  rowhead=list(fg_params=list(hjust=0, x=0, cex=1.25, fontface=”plain”)))

tg<-tableGrob(agg, theme = tt2)   # Plot a Aggerate table

grid.arrange(plt,tg,nrow=2,

             as.table=TRUE,

             heights=c(3,1))

 Note: Above code is  written as per the data structure. Sample data screen below

 

  1. Below is the Visualization, where I have divided a data in 5 cluster based on their Frequency, Recency and Monetary.

 

Note: We might see  lot of overlap in few cluster, That is a complete other discussion which will be covered separately.

The below table shows the cluster profile based on average Spending, Frequency and Recency.

        

Conclusion:- Based on above table and Chart we can profile our Cluster and segment our customers

Cluster Profiling is done based on Avg Recency. Frequency and Monetory.

Cluster 1 (Monthly Customer) :-    Groups Customers  who shops ,24 days on Avg, with Normal spending .

Cluster 2 ( Window Shopper ) :-    Groups Customers who Shops on an Avg of 50 days ,but not frequently and not a good   Spender

Cluster 3 (Regular Customer):-    Groups Customers who shops very frequently.

Cluster 4 (Lost Customer):-    Groups Customer  who has not visited for long

Cluster 5 (Loyal Customer):-    Groups Customer who are High Spender, frequent visitor and Recent buyer.

Using Clustering we can  segment our customer into various profiles and help us in identifying the customer who need attention and where our new plans and campaign must be rolled out

Similarly we can cluster Product too ,Below is the Visualization example.

 

 

 

Harshavardhan Kadam

Data Visualization and Analytics

 

Assigned Tags

      4 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo shruti srivastava
      shruti srivastava

      Great Blog ! Informative !

      Author's profile photo Tapan Rana
      Tapan Rana

      Congrats for giving light on K means clustering ...I can remember in SBO HANA ready views are available for KPI etc , dashboards etc. while tuning those views in pervasive analytics as an option K-means selection is given but in SAC by total customized way....very nice

      Author's profile photo Henry Banks
      Henry Banks

      Hi harshavardhan kadam

      very nice job decoding R- visualizations for our community. Much appreciated!

      don't forget we have a clustering algorithm out-of-the-box for end users too, when using correlation charts - with ref. to:  "Smart Grouping"  also see the how-to video here 

      regards

      H

       

      Author's profile photo harshavardhan kadam
      harshavardhan kadam
      Blog Post Author

      I will go through that post. thank you