I noticed that SAP PA has no standard component for binning, although binning is needed in many kinds of analysis.

Many statistical analyses call for categorical variables rather than continuous ones. In credit scoring models especially, continuous variables are often transformed into categorical variables for better analysis. With large data sets, analysis can also run faster on categorical variables than on continuous ones. The process of converting a continuous variable into a categorical one is called binning. In simpler words, binning is a way to group a number of more or less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals. The component below bins a continuous variable into n intervals of equal width. Because it passes the number of categories straight to R's cut() function, the bins span equal ranges of the variable and do not necessarily hold equal numbers of observations.
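To make the age example concrete, here is a minimal sketch in base R. The ages are made up for illustration and are not taken from the credit card data set used later in this post.

```r
# Hypothetical ages, invented for illustration only
ages <- c(18, 22, 25, 31, 38, 44, 52, 59, 67, 78)

# A single number in `breaks` asks cut() for that many intervals of equal width
age_group <- cut(ages, breaks = 3, include.lowest = TRUE)

# Count how many people fall into each age interval
age_counts <- table(age_group)
print(age_counts)
```

The three intervals cover equal ranges of age, so the number of people in each one can differ.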

I have taken sample data from a credit card client. The client has assigned scores to customers based on credit limit and now wants to count the customers in each score range. Based on the ranges I went ahead and created a mosaic plot. We can further use this data in our predictive algorithms to predict the score range of a new customer based on other variables. You can modify the code to suit your needs.

Setting up the component:

7.PNG

Column to be Categorized: The continuous variable that you want to convert to a categorical variable. It must be numeric.

Number of Categories: The number of categories that will be created for the continuous variable above.
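Since the column must be numeric and the number of categories has to be a usable count, it may help to check both before binning. The helper below is only a sketch: check_bin_inputs is a hypothetical name, not part of the component, and it assumes the number of categories may arrive as text (which is why it is converted first).

```r
# Hypothetical helper (not part of the original component):
# validates the two parameters and returns the number of categories as an integer
check_bin_inputs <- function(mydata, BinColumnStr, numBrks) {
  if (!(BinColumnStr %in% names(mydata))) {
    stop("Column not found: ", BinColumnStr)
  }
  if (!is.numeric(mydata[[BinColumnStr]])) {
    stop("Column to be categorized must be numeric: ", BinColumnStr)
  }
  # The parameter may be passed as text, so convert before checking
  n <- suppressWarnings(as.integer(numBrks))
  # cut() needs at least 2 intervals when given a single number
  if (is.na(n) || n < 2) {
    stop("Number of categories must be a whole number of 2 or more")
  }
  n
}

# Made-up data for illustration
demo <- data.frame(Score = c(310, 455, 610, 790), Region = c("N", "S", "N", "S"))
n_bins <- check_bin_inputs(demo, "Score", "4")
```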

8.PNG

Output:

As seen below, the Score variable is now grouped into 4 categories, each covering an equal-width range of scores.

9.PNG

10.PNG

CODE:

mymain <- function(mydata, BinColumnStr, numBrks)
{
  ## Package required for creating the mosaic plot
  library(vcd)

  ## Capture the column that needs to be categorized (must be numeric)
  mycolumn <- mydata[, BinColumnStr]

  ## Create the categories; a single number in breaks gives that many equal-width intervals
  mydata$Category <- cut(mycolumn, breaks = as.numeric(numBrks), include.lowest = TRUE)

  ## Tabulate the categories for the mosaic plot
  ## (assumes mydata has columns named Region and Count)
  output1 <- xtabs(Count ~ Region + Category, data = mydata)

  ## Aggregate the count by Region and Category
  myaggregation <- aggregate(Count ~ Region + Category, data = mydata, FUN = sum)
  output <- data.frame(myaggregation$Region, myaggregation$Category, myaggregation$Count)

  ## Create the mosaic plot
  mosaic(output1, shade = TRUE)

  return(list(out = output))
}
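If you would rather have bins that hold roughly the same number of observations, one possible modification is to cut at quantiles instead of passing a single number to cut(). The sketch below uses base R only; bin_equal_freq is a hypothetical helper name, and the scores and regions are made up for illustration, not taken from the client data.

```r
# Hypothetical helper (not part of the original component):
# bins a numeric vector into n groups with roughly equal numbers of observations
bin_equal_freq <- function(x, n) {
  n <- as.integer(n)
  # Quantile cut points at probabilities 0, 1/n, ..., 1
  brks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # Heavily tied data can give duplicate cut points; drop them so cut() accepts the breaks
  brks <- unique(brks)
  cut(x, breaks = brks, include.lowest = TRUE)
}

# Made-up scores and regions for illustration
scores <- c(310, 325, 340, 400, 455, 470, 520, 610, 640, 700, 715, 790)
customers <- data.frame(
  Region = rep(c("North", "South"), times = 6),
  Score  = scores,
  Count  = 1
)

# Bin the scores into 4 equal-frequency categories
customers$Category <- bin_equal_freq(customers$Score, 4)

# Same tabulation step as in the component, here without the mosaic plot
region_by_bin <- xtabs(Count ~ Region + Category, data = customers)
print(region_by_bin)
```

With many repeated values the quantile cut points can coincide, so this approach may return fewer than n categories.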

Please leave a comment if there is anything I can add to this code.
