Skip to Content

Typically you will want to work on detailed data, but sometimes aggregations are needed. This little “R Extension” enables you to aggregate your data within the analytical workflow.

This can come in handy for instance for Time Series Forecasting when your dataset is too detailed but does not have a true date field. (If you have a date field then the native ‘Triple Exponential Smoothing’ component can take care of the aggregation)

Here is an example. My data sets contains information about how many cars were registered in Switzerland from 2007 to 2012 each month in each region (called Canton).

/wp-content/uploads/2013/07/01_1022557.png

Now I want to predict how many cars will be registered in future each month. So I need to aggregate by month, taking the Canton out of the data set.

The new R Extension (here called Aggregation) allows you to select

  • The field to aggreate: Car Registrations
  • The aggregation type: Sum
  • The column to group by: Year-Month

Available aggregation types in this component are: Sum, Minimum, Maximum, Count, Mean and Median.

/wp-content/uploads/2013/07/02_1022579.png

The component outputs two columns named GROUP_BY and AGGREGATED_MEASURE. If you like you can use the standard ‘Data Type Definition” Component from the ‘Data Preparation’ tab to rename the columns to something more meaningful. This step is optional.

/wp-content/uploads/2013/07/03_1022580.png

Execute the flow and you see the aggregates, which could be used for intsance in forecasting.

/wp-content/uploads/2013/07/05_1022581.png

How to Implement

The component can be downloaded as .spar file from GitHub. Then deploy it as described here. You just need to import it through the option “Import Model/Extension”, which you will find by clicking on the plus-sign at the bottom of the list of the available algorithms.

Disclaimer

Please note that this component is not an official release by SAP and that it is provided as-is without any guarantee or support. Please test the component to ensure it works for your purposes.

To report this post you need to login first.

8 Comments

You must be Logged on to comment or reply to a post.

  1. Mister Makmerphy

    great post, thanks. But what if we need multiple columns? I changed the groupByColumnStr from Single to Multiple and I’m getting error message if I choose multiple char values. Everything is fine if 1 char columns is selected.

    (0) 
    1. Andreas Forster Post author

      A few more changes are needed to be able to group by 2 columns. The easiest is to bring in another input parameter for the 2nd column (GroupByColumnStr2) and another output column (GROUP_BY2).

      Here is the adjusted code. I have only given it a quick test, but it seems to work ok

      mymain <- function (mydata, MeasureColumnStr, AggregationTypeStr, GroupByColumnStr, GroupByColumnStr2)

      {

      ## Translate the aggregation type presented to the end user into R syntax.

      mytranslator <- list(‘Sum’=’sum’,  ‘Minimum’=’min’, ‘Maximum’=’max’,  ‘Count’=’length’, ‘Mean’=’mean’, ‘Median’=’median’)

      ## Retrieve the relevant columns from the DataFrame.

      mymeasure <- mydata[,MeasureColumnStr]

      mycolumn <- mydata[,GroupByColumnStr]

      mycolumn2 <- mydata[,GroupByColumnStr2]

      ## Aggregate based on user preference.

      myaggregation<- aggregate(mymeasure ~ mycolumn + mycolumn2, data=mydata, FUN=mytranslator[[AggregationTypeStr]])

      ## Return the aggregated data. Note that the original detailed DataFrame is not placed in the output as we want the aggregation only.

      output <- data.frame(myaggregation$mycolumn, myaggregation$mycolumn2, myaggregation$mymeasure)

      return(list(out=output))

      }

      (0) 
  2. Deepti Lakshmi

    Are there any specific roles that needed to be assigned to this user to run this code? For me , i cannot run this code with a demo user. Only admin user can run this? Should i assign anything extra to my demo user to run this?

    (0) 
    1. Andreas Forster Post author

      Hello Deepti,

      when running this code on the SAP PA computer (so with data that got downloaded) you do not need any specific user or access rights.

      Or are you working with data held in HANA? Are you getting an error message?

      Greetings

      Andreas

      (0) 

Leave a Reply