Skip to Content
Author's profile photo Andreas Forster

R Extension – Data Aggregation

Typically you will want to work on detailed data, but sometimes aggregations are needed. This little “R Extension” enables you to aggregate your data within the analytical workflow.

This can come in handy for instance for Time Series Forecasting when your dataset is too detailed but does not have a true date field. (If you have a date field then the native ‘Triple Exponential Smoothing’ component can take care of the aggregation)

Here is an example. My data sets contains information about how many cars were registered in Switzerland from 2007 to 2012 each month in each region (called Canton).

/wp-content/uploads/2013/07/01_1022557.png

Now I want to predict how many cars will be registered in future each month. So I need to aggregate by month, taking the Canton out of the data set.

The new R Extension (here called Aggregation) allows you to select

  • The field to aggreate: Car Registrations
  • The aggregation type: Sum
  • The column to group by: Year-Month

Available aggregation types in this component are: Sum, Minimum, Maximum, Count, Mean and Median.

/wp-content/uploads/2013/07/02_1022579.png

The component outputs two columns named GROUP_BY and AGGREGATED_MEASURE. If you like you can use the standard ‘Data Type Definition” Component from the ‘Data Preparation’ tab to rename the columns to something more meaningful. This step is optional.

/wp-content/uploads/2013/07/03_1022580.png

Execute the flow and you see the aggregates, which could be used for intsance in forecasting.

/wp-content/uploads/2013/07/05_1022581.png

How to Implement

The component can be downloaded as .spar file from GitHub. Then deploy it as described here. You just need to import it through the option “Import Model/Extension”, which you will find by clicking on the plus-sign at the bottom of the list of the available algorithms.

Disclaimer

Please note that this component is not an official release by SAP and that it is provided as-is without any guarantee or support. Please test the component to ensure it works for your purposes.

Assigned tags

      8 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Mister Makmerphy
      Mister Makmerphy

      great post, thanks. But what if we need multiple columns? I changed the groupByColumnStr from Single to Multiple and I'm getting error message if I choose multiple char values. Everything is fine if 1 char columns is selected.

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

      A few more changes are needed to be able to group by 2 columns. The easiest is to bring in another input parameter for the 2nd column (GroupByColumnStr2) and another output column (GROUP_BY2).

      Here is the adjusted code. I have only given it a quick test, but it seems to work ok

      mymain <- function (mydata, MeasureColumnStr, AggregationTypeStr, GroupByColumnStr, GroupByColumnStr2)

      {

      ## Translate the aggregation type presented to the end user into R syntax.

      mytranslator <- list('Sum'='sum',  'Minimum'='min', 'Maximum'='max',  'Count'='length', 'Mean'='mean', 'Median'='median')

      ## Retrieve the relevant columns from the DataFrame.

      mymeasure <- mydata[,MeasureColumnStr]

      mycolumn <- mydata[,GroupByColumnStr]

      mycolumn2 <- mydata[,GroupByColumnStr2]

      ## Aggregate based on user preference.

      myaggregation<- aggregate(mymeasure ~ mycolumn + mycolumn2, data=mydata, FUN=mytranslator[[AggregationTypeStr]])

      ## Return the aggregated data. Note that the original detailed DataFrame is not placed in the output as we want the aggregation only.

      output <- data.frame(myaggregation$mycolumn, myaggregation$mycolumn2, myaggregation$mymeasure)

      return(list(out=output))

      }

      Author's profile photo David Smith
      David Smith

      Great sample code!!  Thank you!!

      Author's profile photo Former Member
      Former Member

      Nice one! I'm a non-R person and found this very easy to use on some live customer data to enable PA's forecasting.

      Author's profile photo Former Member
      Former Member

      Simple and very efficient code.

      Thank you!

      Gabriel

      Author's profile photo Deepti Lakshmi
      Deepti Lakshmi

      Are there any specific roles that needed to be assigned to this user to run this code? For me , i cannot run this code with a demo user. Only admin user can run this? Should i assign anything extra to my demo user to run this?

      Author's profile photo Andreas Forster
      Andreas Forster
      Blog Post Author

      Hello Deepti,

      when running this code on the SAP PA computer (so with data that got downloaded) you do not need any specific user or access rights.

      Or are you working with data held in HANA? Are you getting an error message?

      Greetings

      Andreas

      Author's profile photo Former Member
      Former Member

      Just what I needed. Thanks!