Custom R Component – Measures of Location (Mean, Standard Deviation…)
This custom R component extends SAP Predictive Analysis and calculates the following measures of location and related statistics:
– Confidence Interval of the Mean
– Standard Deviation
– Total Record Count
– Record Count of non-null Values
The dataset must contain at least one numerical column to be described by these statistics.
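The post does not state the formulas behind these statistics. Assuming the component follows R's usual conventions (the sample standard deviation as returned by `sd()` and the Student-t interval as returned by `t.test()`), the statistics for the n non-null values x_1 to x_n of a measure, n being the record count of non-null values, would be:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},\\
\left[\text{lower},\ \text{upper}\right] &= \left[\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right],
\qquad \alpha = 1 - \text{confidence level}.
\end{aligned}
```

Here t_{1-α/2, n-1} is the Student-t quantile with n - 1 degrees of freedom (`qt(1 - alpha/2, n - 1)` in R); a 95% confidence level corresponds to α = 0.05.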
These parameters can be set by the user:

| Parameter | Description |
|---|---|
| Group by Column | Name of the categorical column for level-based statistics. The measures of location are calculated for each subgroup: if the column “Country” is selected, for instance, separate statistics are calculated for each country, e.g. Switzerland, China and Brazil. |
| Measure(s) to Describe | Names of one or more numerical columns to be described, e.g. Revenue or Duration. |
| Calculate Overall Statistics | Controls whether statistics are also calculated for the whole, unfiltered dataset. |
| Confidence Level for Mean Interval | Confidence level used to calculate the lower and upper limits of the confidence interval of the mean. |


The output contains one row per group and measure; its columns include:

| Column | Description |
|---|---|
| GroupByColumn | The level of the group-by column that the row describes, e.g. Switzerland, China or Brazil. The statistics of the unfiltered dataset (if selected) are labelled “OVERALL”. |
| Measure | Name of the measure that the row describes. |
| CountNotNA | Number of non-null values. |
| MeanConfidenceLevel | Confidence level of the mean interval, as entered by the user. |
| MeanCILL | Lower limit of the confidence interval of the mean. |
| MeanCIUL | Upper limit of the confidence interval of the mean. |

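To make the parameters and output columns concrete, here is a minimal Python sketch of the per-group calculation. It is not the component's R code: the function `describe`, its list-of-dicts input, the output keys `Count`, `Mean` and `StdDev` (the post does not name those columns) and the use of a Student-t interval are all assumptions. Python's standard library has no t quantile function, so the sketch inverts the t CDF by bisection, computing the CDF through the regularized incomplete beta function (continued-fraction method); with SciPy installed, `scipy.stats.t.ppf` would do the same job.

```python
import math
from collections import defaultdict


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """CDF of Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t >= 0 else tail


def t_quantile(p, df):
    """Inverse of t_cdf for 0.5 <= p < 1 (the role R's qt() plays)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):  # plain bisection
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def describe(rows, measures, group_by=None, overall=True, level=0.95):
    """Hypothetical sketch: one output row per (group, measure) pair."""
    groups = defaultdict(list)
    if group_by is not None:
        for row in rows:
            groups[str(row.get(group_by))].append(row)
    if overall:
        groups["OVERALL"] = list(rows)  # statistics of the unfiltered dataset
    out = []
    for label, members in groups.items():
        for measure in measures:
            values = [r[measure] for r in members if r.get(measure) is not None]
            n = len(values)
            mean = sum(values) / n if n else None
            sd = cill = ciul = None
            if n >= 2:  # sd and the interval need at least two non-null values
                sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
                half = t_quantile(0.5 + level / 2.0, n - 1) * sd / math.sqrt(n)
                cill, ciul = mean - half, mean + half
            # Count, Mean and StdDev are hypothetical key names.
            out.append({"GroupByColumn": label, "Measure": measure,
                        "Count": len(members), "CountNotNA": n,
                        "Mean": mean, "StdDev": sd,
                        "MeanConfidenceLevel": level,
                        "MeanCILL": cill, "MeanCIUL": ciul})
    return out
```

With the survey data loaded as a list of dicts, a call such as `describe(rows, ["Q8N"], group_by="Q18COUNTRY", overall=True, level=0.95)` would mirror the configuration used in the example below. In this sketch, groups with fewer than two non-null values get no standard deviation or interval, since neither is defined for a single value.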
How to Implement
The component can be downloaded as a .spar file from GitHub. To deploy it, import it through the option “Import/Model Component”, which you will find by clicking the plus sign at the bottom of the list of available algorithms.
Let’s try the component on the responses to a customer survey carried out by San Francisco International Airport (SFO). Download the 2011 dataset, load it into SAP Predictive Analysis and add the “Measures of Location” component to the dataflow.
Configure the component to:
– analyse satisfaction with SFO as a whole (column “Q8N”).
– calculate the measures of location by the respondent’s country of residence (column “Q18COUNTRY”).
– also calculate the overall measures of location for the unfiltered dataset.
Run the component and see the output.
You can process the results further, for instance by analysing them graphically in SAP Predictive Analysis.
Please note that this component is provided as-is without any guarantee or support.