Custom R Component – Numerical Transform
This component helps transform a numerical variable. Just select your variable and choose from a number of common transformations or enter your own R code.
The plot in the center shows the variable’s density without transformation. Plots located around it show the density of commonly used transformations as well as an optional transformation with custom R code. The red frame indicates which transformation was chosen by the user and will be output in a new column.
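The comparison shown in the plots can be approximated outside the component with base R. The following is only a minimal sketch, not the component's actual code: the data is synthetic, and the names `transformations`, `densities` and `plot_densities` are made up for illustration.

```r
# Synthetic, right-skewed example data (an assumption, not real data)
set.seed(42)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)

# The preset transformations named in this document, as plain R functions
transformations <- list(
  "No Transformation" = function(x) x,
  "Natural Logarithm" = function(x) log(x),
  "Squared"           = function(x) x^2,
  "Square Root"       = function(x) sqrt(x),
  "Exponential Value" = function(x) exp(x)
)

# Estimate one kernel density per transformation, dropping non-finite values
densities <- lapply(transformations, function(f) {
  y <- f(x)
  density(y[is.finite(y)])
})

# Hypothetical helper: draw one panel per transformation in a 2 x 3 grid
plot_densities <- function(densities) {
  old_par <- par(mfrow = c(2, 3))
  on.exit(par(old_par))
  for (name in names(densities)) {
    plot(densities[[name]], main = name)
  }
  invisible(NULL)
}
```

Calling `plot_densities(densities)` draws the panels side by side, which makes it easy to see which transformation brings the variable closest to a symmetric shape.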
Disclaimer
Please note that this component is not an official release by SAP and that it is provided as-is without any guarantee or support. Please test the component to ensure it works for your purposes.
Prerequisites
R libraries e1071 and gplots must be installed.
Limitations
Please let me know should you encounter any limitations.
Usage
These parameters can be set by the user.
Parameter | Description |
---|---|
Numerical Variable to Transform | Select the numerical variable. If a non-numerical variable is chosen, the output column will be empty. |
Select Transformation | Choose from preset transformations (“Natural Logarithm”, “Squared”, “No Transformation”, “Square Root”, “Exponential Value” or “Custom”). If “Custom” is chosen, the R code from the “Custom R Transformation” parameter will be used. |
Custom R Transformation | R code for custom requirements, e.g. sqrt(x+1) |
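The way these parameters interact can be sketched in plain R. This is an illustration under stated assumptions, not the component's source: the function `apply_transformation` is hypothetical, the custom code is assumed to refer to the variable as `x` (as in the sqrt(x+1) example), and an "empty" output column is modelled here as `NA` values.

```r
# Hypothetical helper that mimics the parameters described above
apply_transformation <- function(x, choice, custom_code = "x") {
  # Assumption: a non-numerical input yields an empty (all-NA) column
  if (!is.numeric(x)) {
    return(rep(NA_real_, length(x)))
  }
  switch(choice,
    "No Transformation" = x,
    "Natural Logarithm" = log(x),
    "Squared"           = x^2,
    "Square Root"       = sqrt(x),
    "Exponential Value" = exp(x),
    # Evaluate the user's R code with the selected variable bound to `x`
    "Custom" = eval(parse(text = custom_code),
                    envir = list(x = x), enclos = baseenv()),
    stop("Unknown transformation: ", choice)
  )
}

# Usage: add the output column to a small, made-up data frame
df <- data.frame(value = c(0, 3, 8))
df$TransformedVariable <- apply_transformation(df$value, "Custom", "sqrt(x+1)")
```

Evaluating free-form code with `eval(parse(...))` runs whatever the user types, so a sketch like this is only appropriate where the person entering the code is trusted.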
Output column added by this component
Column | Description |
---|---|
TransformedVariable | The selected variable with the chosen transformation. |
How to Implement
The component can be downloaded as a .spar file from GitHub. Then deploy it as described here. You just need to import it through the option “Import/Model Component”, which you will find by clicking on the plus sign at the bottom of the list of available algorithms.
Example
You can use such a transformation to improve the quality of a linear regression, for instance. The dataset adverts.csv is often used in teaching R. It lists a few companies, their 1983 TV advertising budget in millions of dollars (spend) and the retained impressions per week in millions (milimp). You can use the spend to estimate the retained impressions with a linear regression. Using the logarithm of the spend as the predictor (instead of the actual spend) improves the quality of the model.
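The effect of such a transformation on a regression can be checked by fitting both models and comparing their R². The sketch below uses synthetic data generated with a logarithmic relationship, because the contents of adverts.csv are not reproduced here. Only the column names `spend` and `milimp` mirror the description above, and all values are invented.

```r
# Synthetic stand-in for adverts.csv (values are made up for illustration)
set.seed(1)
n <- 100
spend <- runif(n, min = 5, max = 200)
milimp <- 10 + 15 * log(spend) + rnorm(n, sd = 3)
adverts <- data.frame(spend = spend, milimp = milimp)

# Model 1: retained impressions explained by the raw spend
fit_raw <- lm(milimp ~ spend, data = adverts)

# Model 2: retained impressions explained by the logarithm of the spend
fit_log <- lm(milimp ~ log(spend), data = adverts)

# Compare the share of variance explained by each model
r2 <- c(raw = summary(fit_raw)$r.squared,
        log = summary(fit_log)$r.squared)
print(r2)
```

To try this on the real dataset, replace the synthetic block with something like `adverts <- read.csv("adverts.csv")`, assuming the file is in the working directory and uses these column names.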