
This component helps transform a numerical variable. Just select your variable and choose from a number of common transformations or enter your own R code.

NumericalTransform.png

The plot in the center shows the variable’s density without transformation. Plots located around it show the density of commonly used transformations as well as an optional transformation with custom R code. The red frame indicates which transformation was chosen by the user and will be output in a new column.
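To give a rough idea of what such a comparison looks like outside the component, here is a minimal sketch in base R. It is not the component's own plotting code. The data is simulated purely for illustration, and the names `candidates` and `densities` are made up for this example.

```r
# Illustrative data only: a simulated right-skewed sample, not real data.
set.seed(42)
x <- rlnorm(200)

# A few of the preset transformations, applied to the same variable
candidates <- list(
  "No Transformation" = x,
  "Natural Logarithm" = log(x),
  "Square Root"       = sqrt(x),
  "Squared"           = x^2
)

# Kernel density estimate for each candidate
densities <- lapply(candidates, density)

# One panel per candidate, so the shapes can be compared side by side
op <- par(mfrow = c(2, 2))
for (name in names(densities)) {
  plot(densities[[name]], main = name)
}
par(op)
```

For a right-skewed variable like this one, the logarithm panel should look considerably more symmetric than the untransformed one.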

Disclaimer

Please note that this component is not an official release by SAP and that it is provided as-is without any guarantee or support. Please test the component to ensure it works for your purposes.

Prerequisites

R libraries e1071 and gplots must be installed.
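If you are unsure whether both libraries are available, a quick check along these lines may help. This is only a sketch: run it in the R installation that your system actually uses, and note that the variable names are chosen just for this example.

```r
required <- c("e1071", "gplots")

# requireNamespace() returns FALSE (instead of failing) if a package is missing
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```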

Limitations

Please let me know should you encounter any limitations.

Usage

These parameters can be set by the user.

Parameter Description
Numerical Variable to Transform

Select the numerical variable. If a non-numerical variable is chosen, the output column will be empty.

Select Transformation

Choose from the preset transformations (“Natural Logarithm”, “Squared”, “No Transformation”, “Square Root”, “Exponential Value” or “Custom”). If “Custom” is chosen, the R code from the “Custom R Transformation” parameter will be used.

Custom R Transformation

R code for custom requirements, for example:

sqrt(x+1)
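To make the options more concrete, the following sketch shows how the preset names could map to plain R expressions. It is not the component's actual source code. The function name `transform_variable` is hypothetical, the sketch assumes that custom code refers to the variable as `x` (as in the example above), and it uses NA values to stand in for the empty output column.

```r
# Hypothetical helper, for illustration only
transform_variable <- function(x, choice, custom_code = "x") {
  # Assumption: a non-numerical input yields an "empty" (all NA) column
  if (!is.numeric(x)) {
    return(rep(NA_real_, length(x)))
  }
  switch(choice,
    "Natural Logarithm" = log(x),
    "Squared"           = x^2,
    "No Transformation" = x,
    "Square Root"       = sqrt(x),
    "Exponential Value" = exp(x),
    # Evaluate the custom R code with the variable available as x
    "Custom"            = eval(parse(text = custom_code), envir = list(x = x)),
    stop("Unknown transformation: ", choice)
  )
}

transform_variable(c(0, 3, 8), "Custom", "sqrt(x+1)")
```

Keep in mind that the logarithm and the square root are only defined for suitable values. In R, log(0) gives -Inf, and negative inputs give NaN with a warning.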

Output column added by this component

Column Description
TransformedVariable

The selected variable with the chosen transformation.

How to Implement

The component can be downloaded as a .spar file from GitHub. Then deploy it as described here. You just need to import it through the option “Import/Model Component”, which you will find by clicking the plus sign at the bottom of the list of available algorithms.

Example

You can use such a transformation to increase the quality of a linear regression, for instance. The dataset adverts.csv is often used in teaching R. It lists a few companies, their TV advertising budget from 1983 in millions of dollars (spend) and the retained impressions per week in millions (milimp). You can use the spend to estimate the retained impressions with a linear regression. Taking the logarithm of the spend (instead of the actual spend) improves the quality of the model.
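If you would like to reproduce this comparison in plain R, a sketch could look as follows. It assumes a file adverts.csv in the working directory with the columns spend and milimp described above. The helper name `compare_fits` is made up for this example, and R-squared is used here as just one possible measure of model quality.

```r
# Hypothetical helper: fit both models and return their R-squared values
compare_fits <- function(df) {
  fit_linear <- lm(milimp ~ spend, data = df)
  fit_log    <- lm(milimp ~ log(spend), data = df)
  c(
    linear = summary(fit_linear)$r.squared,
    log    = summary(fit_log)$r.squared
  )
}

# Assumption: adverts.csv is available locally with columns spend and milimp
if (file.exists("adverts.csv")) {
  adverts <- read.csv("adverts.csv")
  print(compare_fits(adverts))
}
```

A higher value for the log entry than for the linear entry would indicate that the transformed spend explains more of the variation in the retained impressions.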
