This component adds a Pareto Plot to SAP Predictive Analytics.

It also outputs the aggregated data that underlies the plot, so that the transformed data can be used elsewhere.

**Disclaimer**

Please note that this component is not an official release by SAP and that it is provided as-is without any guarantee or support. Please test the component to ensure it works for your purposes.

**Prerequisites**

- The R libraries dplyr and gplots must be installed.

**Limitations**

Please let me know should you encounter any limitations.

**Usage**

These parameters can be set by the user:

Parameter | Description |
---|---|
Categorical Variable | The label column by which the numerical variable will be summarised. |
Numerical Variable | The column that will be summarised by the labels found in the categorical variable. |

Output columns:

Column | Description |
---|---|
Label | The values found in the categorical variable. |
Value | The numerical variable summarised by the row’s label. The data is sorted in descending order on this column. |
ValueCumulated | The cumulative value. |
Percent | The row’s percentage contribution to the total sum of the numerical variable. |
PercentCumulated | The cumulative percentage. |
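
To make the output columns concrete, here is a minimal sketch of the aggregation in plain Python. It is not the component's own code (the component relies on the libraries listed under Prerequisites), and several details are assumptions: that "summarised" means summed, that percentages are expressed on a 0 to 100 scale, and that the helper name `pareto_table`, the `Region` column and the sample numbers are made up for illustration.

```python
from collections import defaultdict


def pareto_table(rows, label_key, value_key):
    """Hypothetical helper: build the five output columns from raw rows."""
    # Sum the numerical variable per label of the categorical variable
    # (assumption: "summarised" means summed).
    totals = defaultdict(float)
    for row in rows:
        totals[row[label_key]] += row[value_key]

    grand_total = sum(totals.values())
    # Sort the labels by their summed value, largest first.
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    table = []
    running = 0.0
    for label, value in ordered:
        running += value
        # Assumption: percentages are on a 0-100 scale; guard against a zero total.
        percent = 100.0 * value / grand_total if grand_total else 0.0
        percent_cum = 100.0 * running / grand_total if grand_total else 0.0
        table.append({
            "Label": label,
            "Value": value,
            "ValueCumulated": running,
            "Percent": percent,
            "PercentCumulated": percent_cum,
        })
    return table


# Made-up sample rows, for illustration only.
sample = [
    {"Region": "A", "PassengerCount": 30},
    {"Region": "B", "PassengerCount": 50},
    {"Region": "A", "PassengerCount": 10},
    {"Region": "C", "PassengerCount": 10},
]

result = pareto_table(sample, "Region", "PassengerCount")
for row in result:
    print(row)
```

With these sample rows, label B comes first because it has the largest sum, and the last row's PercentCumulated reaches 100.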

**How to Implement**

The component can be downloaded as a .spar file from GitHub and then deployed as described here. To import it, use the option “Import/Model Component”, which appears when you click the plus sign at the bottom of the list of available algorithms.

**Example**

You can try the Pareto plot on the attached dataset, which lists how many passengers boarded an airplane at San Francisco airport. Select “PassengerCount” as the numerical variable and choose a categorical column.

You can see the resulting Pareto plot. Here the plot is broken down by geographical region.

You can also see the aggregated data that was used to build the plot.

You could use that data, for instance, in the “Prepare” tab to easily produce a more interactive Pareto plot.