How Simpson’s Paradox can play tricks on you
March 17th, 2021. It is a normal day at ORGO, a producer of edible organic oils. At 9 o’clock in the morning, John sends the chart below to Alex, the head of marketing.
According to this chart, men like sesame oil more than flaxseed oil, and so do women.
Alex also wants the aggregated view.
No problem. John adds a chart to his SAP Analytics Cloud story. Here is the result he obtains.
As usual when a difficulty occurs, John rubs his neck. There must be something wrong somewhere.
Let’s build a table to see the values hiding behind the calculated measure % Likes.
The values for the calculated ratio % Likes seem correct.
John cannot send the current story to ORGO’s CMO. His reputation is at stake. The aggregated numbers contradict the detailed ones. He needs a cappuccino …
The calculations in the table are correct, and yet the results don’t make sense. Who could help him? Margaret, maybe (Margaret studied at Berkeley in the mid-seventies; she loves statistics and Jimi Hendrix). “This phenomenon is called Simpson’s paradox,” she says. It can appear whenever you calculate a percentage of total across groups of different sizes, because the larger groups dominate the aggregate. To work around the issue, Margaret suggests averaging the ratios.
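To make the reversal concrete, here is a minimal Python sketch. The counts are invented for illustration and are not ORGO's survey data; the names `survey`, `ratio_of_sums` and `average_of_ratios` are likewise assumptions of this sketch, not SAP Analytics Cloud functions. It compares the aggregated ratio (total likes over total answers) with the average of the per-gender ratios.

```python
# Hypothetical counts (not ORGO's data): (likes, answers) per oil and gender.
# The group sizes are deliberately unbalanced.
survey = {
    "sesame":   {"men": (9, 10),   "women": (30, 100)},
    "flaxseed": {"men": (80, 100), "women": (2, 10)},
}


def ratio_of_sums(groups):
    """Aggregated % Likes: total likes divided by total answers."""
    likes = sum(l for l, _ in groups.values())
    answers = sum(n for _, n in groups.values())
    return likes / answers


def average_of_ratios(groups):
    """Workaround: unweighted mean of the per-group % Likes."""
    return sum(l / n for l, n in groups.values()) / len(groups)


for oil, groups in survey.items():
    per_group = {g: round(l / n, 3) for g, (l, n) in groups.items()}
    print(
        oil,
        per_group,
        "ratio of sums:", round(ratio_of_sums(groups), 3),
        "average of ratios:", round(average_of_ratios(groups), 3),
    )
```

With these made-up numbers, sesame wins within each gender, flaxseed wins on the ratio of sums, and sesame wins again on the average of ratios. The reversal comes entirely from the unequal group sizes.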
John creates the following calculated measure:
And obtains this chart:
She is right: the paradox has vanished.
This is a quick-and-dirty fix, though. A better solution would be to have the same number of answers in each group, so that a Like carries the same weight regardless of gender and oil type. Having equal-sized groups will prevent the issue John ran into, that is (Margaret writes on the whiteboard):
John nods his head. Margaret explains to him that, with balanced groups, the ratio of the sums will equal the average of the ratios. Here is an example:
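Margaret's claim can be checked numerically. The sketch below uses made-up counts with 50 answers in every group (an assumption for illustration, not ORGO's data). When every group has the same size n, total likes divided by k·n is the same number as the mean of the k per-group ratios, so the two measures coincide.

```python
import math

# Hypothetical balanced survey: every (oil, gender) group has 50 answers.
balanced = {
    "sesame":   {"men": (45, 50), "women": (15, 50)},
    "flaxseed": {"men": (40, 50), "women": (10, 50)},
}


def ratio_of_sums(groups):
    """Total likes divided by total answers."""
    likes = sum(l for l, _ in groups.values())
    answers = sum(n for _, n in groups.values())
    return likes / answers


def average_of_ratios(groups):
    """Unweighted mean of the per-group like ratios."""
    return sum(l / n for l, n in groups.values()) / len(groups)


for oil, groups in balanced.items():
    ros = ratio_of_sums(groups)
    aor = average_of_ratios(groups)
    # With equal group sizes the two measures agree (up to float rounding).
    print(oil, round(ros, 3), round(aor, 3), math.isclose(ros, aor))
```

In this balanced setup the aggregated view and the per-gender view rank the two oils the same way, so no contradiction can arise between them.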
Thank you so much, Margaret.
Half past twelve. John’s stomach rumbles. He goes for lunch with a sense of relief.