Skip to Content
Technical Articles
Author's profile photo Jan Zwickel

Join cardinality setting in Calculation Views: Influence of wrong cardinality settings

This example is referenced in my other post about the usage of the join cardinality settings in SAP HANA Calculation Views. Please have a look at the other post to get a better understanding of the context for this example

 

The following three examples will illustrate that the resulting values of measures can change depending on whether join pruning occurs or not if a wrong cardinality is maintained (see context  on join pruning). The reasons for the wrong cardinalities are different in each example:

a) wrong from the outset

In this example different resulting values for measure “amount” are obtained depending on whether the join is executed or not because the cardinality is wrongly set. To create the example set the cardinality of a left-outer join between “employee” and “manager” to “1..1”:

 

Cardinality is set to “1..1” in contrast to the actual data cardinality

 

 

This is the step which starts the trouble: You pretend a cardinality of “1..1” which does not hold for the data

Now compare the results of the following two queries

Different results are displayed depending on whether fields from both tables or only from left-table are requested

 

As you can see different values for measure “amount” occur for “Alice” and “Hans”. What is the reason?

In the left query all fields are coming from the left table and, based on the cardinality setting, join pruning can occur. Therefore, the join is not executed and in consequence the multiple matching entries for “Alice” in the right table (compare content of the tables shown here) are not found.

In the query on the right side a join is forced by querying also “manager” from the right table. Here, join execution leads to trippling of the record for “Alice” in the left table and the result is 3*22=66 for field “amount”. Similarly for Hans: Hans has two matching entries in the right table (Hans has two employees) and therefore measure “amount” results in 2*20=40.

Nothing is wrong here with the optimizer but with the setting of the cardinality that does not match the data.

In the current example, cardinality had been set wrongly from the beginning. In contrast in the following examples situations will be described that can lead to wrong cardinality settings even though the settings are correct at the beginning.

 

b) remodeling (e.g., adding another field, switching aggregations to projections)

The first example starts with a model that produces the same values of “amount” irrespective of whether join pruning occurs or not. After adding an additional field with a “keep flag” the “amount” for “Alice” and “Hans” will change depending on whether join pruning occurs or not.

The following view is a copy of the previous view with the only change that an aggregation node is inserted before the join. The aggregation only maps field “manager”:

 

Aggregation node removes field “employee”

 

When only field “manager” is mapped in the aggregation node multiple values of the same “manager” will be condensed into one value by the aggregation node.

Run the two SQL statements below. The first one allows join pruning to occur as it only requests fields from the left side, the second one prevents join pruning by selecting also field “manager” from the right table.

Requested fields do not influence the results for field “amount” when an aggregation node reduces field “manager” to distinct values

 

You see the same values for “amount” independent of whether or not join pruning occurs. The reason is that the aggregation on the join field “manager” will make sure that only distinct values enter the join from the right side. Your cardinality setting of “1..1” is correct.

Now map also “employee” in the aggregation node and set the “keep flag” to make sure that the field “employee” is not pruned away if not requested (more information about the keep flag can be found here)

 

Setting of keep flag

 

If you run the same queries again “Alice” and “Hans” will have different values for “amount” depending on whether join pruning occurs or not:

 

Different results occur when field “employee” is kept in the aggregation and leads to multiple occurrences of the same value of field “manager”

 

This means that a change to your model (adding another field) below your join node might make your right table entries no longer unique when arriving at the join node which violates the cardinality setting. The consequences are that the values for “amount” depend on whether join pruning occurs or not.

The second example in this section starts with the same view but this time the aggregation is replaced by a projection:

Aggregation node is changed to a projection node

 

 

Changing the aggregation to a projection means that “manager” no longer provides only distinct values to the join.

Run the following two queries and compare their results:

Different results occur when the aggregation node is changed to a projection node

 

 

As you can see, now join pruning has an influence on the resulting values of measure “amount”.

To recap, model changes below your join node (this time a switch of an aggregation to a projection) may lead to a violation of your join cardinality setting.

c) after loading additional data

The third example again starts with a correct cardinality setting. Use the first view without any additional nodes but exchange the table “employees” by a copy of the table. A copy of the table is used so that it can be modified without influencing the other models build before:

 

Join is defined on field “employee” on both sides

 

 

The join is defined on “employee” and the cardinality reflects the true data cardinality of “1..1”. We will query “employee” in addition from the right table to check the impact of join pruning.

Run the following two queries and compare their results:

 

Consistent results are returned when the cardinality setting in the view reflects the data cardinality

 

Not surprisingly with the correct cardinality setting the same values for “amount” are returned independent of whether join pruning occurs or not.

Now, let’s imagine that “Donald” receives a second manager and add this record to table “CopyOfEmployee”:

insert into “joinCardinalityExample.db.tables::CopyOfemployees” values (‘Herbert’,’Donald’)

 

After this change your cardinality assumption of “1..1” for field “employee” will not hold any longer. As a consequence, you will see different values for “amount” of “Donald” depending on whether the join is executed or not:

 

Inconsistent results are returned when additional data lead to a difference between cardinality setting and data cardinality

 

The “amount” of “Donald” varies depending on whether the join is executed or not. This is a consequence of the cardinality setting which had been true at the beginning but became wrong after additional data had been loaded.

 

SAP Note 2737941 can help you in detecting wrong join cardinality settings based on the current data. However, after additional loading of data the check would have to re-run.

This example is referenced in my other post about the usage of the join cardinality settings in SAP HANA Calculation Views. Please have a look at the other post to get a better understanding of the context for this example

Assigned Tags

      4 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo G B
      G B

      Hi Jan

      In the very first example, cardinality is set as wrong which is true, but the join pruning didn't occur as we requested manager field from right table. The results of amount column depend on the values returned by the join condition itself. How did the wrong cardinality affect the amount values?

      Author's profile photo Jan Zwickel
      Jan Zwickel
      Blog Post Author

      Hi,

      If I understand your question correct you are refering to the right side of the figure with caption "Different results are displayed depending on whether fields from both tables or only from left-table are requested"

      Here "manager" is requested from the right table. This means join pruning cannot occur and the join has to be executed. When the join is executed e.g., "Alice" is found three times on the right table which means the amount from the left table of 22 is replicated three times.

      You might try that out with the following SQL statement that uses a leftTable with one entry for "Alice" and a rightTable with three entries for "Alice". The leftouter join will lead to the multiplication of the "amount" in the left table by the number of records with "Alice" in the rightTable:

       

      select

        leftTable."employee",

       sum("amount")

      from

      (

      select

       'Alice' AS "employee",

        20 AS "amount"

      from

       dummy

      ) leftTable left outer join

      (

      select

       'Alice' AS "manager",

       'Hugo' AS "employee"

      from

       dummy

      UNION

       select

        'Alice' AS "manager",

        'Donald' AS "employee"

      from

       dummy

      UNION

       select

        'Alice' AS "manager",

        'Lisa' AS "employee"

       from

        dummy

      ) rightTable

      on

      leftTable."employee"=rightTable."manager"

      group by

      leftTable."employee"

       

      If you just want to see the three times replication of Alice:

      select

      leftTable."employee",

      ("amount")

      from

      (

      select

      'Alice' AS "employee",

      20 AS "amount"

      from

      dummy

      ) leftTable left outer join

      (

      select

      'Alice' AS "manager",

      'Hugo' AS "employee"

      from

      dummy

      UNION

      select

      'Alice' AS "manager",

      'Donald' AS "employee"

      from

      dummy

      UNION

      select

      'Alice' AS "manager",

      'Lisa' AS "employee"

      from

      dummy

      ) rightTable

      on

      leftTable."employee"=rightTable."manager"

      Best,
      Jan

       

      Author's profile photo G B
      G B

      Hi Jan

      I do understand the reason behind the populated amount values. And this is because of the join execution. So, how did the wrong cardinality setting affect the results, because it is indeed the join execution?

      Author's profile photo Jan Zwickel
      Jan Zwickel
      Blog Post Author

      Hi,

      If the join cardinality was not set wrongly but let's say 1:n the optimizer would have known that the join cannot be pruned and therefore the join would have been executed. This is also described in the following post . The current page is only a subpage of this topic.

      Best,
      Jan