Join cardinality setting in Calculation Views: Inf...

jan_zwickel · ‎10-27-2017

This example is referenced in my other post about the usage of the join cardinality settings in SAP HANA Calculation Views. Please have a look at the other post to get a better understanding of the context for this example

The following three examples will illustrate that the resulting values of measures can change depending on whether join pruning occurs or not if a wrong cardinality is maintained (see context on join pruning). The reasons for the wrong cardinalities are different in each example:

a) wrong from the outset

In this example different resulting values for measure "amount" are obtained depending on whether the join is executed or not because the cardinality is wrongly set. To create the example set the cardinality of a left-outer join between "employee" and "manager" to "1..1":

Cardinality is set to "1..1" in contrast to the actual data cardinality

This is the step which starts the trouble: You pretend a cardinality of "1..1" which does not hold for the data

Now compare the results of the following two queries

Different results are displayed depending on whether fields from both tables or only from left-table are requested

As you can see different values for measure “amount” occur for "Alice" and "Hans". What is the reason?

In the left query all fields are coming from the left table and, based on the cardinality setting, join pruning can occur. Therefore, the join is not executed and in consequence the multiple matching entries for “Alice” in the right table (compare content of the tables shown here) are not found.

In the query on the right side a join is forced by querying also “manager” from the right table. Here, join execution leads to trippling of the record for “Alice” in the left table and the result is 3*22=66 for field "amount". Similarly for Hans: Hans has two matching entries in the right table (Hans has two employees) and therefore measure "amount" results in 2*20=40.

Nothing is wrong here with the optimizer but with the setting of the cardinality that does not match the data.

In the current example, cardinality had been set wrongly from the beginning. In contrast in the following examples situations will be described that can lead to wrong cardinality settings even though the settings are correct at the beginning.

b) remodeling (e.g., adding another field, switching aggregations to projections)

The first example starts with a model that produces the same values of “amount” irrespective of whether join pruning occurs or not. After adding an additional field with a “keep flag” the "amount" for “Alice” and "Hans" will change depending on whether join pruning occurs or not.

The following view is a copy of the previous view with the only change that an aggregation node is inserted before the join. The aggregation only maps field "manager":

Aggregation node removes field "employee"

When only field "manager" is mapped in the aggregation node multiple values of the same "manager" will be condensed into one value by the aggregation node.

Run the two SQL statements below. The first one allows join pruning to occur as it only requests fields from the left side, the second one prevents join pruning by selecting also field “manager” from the right table.

Requested fields do not influence the results for field "amount" when an aggregation node reduces field "manager" to distinct values

You see the same values for "amount" independent of whether or not join pruning occurs. The reason is that the aggregation on the join field “manager” will make sure that only distinct values enter the join from the right side. Your cardinality setting of "1..1" is correct.

Now map also “employee” in the aggregation node and set the “keep flag” to make sure that the field “employee” is not pruned away if not requested (more information about the keep flag can be found here)

Setting of keep flag

If you run the same queries again "Alice" and "Hans" will have different values for “amount” depending on whether join pruning occurs or not:

Different results occur when field "employee" is kept in the aggregation and leads to multiple occurrences of the same value of field "manager"

This means that a change to your model (adding another field) below your join node might make your right table entries no longer unique when arriving at the join node which violates the cardinality setting. The consequences are that the values for “amount” depend on whether join pruning occurs or not.

The second example in this section starts with the same view but this time the aggregation is replaced by a projection:

Aggregation node is changed to a projection node

Changing the aggregation to a projection means that “manager” no longer provides only distinct values to the join.

Run the following two queries and compare their results:

Different results occur when the aggregation node is changed to a projection node

As you can see, now join pruning has an influence on the resulting values of measure "amount".

To recap, model changes below your join node (this time a switch of an aggregation to a projection) may lead to a violation of your join cardinality setting.

c) after loading additional data

The third example again starts with a correct cardinality setting. Use the first view without any additional nodes but exchange the table "employees" by a copy of the table. A copy of the table is used so that it can be modified without influencing the other models build before:

Join is defined on field "employee" on both sides

The join is defined on "employee" and the cardinality reflects the true data cardinality of "1..1". We will query “employee” in addition from the right table to check the impact of join pruning.

Run the following two queries and compare their results:

Consistent results are returned when the cardinality setting in the view reflects the data cardinality

Not surprisingly with the correct cardinality setting the same values for "amount" are returned independent of whether join pruning occurs or not.

Now, let’s imagine that “Donald” receives a second manager and add this record to table “CopyOfEmployee”:

insert into "joinCardinalityExample.db.tables::CopyOfemployees" values ('Herbert','Donald')

After this change your cardinality assumption of "1..1" for field "employee" will not hold any longer. As a consequence, you will see different values for “amount” of “Donald” depending on whether the join is executed or not:

Inconsistent results are returned when additional data lead to a difference between cardinality setting and data cardinality

The "amount" of "Donald" varies depending on whether the join is executed or not. This is a consequence of the cardinality setting which had been true at the beginning but became wrong after additional data had been loaded.

SAP Note 2737941 can help you in detecting wrong join cardinality settings based on the current data. However, after additional loading of data the check would have to re-run.

This example is referenced in my other post about the usage of the join cardinality settings in SAP HANA Calculation Views. Please have a look at the other post to get a better understanding of the context for this example