Optimize Join Columns Flag
Purpose of optimize join columns flag
Per default, the field on which a join is defined will be requested during query processing, independently of whether the field is necessary from a query point of view. This guarantees consistent aggregation behavior concerning the join field independently of the fields that are requested by the query. Support of a stable aggregation behavior – independent of the fields that are requested by the query – makes sense because a change in aggregation behavior can lead to a change of the resulting values of measures.
As an example of a situation when the join column can be omitted imagine the following scenario: a query requests only fields from one join partner, the cardinality to the other partner (of which no fields are requested) is set to 1, and it is no inner join. In such situations execution of the join does not influence the result (see blog for explanations of join pruning). Therefore, the join does not have to be executed and if the join field is not requested by the query it could be pruned (not requested for intermediate processing). Per default this join column pruning will not happen but if the optimize join columns flag is set the join field can be pruned.
Relevance of optimize join columns flag
In situations with many join partners on different join fields and in which queries typically only select fields from a small subset of join partners using the optimize join columns flag allows omitting various join fields from the aggregation. The reason is that the flag explicitly states that the join fields do not have to be included in the aggregation if not requested in the query itself. This means that by omitting the join field and thus omitting a field that otherwise might increase the aggregation granularity, the number of records that are processed can be heavily reduced. As a consequence better performance and lower memory consumption can be expected. The amount of the benefit depends on what fields the queries request at runtime and on which fields the joins are defined.
Side-Effects of using optimize join columns flag
When using the optimize join columns flag you should be aware of the following side effects:
- Having a wrong cardinality setting might lead to wrong (from the perspective of the user – the optimizer is doing its job well) results. This has nothing to do with the optimize join columns flag itself except that setting the cardinality is a pre-requisite (see below).
- The second impact will be that the optimize join columns flag will do what is requested: it will change the aggregation level by excluding the join column from the aggregation, if possible. Whether the join column can be excluded depends also on the queried fields. This means whether the join column is included in the aggregation level varies with the fields that are requested by the query. If you have aggregation functions for which the aggregation granularity matters (like, for example taking the maximum) the results can change depending on the aggregation granularity. This will be demonstrated below. More examples about the impact of the aggregation granularity on the resulting values of measures can be found in this blog
Prerequisites for pruning of join columns (having an effect of the optimize join columns flag):
Pruning of the join column can only occur if all of the following holds:
- join field is not requested by query
- only fields from one join partner are requested
- join is an outer join, referential, or text join
- the cardinality to the join partner of which no fields are requested is 1
Explanation of prerequisites
Here are some explanations why these prerequisites are necessary for he optimize join columns flag to work:
1.) If the join field was requested by the query it obviously cannot be pruned away
2.) If fields are requested from both join partners the join has to be executed and therefore the join field has to be requested
3.) If the join was not an outer, referential or text join the join might “add” or “delete” records. This means that execution of the join would be necessary to determine the correct number of records. As a consequence of join execution the join column would be requested.
As an illustration of 3.) the following example shows the difference between inner and outer joins in terms of preserving the number of records:
Table 1 has the following values in its join column:
create column table leftTable (joinField NVARCHAR(10), measure Int); insert into leftTable values ('a',10); insert into leftTable values ('b',20); insert into leftTable values ('c',30);
Table 2 has the following values in its join field:
create column table rightTable (joinField NVARCHAR(10)); insert into rightTable values ('a'); insert into rightTable values ('a'); insert into rightTable values ('a'); insert into rightTable values ('b');
a) inner join: only fields will be retrieved that have a matching partner
the output will have values: a,a,a,b from Table 1
select l.joinfield "leftJoinfield", l.measure "measure", r.joinfield "rightJoinfield" from leftTable l inner join rightTable r on l.joinField=r.joinField
This will lead to the following result:
Note that the record with value “a” from Table 1 will be tripled because there are three matching entries in Table 2. Entry “c” will be omitted because there are no matching entries in Table 2.
This means with an inner join the number of output records can change depending on whether or not the join is executed. In this example the change in number of output records is from 3 values in the left table to 4 but it could be more or less depending on the right table’s entries.
Let’s have a look at an outer join in contrast
b) outer join: also fields without a matching entry in the right table will be retrieved
select l.joinfield "leftJoinfield", l.measure "measure", r.joinfield "rightJoinfield" from leftTable l left outer join rightTable r on l.joinField=r.joinField
You can see in the screenshot above that entry “c” is not omitted even though there is no match on the right table. In this sense outer joins are preserving the number of output records. However, you also see that entry “a” is tripled again. This means that the number of output records can change if more than one matching entry exists in the right table. This is why having the correct cardinality is vital (prerequisite 4 above). If you set the cardinality for example to n..1 and there is indeed at most one match in the right table no doubling of entries can occur by executing the join because every entry in the left table has at most 1 entry in the right table. If there is no entry in the right table that matches an entry in the left table this is not a problem: the value will still be kept as can be seen for entry “c” above.
This means: with a cardinality of 1 for the right table the execution of the join will not influence the number of records that are processed further and thus the join can be omitted if no fields have to be read from the right table.
Keep in mind: if you set a wrong cardinality the output might be wrong because pruning of the join might occur even though the join would influence the number of records for further processing. The optimizer trusts your cardinality setting!
4.) As discussed under point 3: If the to-be-pruned table might deliver more than one matching partner join pruning should not occur. Therefore the cardinality has to be set to 1 for the table of which no columns are requested. However, whether more than one matching partner can occur is not checked at runtime. To repeat it: the optimizer trusts your cardinality setting. Therefore, you have to make sure that your setting of “1” reflects the truth.
An example of the impact of the optimize join columns flag on query processing can be found here. In the example a simple left-outer join with n..1 cardinality will be modeled once with the optimize join columns flag set and once without this flag being set. An aggregation function will be used for which the granularity of aggregation matters. This will be used to demonstrate how the setting of the optimize join columns flag can change the resulting values of measures. Subsequently, feature “Debug this view” will be used to check that indeed join pruning occurs and the intermediate aggregation granularity changes if the flag is set.
The optimize join columns flag can lead to a large reduction in number of records that have to be processed at later stages of query processing and thus lead to better performance and reduced resource consumption. This is a consequence of the fact that the flag might allow aggregation to occur at an earlier timepoint thus reducing the number of records early. This is especially relevant in scenarios in which a lot of joins on different columns are defined but columns are requested only from a few join partners. In this situation taking the join column out of the aggregation granularity might allow to reduce the number of records that are processed further dramatically.
However, changing the granularity with respect to the join column as a function of the requested columns in the query comes with a side effect: query results might change for measures that use aggregation functions that are sensitive to the aggregation granularity at which they are calculated. SAP Note 2618790 can further help to understand these aggregation sensitive calculations. A good understanding of the behavior of this flag is essential to benefit from its performance increase without sacrificing correctness.