Mechanism of join cardinality setting
“cardinality” is a setting that can be applied to joins in calculation views. It specifies how many matching entries for entries of one table exist in the other table of a join. It consist of two numbers, the left number describes the number of matching entries for entries of the right table while the right number describes the number of matching entries for entries of the left table. As an example, assume a join on field “employee” between Table 1 (left table) and Table 2 (right table). A join cardinality of “1..n” specifies that Table 2 has at most 1 matching entry in Table 1. Conversely, each entry in Table 1 might have 0 to n matching entries in Table 2. The symbol “n” stands here for an arbitrary positive number. For example, entry “Alice” in Table 1 might have 0, 1, or an arbitrary number of matches in Table 2. Take another example “1..1”. This indicates that each entry in Table 1, e.g., entry “Alice” of Table 1 has 0, or 1 matching entry in Table 2. Analogously, “Alice” of Table 2 has also at most 1 match in Table 1.
The cardinality setting is used by the optimizer to decide based on the requested query fields whether a join has to be executed or can be omitted without sacrificing correctness of the data. A join can be omitted if executing the join could neither add records nor reduce records and no fields are requested from the to-be-omitted table.
While inner joins can add (multiple matching entries in the other table) and remove (no matching entry – “..1” includes 0) records, outer joins can only add records. Therefore, guaranteeing with join cardinality that the to-be-pruned table has at most one matching item allows join pruning to occur for outer joins. As referential and text joins guarantee a match also they can be pruned away as long as a cardinality setting of “1..1” guarantees that no records are added by the join.
One exception to the rule that the to-be pruned table needs a setting of “..1” is if only measures with aggregation mode “count distinct” are requested. In this case potentially existing repeated values from the to-be-pruned tables will be made unique again by the count distinct calculation. As a consequence join execution will not change the results of the count distinct measure even if a “n..m” cardinality holds.
This leads to the following pre-requisits that all have to hold for join-pruning to happen:
- no field is requested from the to-be-pruned table
- the join type is outer, referential, or text
- the join cardinality is either “..1” for the to-be-pruned table or only measures with count distinct aggregation or no measures at all are requested.
If tables are only directly involved in the join cardinality proposals can be obtained by the modeling tool. These suggestions are based on the data cardinality as of the time of the proposal. These proposals are not available if the join includes further nodes (e.g., the table coming in by a projection node).
By whatever means the cardinality setting is achieved – by proposal or manual setting – the optimizer will rely on these values when deciding about join pruning. There is no runtime check! If the setting pretends a lower cardinality than the actual cardinality of the data (e.g. it is set to “n..1” but actually “n..m” holds) omitting the join might lead to changes of the measure values compared to when the join is executed. Examples will follow later to illustrate this.
Purpose of join cardinality setting
By relying on the specified join cardinality the optimizer decides based on the requested fields whether a join has to be executed. Pruning away joins in models that have many joins defined can lead to significant performance boosts and reductions in temporary memory consumption. Imagine a defined star join involving 100 or more tables via further join chains. If no dimensional values are requested and the cardinality setting is “n..1” or “1..1” all joins can be omitted and the query reduces to one single table access to the central table instead of following all the potential join cascades.
As should have come clear by now the improved performance and resource consumption that can be achieved by join pruning bears a risk of inconsistent data when the cardinality setting does not reflect the true data cardinality. Therefore, a good understanding of join cardinality is vital which the following examples hopefully provide.
All examples below are based on two tables and a join between these two tables. In some examples additional projection or aggregation nodes will be added. Other than that the examples are kept pretty simple to illustrate the mechanisms at work. Obviously, performance gains in such simple models cannot be observed even though they exist. The simple models were chosen because the actual effects in large models will be more complex to understand and additional modelling choices can play a role there.
Simply reading the descriptions of the examples should be enough for a good understanding but it should also be possible to build the examples yourself based on the descriptions provided.
These are the two tables that will be used throughout the examples:
sales order table used throughout examples
employees table used throughout examples
As you can see there is a “1..1” relationship between field “employee” in table “salesOrders” and field “employee” in table “employees”. In contrast, there is a “1..n” relationship between field “employee” from table “salesOrders” and field “manager” in table “employees”.
The following examples will show
- how proposals for join cardinality can be obtained
- the impact of join cardinality on join pruning decisions
- potential reasons why specified join cardinalities might deviate from the data cardinality and the impact this can have on the resulting values of measures.
Summary of examples
These examples should have demonstrated that it is important to have a good understanding of how the cardinality setting might influence the outcome of queries and that it is important to guarantee that cardinalities are set correctly, if they are set. With a correct setting pruning of joins can have a huge positive impact on the runtime. With a wrong setting you pay with unexpected results for the performance. Therefore, use this setting but ensure that it will also reflect the true data cardinality in the future. Remember to adjust the cardinality setting if remodeling or data loading change the data cardinality. Ideally, your loading process guarantees the cardinality.