Technical Articles
Join cardinality setting in Calculation Views
Mechanism of join cardinality setting
“cardinality” is a setting that can be applied to joins in calculation views. It specifies how many matching entries for entries of one table exist in the other table of a join. It consist of two numbers, the left number describes the number of matching entries for entries of the right table while the right number describes the number of matching entries for entries of the left table. As an example, assume a join on field “employee” between Table 1 (left table) and Table 2 (right table). A join cardinality of “1..n” specifies that every entry inTable 2 has at most 1 matching entry in Table 1. Conversely, each entry in Table 1 might have 0 to n matching entries in Table 2. The symbol “n” stands here for an arbitrary positive number. For example, entry “Alice” in Table 1 might have 0, 1, or an arbitrary number of matches in Table 2. Take another example “1..1”. This indicates that each entry in Table 1, e.g., entry “Alice” of Table 1 has 0, or 1 matching entry in Table 2. Analogously, “Alice” of Table 2 has also at most 1 match in Table 1.
The cardinality setting is used by the optimizer to decide based on the requested query fields whether a join has to be executed or can be omitted without sacrificing correctness of the data. A join can be omitted if executing the join could neither add records nor reduce records and no fields are requested from the to-be-omitted table.
While inner joins can add (multiple matching entries in the other table) and remove (no matching entry – “..1” includes 0) records, outer joins can only add records. Therefore, guaranteeing with join cardinality that the to-be-pruned table has at most one matching item allows join pruning to occur for outer joins. Text joins behave similar to left-outer joins in this respect. In case of referential joins pruning can only happen if the referential integrity is set for the not-to-be-pruned table. Before HANA SPS03 the referential integrity was placed on the left table. Since SAP Web IDE for SAP HANA SPS03 the referential integrity can be placed on the left, the right, or both tables.
One exception to the rule that the to-be pruned table needs a setting of “..1” is if only measures with aggregation mode “count distinct” are requested. In this case potentially existing repeated values from the to-be-pruned tables will be made unique again by the count distinct calculation. As a consequence join execution will not change the results of the count distinct measure even if a “n..m” cardinality holds.
This leads to the following pre-requisits that all have to hold for join-pruning to happen:
- no field is requested from the to-be-pruned table
- the join type is outer, referential with integrity on the not-to-be pruned table, or text join and the to-be-pruned table contains the language column
- the join cardinality is either “..1” for the to-be-pruned table or only measures with count distinct aggregation or no measures at all are requested.
If tables are only directly involved in the join cardinality proposals can be obtained by the modeling tool. These suggestions are based on the data cardinality as of the time of the proposal. These proposals are not available if the join includes further nodes (e.g., the table coming in by a projection node).
By whatever means the cardinality setting is achieved – by proposal or manual setting – the optimizer will rely on these values when deciding about join pruning. There is no runtime check! If the setting pretends a lower cardinality than the actual cardinality of the data (e.g. it is set to “n..1” but actually “n..m” holds) omitting the join might lead to changes of the measure values compared to when the join is executed. Examples will follow later to illustrate this.
The hint described in SAP Note 2737941 can help to detect a wrong setting of join cardinalities based on the current data.
Purpose of join cardinality setting
By relying on the specified join cardinality the optimizer decides based on the requested fields whether a join has to be executed. Pruning away joins in models that have many joins defined can lead to significant performance boosts and reductions in temporary memory consumption. Imagine a defined star join involving 100 or more tables via further join chains. If no dimensional values are requested and the cardinality setting is “n..1” or “1..1” all joins can be omitted and the query reduces to one single table access to the central table instead of following all the potential join cascades.
As should have come clear by now the improved performance and resource consumption that can be achieved by join pruning bears a risk of inconsistent data when the cardinality setting does not reflect the true data cardinality. Therefore, a good understanding of join cardinality is vital which the following examples hopefully provide.
All examples below are based on two tables and a join between these two tables. In some examples additional projection or aggregation nodes will be added. Other than that the examples are kept pretty simple to illustrate the mechanisms at work. Obviously, performance gains in such simple models cannot be observed even though they exist. The simple models were chosen because the actual effects in large models will be more complex to understand and additional modelling choices can play a role there.
Simply reading the descriptions of the examples should be enough for a good understanding but it should also be possible to build the examples yourself based on the descriptions provided.
These are the two tables that will be used throughout the examples:
sales order table used throughout examples
employees table used throughout examples
As you can see there is a “1..1” relationship between field “employee” in table “salesOrders” and field “employee” in table “employees”. In contrast, there is a “1..n” relationship between field “employee” from table “salesOrders” and field “manager” in table “employees”.
Examples
The following examples will show
- how proposals for join cardinality can be obtained
- the impact of join cardinality on join pruning decisions
- potential reasons why specified join cardinalities might deviate from the data cardinality and the impact this can have on the resulting values of measures.
Summary of examples
These examples should have demonstrated that it is important to have a good understanding of how the cardinality setting might influence the outcome of queries and that it is important to guarantee that cardinalities are set correctly, if they are set. With a correct setting pruning of joins can have a huge positive impact on the runtime. With a wrong setting you pay with unexpected results for the performance. Therefore, use this setting but ensure that it will also reflect the true data cardinality in the future. Remember to adjust the cardinality setting if remodeling or data loading change the data cardinality. Ideally, your loading process guarantees the cardinality.
Dear Jan Zwickel,
thank you for this very useful reminder.
I would like to add a point about the various notation of cardinality.
For example :
Amaury
Thank you Amaury for pointing out the various options of cardinality notations. In the text I've sticked to the notation that is used in the SAP HANA Calculation View modeling user interface.
Best,
Jan
Good morning Jan Zwickel
Thank you for your kind answer.
Two more questions :
A) On a very large data set, without information about these dataset,
how do you determine the cardinality and the fields to be involve in the join ? (I mean, the lowest level).
Is there a data preparation tool in HANA Studio (something different from Table definition and Table content for manual analysis).
b) I've read the following blog post about proposals. However, it doesn't seem to work in HANA Studio. How do I set it up ?
You wrote : "The reason is that a proposal is only available when directly joining tables. In this example one table enters via a projection though". What are the steps and in which order to perform it ?
c) Which perspective do you use in Eclipse to enable "Debug" and "Database Explorer" button for the view ?
Thank you
Amaury
Hi Amaury,
a) Which fields you use for your join will certainly depend on the logic that you want to achieve. The cardinality is optimally guaranteed by your loading process. For example, during loading of a transactional table you might guarantee that there is only one matching entry in the master data table. Another option is that you can gurantee logically that there can be only one matching entry but this depends on business logic knowledge.
To get a quick insight into the current situation you could do an evaluation of the number of entries with identical values in a column:
Assuming that "SalesOrderId" is joined it would give you the number of entries with the same values in this join column. You could make this more specific by filtering on values from the other table or including additional join fields. However, whatever you get out here is only a static picture and might change with later loading. Therefore, having a logical reason why there is only one matching entry is the better solution
b) see the description in the Studio Modeling documentation
c) is addressed here.
Best,
Jan
Hello Jan Zwickel
Lets consider situation then we have Dimension Calculation View and we joining text fileds.
For example tables /BI0/MMATERIAL and /BIC/TZMATGR1.
We use Left Outer Join from /BI0/MMATERIAL to /BIC/TZMATGR1 because each material has material group and we want to show description for material group.
What kind of cardinality should we use - N:1 or 1:1 ?
Hi Yaroslav Lukin,
the cardinality setting should reflect the true data relationship. If I understand your description correctly, the same material group appears multiple times on your left-hand side while each material group matches only one entry on the right-hand side. Given this understanding you would have (N:1).
Is there a particular reason, why you are not using text-joins?
Best,
Jan
Hi Yaroslav Lukin,
Thanks for your blog.
We are working on SAP BW 7.5 SP 11.
We are starting to investigate on the HANA decision modelling to optimise our existing BW dataflow.
I have a question, can we manage everything requirements only with the calculation view ?
What would be the specific needs that we could not managed with calculation view and for which we would need attributes view or analytical view ?
What are the things that we coud only do in attributes or analytical view that we could not do in calculation views ?
Many Thanks
DD
Hi DD,
Attribute and Analytic Views have been deprecated (see e.g., here) and the way forward are Calculation Views. There might be some smaller differences in how you achieve a certain functionality but you should be able to achieve the same (and more functionality) with Calculation Views.
Best,
Jan
Perfect thanks a lot !
Nice tutorial & great explanations. Thank you for this!
Hello Jan Zwickel,
If we join Header table and Item table, lets suppose left outer join between Delivery header table and Delivery item table where header table is at the left side.
what should be the Cardinality here? 1:N or N:1?
Thanks,
Sakshi Srivastava
Hi Sakshi,
I would assume that the following relation holds in your data:
every entry in your header table has 0 to n matching entries in the item table. This means X:N
If an entry in your item table has at most one matching entry in your header table "X" would become "1", so 1:N
If there could be more matches in your header table for an entry in your item table, it would be N:M
Best,
Jan