Join cardinality setting in Calculation Views: how proposals for join cardinality can be obtained
This example is referenced in my other post about the usage of the join cardinality settings in SAP HANA Calculation Views. Please have a look at the other post to get a better understanding of the context for this example
This example will demonstrate how to get a proposal for the join cardinality by the modeling tool. The proposal is based on the data cardinality that holds for the join fields between the tables at the time of the proposal. Prerequisite is that the join is directly based on tables. At the end it will be discussed with which SQL statements you could derive the join cardinality that currently holds in the data also when intermediate nodes are feeding into the join.
Let’s start with a Calculation View that only includes a join of the two tables “salesOrders” and “employees” (the model and tables are discussed in more details here). Join the two tables with a left outer join on field “employee”, table “salesOrders” being the left table:
Requesting join cardinality proposals. Button for requesting proposals is shown in yellow
As described above, table “salesOrders” has only distinct values in column “employee”. Similarly, column “employee” in table “employees” also has no repeated values. Therefore, if you wanted to set the cardinality for a join on the fields “employee” you would set it to “1..1”. As long as only tables are involved in the join you can ask the modeling tool for proposals of cardinality. Here is, how you could do it:
Click on the button marked in yellow in the figure above to get an proposal of the join cardinality. Based on the cardinality of the field “employee” in the tables “1..1” will be proposed. For the sake of demonstration remove the join and create a new left-outer join. This time between “employee” of the left table and “manager” of the right table. As there are multiple duplicates in column “manager” (e.g., “Alice”) the proposal by the modeling tool is “1..n”.
This means: when you include only tables in your join the modeler will propose a cardinality for the join field based on the current data cardinality. The following is a demonstration that no proposals are available if the join includes other nodes.
Create another view. This time do not add the table “employees” directly into the join node but place it first in a projection:
Request for join cardinality proposals is failing because another node is involved in the join.
Now you will receive no proposal for the join cardinality. The reason is that a proposal is only available when directly joining tables. In this example one table enters via a projection though.
To sum up, join cardinality proposals are only available if tables are directly involved in the join and not entering via other nodes.
SQL “HANA_Data_JoinCardinalities_CommandGenerator” attached to SAP Note 1969700 allows you to determine the join cardinality that currently holds also for joins on views. To do so, you would replace the object names in the section that is labeled “/* Modification section */” with the names of the objects that you want to join and replace the join columns on which you want to join on. Running the SQL statement will generate a SQL command that delivers the frequency of matches in the right input object for each entry in the left input. If your right input object includes also an aggregation you need to specify the maximum set of columns that can be used for aggregation at runtime. This is necessary, as the cardinality depends on the aggregation granularity and you need to determine the highest frequency of matches.
As an example, with the following modification section a SQL statement would be generated that returns the frequency of matches in object “STATISTICS_ALERTS_BASE” for each entry in object “STATISTICS_CURRENT_ALERTS” when joined on column “ALERT_ID” and aggregating by “ALERT_RATING”:
SELECT /* Modification section */ '_SYS_STATISTICS' SCHEMA_NAME_1, 'STATISTICS_CURRENT_ALERTS' OBJECT_NAME_1, 'ALERT_ID' JOIN_COLUMNS_1, '_SYS_STATISTICS' SCHEMA_NAME_2, 'STATISTICS_ALERTS_BASE' OBJECT_NAME_2, 'ALERT_ID' JOIN_COLUMNS_2, 'ALERT_RATING' ATTRIBUTE_COLUMNS_2 FROM DUMMY
If the result reports a higher frequency than 1 for any column, the cardinality on the right side would be “n” (or “m”). A violation of a referential join condition could be detected by null values in the right join columns or frequencies that are higher than 1.
To identify the object names that correspond to the nodes that are feeding into your join you can do a data preview on the respective feeding nodes and look at the generated SQL statement. The data preview will generate these views and thus needs to be run to make these views available. SAP Note 2853770 discusses how these views can also be generated via SQL so that you are not dependent on the data preview.
This example is referenced in my other post about the usage of the join cardinality settings in SAP HANA Calculation Views. Please have a look at the other post to get a better understanding of the context for this example.