Technical Articles
Introduction of Join Cardinalities In SQL With SAP HANA 2.0 SPS04
Introduction
SAP HANA features like Calculation Views and CDS allow to specify the cardinality for joins to improve the performance of the execution. The cardinality specifies the number of rows which are matching another table if these tables are joined. With SPS04, this feature is also possible in plain SQL.
The cardinality can be used to create more optimized plans by proving the SQL optimizer more information how different tables are connected.
Prior SPS04:
SELECT * FROM TAB_A JOIN TAB_B ON TAB_A.ID = TAB_B.ID;
Since SPS04:
SELECT * FROM TAB_A ONE TO MANY JOIN TAB_B ON TAB_A ID = TAB_B.ID;
Syntax
The join section of a select statement contains now another optional part – the cardinality specification.
After the optionally join type, the optionally cardinality specification follows, which is unspecified by default. After this, the already known JOIN keyword follows as well as the table and the join condition.
join = [ <join_type> ] [ <join_cardinality> ] JOIN <table> ON <predicate>
<join_type> = INNER | LEFT | RIGHT | FULL
<join_cardinality> = <cardinality> TO <cardinality>
<cardinality> = MANY | [EXACT] ONE
The cardinality is a product of two parts: the cardinality of the left side and of the right side. These are the possible cardinalities, which can be combined in any way:
Cardinality | Expected amount |
ONE | 0, 1 |
EXACT ONE | 1 |
MANY | 0 – infinite |
Usage
In order to show example of the cardinality feature, initially two base tables are created and populated with sample data. The first table contains sales data, the second one employees information. The two tables are connected via the employee ID in the sales order table and the ID in the employees table.
CREATE COLUMN TABLE SALES_ORDER (ID INT, employeeID INT, amount INT);
INSERT INTO SALES_ORDER VALUES(1, 2, 20);
INSERT INTO SALES_ORDER VALUES(2, 1, 20);
INSERT INTO SALES_ORDER VALUES(3, 4, 20);
INSERT INTO SALES_ORDER VALUES(4, 7, 20);
INSERT INTO SALES_ORDER VALUES(5, 7, 20);
CREATE COLUMN TABLE EMPLOYEES (ID INT, name NVARCHAR(500), managerID INT);
INSERT INTO EMPLOYEES VALUES(1, 'Alice', NULL);
INSERT INTO EMPLOYEES VALUES(2, 'Donald', 1);
INSERT INTO EMPLOYEES VALUES(3, 'Dagobert', 1);
INSERT INTO EMPLOYEES VALUES(4, 'Hans', 1);
INSERT INTO EMPLOYEES VALUES(5, 'Heinz', 4);
INSERT INTO EMPLOYEES VALUES(6, 'Jane', 4);
INSERT INTO EMPLOYEES VALUES(7, 'Horst', NULL);
To select all the employees with their sales orders, the following SQL can be used:
SELECT E.NAME, S.ID salesID, S.amount FROM EMPLOYEES E INNER JOIN SALES_ORDER S ON S.employeeID = E.ID;
This statement returns:
NAME | SALESID | AMOUNT |
Alice | 2 | 20 |
Donald | 1 | 20 |
Hans | 3 | 20 |
Horst | 4 | 20 |
Horst | 5 | 20 |
As described above, each employee can create zero, one or multipel sales orders, so the cardinality is one to many. Therefore it can be used inside the statement to give the SQL optimizer more information, which can therefore improve the performance:
SELECT E.NAME, S.ID salesID, S.amount FROM EMPLOYEES E INNER ONE TO MANY JOIN SALES_ORDER S ON S.employeeID = E.ID;
Join Pruning
The specification of the cardinality can improve the performance, because a more optimized plan can be created. Consequently, this can lead to join pruning (the execution of a join is skipped), if certain conditions are met. The functionality is described in the following example.
To combine the data of the sample tables in a more meaningful way, a view can be created:
CREATE VIEW SALES_DETAILS AS
SELECT E.ID as employeeID, E.name as employeeName, M.ID as managerID, M.name as managerName, S.ID as salesID, S.amount as salesAmount
FROM EMPLOYEES E
INNER JOIN SALES_ORDER S ON S.employeeID = E.ID
LEFT OUTER JOIN EMPLOYEES M ON E.managerID = M.ID;
A select statement on this view, might only use a subset of the available columns, e.g.:
SELECT employeeID, salesAmount FROM SALES_DETAILS;
Even if only columns of two of the three data sources are requested, all joins are executed, as visible in the explain plan:
OPERATOR_NAME | TABLE_NAME | TABLE_TYPE |
PROJECT | ? | ? |
HASH JOIN (LEFT OUTER) | ? | ? |
HASH JOIN | ? | ? |
COLUMN TABLE | EMPLOYEES | COLUMN TABLE |
COLUMN TABLE | SALES_ORDER | COLUMN TABLE |
COLUMN TABLE | EMPLOYEES | COLUMN TABLE |
To provide the SQL optimizer more details, it’s possible to use the cardinalities in the CREATE VIEW statement:
CREATE VIEW SALES_DETAILS AS
SELECT E.ID as employeeID, E.name as employeeName, M.ID as managerID, M.name as managerName, S.ID as salesID, S.amount as salesAmount
FROM EMPLOYEES E
INNER ONE TO MANY JOIN SALES_ORDER S ON S.employeeID = E.ID
LEFT OUTER ONE TO ONE JOIN EMPLOYEES M ON E.managerID = M.ID;
Now, the same select as above returns a more optimized plan with only one join:
OPERATOR_NAME | TABLE_NAME | TABLE_TYPE |
PROJECT | ? | ? |
HASH JOIN | ||
COLUMN TABLE | EMPLOYEES | COLUMN TABLE |
COLUMN TABLE | SALES_ORDER | COLUMN TABLE |
This is possible, since the optimizer knows with the cardinality specification, that the self join with the employees table is not needed for the execution. Because corresponding columns are not queried and the cardinality in combination with the join type specifies, that the result set will not change, if the join in pruned, the optimizer can remove the join.
Unexpected Results
Wrong cardinality information can lead to unexpected results, because the optimizer can create different plans depending on the information.
The following query gets all employees and their manager’s name. If the person has no manager, null is used for the manager name:
SELECT M.name as managerName, E.name FROM EMPLOYEES E LEFT OUTER ONE TO ONE JOIN EMPLOYEES M ON E.managerID = M.ID;
This query produces this result set:
MANAGERNAME | NAME |
Alice | Donald |
Alice | Dagobert |
Alice | Hans |
Hans | Heinz |
Hans | Jane |
? | Alice |
? | Horst |
It the cardinality is changed to ONE TO EXACT ONE (which is wrong), the result set changes to:
MANAGERNAME | NAME |
Alice | Donald |
Alice | Dagobert |
Alice | Hans |
Hans | Heinz |
Hans | Jane |
The specification of the cardinality is powerful feature, which can improve the performance, but if wrongly specified it can lead to unexpected results.
Hints
The following hints can be used in the context of join cardinalities:
JOIN_REMOVAL_USING_CARDINALITY: Tells the optimizer to prefer join removal (join pruning) if possible
NO_JOIN_REMOVAL_USING_CARDINALITY: Tells the optimizer not to prefer join removal (join pruning). This hint can be used to troubleshoot unexpected results.
CHECK_JOIN_CARDINALITY: Creates a special plan, in which each join is checked for cardinality violation and throws an exception when it occurs.
References
SAP HANA SQL and System Views Reference for SAP HANA Platform – SELECT Statement
SAP HANA SQL and System Views Reference for SAP HANA Platform – HINT Details