Skip to Content
Technical Articles

Introduction of Join Cardinalities In SQL With SAP HANA 2.0 SPS04

Introduction

SAP HANA features like Calculation Views and CDS allow to specify the cardinality for joins to improve the performance of the execution. The cardinality specifies the number of rows which are matching another table if these tables are joined. With SPS04, this feature is also possible in plain SQL.

The cardinality can be used to create more optimized plans by proving the SQL optimizer more information how different tables are connected.

Prior SPS04:

SELECT * FROM TAB_A JOIN TAB_B ON TAB_A.ID = TAB_B.ID;

Since SPS04:

SELECT * FROM TAB_A ONE TO MANY JOIN TAB_B ON TAB_A ID = TAB_B.ID;

Syntax

The join section of a select statement contains now another optional part – the cardinality specification.

After the optionally join type, the optionally cardinality specification follows, which is unspecified by default. After this, the already known JOIN keyword follows as well as the table and the join condition.

join = [ <join_type> ] [ <join_cardinality> ] JOIN <table> ON <predicate>

<join_type> = INNER | LEFT | RIGHT | FULL
<join_cardinality> = <cardinality> TO <cardinality>
<cardinality> = MANY | [EXACT] ONE

The cardinality is a product of two parts: the cardinality of the left side and of the right side. These are the possible cardinalities, which can be combined in any way:

Cardinality Expected amount
ONE 0, 1
EXACT ONE 1
MANY 0 – infinite

Usage

In order to show example of the cardinality feature, initially two base tables are created and populated with sample data. The first table contains sales data, the second one employees information. The two tables are connected via the employee ID in the sales order table and the ID in the employees table.

CREATE COLUMN TABLE SALES_ORDER (ID INT, employeeID INT, amount INT);
INSERT INTO SALES_ORDER VALUES(1, 2, 20);
INSERT INTO SALES_ORDER VALUES(2, 1, 20);
INSERT INTO SALES_ORDER VALUES(3, 4, 20);
INSERT INTO SALES_ORDER VALUES(4, 7, 20);
INSERT INTO SALES_ORDER VALUES(5, 7, 20);

CREATE COLUMN TABLE EMPLOYEES (ID INT, name NVARCHAR(500), managerID INT);
INSERT INTO EMPLOYEES VALUES(1, 'Alice', NULL);
INSERT INTO EMPLOYEES VALUES(2, 'Donald', 1);
INSERT INTO EMPLOYEES VALUES(3, 'Dagobert', 1);
INSERT INTO EMPLOYEES VALUES(4, 'Hans', 1);
INSERT INTO EMPLOYEES VALUES(5, 'Heinz', 4);
INSERT INTO EMPLOYEES VALUES(6, 'Jane', 4);
INSERT INTO EMPLOYEES VALUES(7, 'Horst', NULL);

To select all the employees with their sales orders, the following SQL can be used:

SELECT E.NAME, S.ID salesID, S.amount FROM EMPLOYEES E INNER JOIN SALES_ORDER S ON S.employeeID = E.ID;

This statement returns:

NAME SALESID AMOUNT
Alice 2 20
Donald 1 20
Hans 3 20
Horst 4 20
Horst 5 20

As described above, each employee can create zero, one or multipel sales orders, so the cardinality is one to many. Therefore it can be used inside the statement to give the SQL optimizer more information, which can therefore improve the performance:

SELECT E.NAME, S.ID salesID, S.amount FROM EMPLOYEES E INNER ONE TO MANY JOIN SALES_ORDER S ON S.employeeID = E.ID;

Join Pruning

The specification of the cardinality can improve the performance, because a more optimized plan can be created. Consequently, this can lead to join pruning (the execution of a join is skipped), if certain conditions are met. The functionality is described in the following example.

To combine the data of the sample tables in a more meaningful way, a view can be created:

CREATE VIEW SALES_DETAILS AS 
	SELECT E.ID as employeeID, E.name as employeeName, M.ID as managerID, M.name as managerName, S.ID as salesID, S.amount as salesAmount
	FROM EMPLOYEES E 
	INNER JOIN SALES_ORDER S ON S.employeeID = E.ID
	LEFT OUTER JOIN EMPLOYEES M ON E.managerID = M.ID;

A select statement on this view, might only use a subset of the available columns, e.g.:

SELECT employeeID, salesAmount FROM SALES_DETAILS;

Even if only columns of two of the three data sources are requested, all joins are executed, as visible in the explain plan:

OPERATOR_NAME TABLE_NAME TABLE_TYPE
PROJECT ? ?
  HASH JOIN (LEFT OUTER) ? ?
    HASH JOIN ? ?
      COLUMN TABLE EMPLOYEES COLUMN TABLE
      COLUMN TABLE SALES_ORDER COLUMN TABLE
    COLUMN TABLE EMPLOYEES COLUMN TABLE

To provide the SQL optimizer more details, it’s possible to use the cardinalities in the CREATE VIEW statement:

CREATE VIEW SALES_DETAILS AS 
	SELECT E.ID as employeeID, E.name as employeeName, M.ID as managerID, M.name as managerName, S.ID as salesID, S.amount as salesAmount
	FROM EMPLOYEES E 
	INNER ONE TO MANY JOIN SALES_ORDER S ON S.employeeID = E.ID
	LEFT OUTER ONE TO ONE JOIN EMPLOYEES M ON E.managerID = M.ID;

Now, the same select as above returns a more optimized plan with only one join:

OPERATOR_NAME TABLE_NAME TABLE_TYPE
PROJECT ? ?
  HASH JOIN
    COLUMN TABLE EMPLOYEES COLUMN TABLE
    COLUMN TABLE SALES_ORDER COLUMN TABLE

This is possible, since the optimizer knows with the cardinality specification, that the self join with the employees table is not needed for the execution. Because corresponding columns are not queried and the cardinality in combination with the join type specifies, that the result set will not change, if the join in pruned, the optimizer can remove the join.

Unexpected Results

Wrong cardinality information can lead to unexpected results, because the optimizer can create different plans depending on the information.

The following query gets all employees and their manager’s name. If the person has no manager, null is used for the manager name:

SELECT M.name as managerName, E.name FROM EMPLOYEES E LEFT OUTER ONE TO ONE JOIN EMPLOYEES M ON E.managerID = M.ID;

This query produces this result set:

MANAGERNAME NAME
Alice Donald
Alice Dagobert
Alice Hans
Hans Heinz
Hans Jane
? Alice
? Horst

It the cardinality is changed to ONE TO EXACT ONE (which is wrong), the result set changes to:

MANAGERNAME NAME
Alice Donald
Alice Dagobert
Alice Hans
Hans Heinz
Hans Jane

The specification of the cardinality is powerful feature, which can improve the performance, but if wrongly specified it can lead to unexpected results.

Hints

The following hints can be used in the context of join cardinalities:

JOIN_REMOVAL_USING_CARDINALITY: Tells the optimizer to prefer join removal (join pruning) if possible

NO_JOIN_REMOVAL_USING_CARDINALITY: Tells the optimizer not to prefer join removal (join pruning). This hint can be used to troubleshoot unexpected results.

CHECK_JOIN_CARDINALITY: Creates a special plan, in which each join is checked for cardinality violation and throws an exception when it occurs.

References

SAP HANA SQL and System Views Reference for SAP HANA Platform – SELECT Statement

SAP HANA SQL and System Views Reference for SAP HANA Platform – HINT Details

Be the first to leave a comment
You must be Logged on to comment or reply to a post.