SAP HANA Engine Optimization for bizy geeks: Episode #4: Joining the Union
I dedicate this small episode to the SAP HANA community in the North-West of Europe, seen from my own geographical perspective in Heidelberg. One of their member colleagues suggested that I write a Blog about the different types of Joins in SAP HANA, and I am happy to start fulfilling this wish today. Hope it helps a bit.
If you take a look at the SAP HANA SQL documentation in help.sap.com -> Hana Appliance, you will see that the SAP HANA Database supports the same type of Joins and Unions as other relational databases. In particular, it supports INNER JOIN, LEFT/RIGHT OUTER JOIN, FULL OUTER JOIN and CROSS JOIN. Unions are not Joins, but they somehow fit into the picture, and certain type of Joins can be emulated via Unions anyway. No surprises… yet…
I don’t mean to give you an academic, stiff upper lip disquisition on the meaning of the different types of Joins, Unions and the like. You may know all this better than I do. Otherwise, you can ask standard database experts, use google, or take a look at the nice Wikipedia definitions and examples.
My intention is rather to describe, and give you examples of SAP HANA specific Join and Union possibilities.
Today I will start with a slippery kind of Join I met for the first time as part of the SAP HANA Studio content modeling tools, and which does not belong to the standard SAP HANA SQL syntax. It is rather an SAP HANA Join Engine optimization feature.
I am talking about the (in)famous Referential Join. Poor guy… I am calling it names all the time: Slippery, (in)famous, slithery, cryptic… The first time I heard about Referential Joins with SAP HANA, my spontaneous reaction was to google the term. I was a bit confused when I did not find any satisfactory answer on the Web. A while later, I found out that a Referential Join is an SAP HANA specific type of join. It is semantically an Inner Join that assumes that referential integrity is given. A standard Inner Join returns rows when there is at least one match in both joined tables (left and right tables). Referential integrity is given if it is ensured than any record on the left table has a corresponding match on the right table, and the other way around.
And now I will pull a rabbit out of the HANA hat! Referential Join offers an optimized, faster Inner Join, where the right table is not checked if no field from the right table is requested! Hooray!
Here is an example where using a Referential Join can be faster than using a standard Inner Join. Let’s take a typical Master Data model, with several associated tables for your customer or business partner data.
We can use the Enterprise Procurement Management toy data model in SAP NetWeaver for illustration purposes. To see all EPM tables and definitions, you can use the SE16 transaction in SAP NetWeaver, and search for all tables starting with SNWD*.
The table SNWD_BPA contains main Business Partner data, such as a Business Partner ID (the exact field is called “NODE_KEY”), a Business Partner Role (Customer or Supplier), the Company name, the Email address of the main contact at partner site, etc. It does not contain the exact First Name, Middle Name, Last Name or *** of the contact person, though. This kind of additional information is included in a separated, related table: SNWD_BPA_CONTACT.
I have taken this example because of an, unfortunately, relatively common marketing and sales practice which drives me crazy. It consists of sending marketing related letters and Emails to people, while addressing them by the wrong gender! As a woman, and Managing Director of an IT company, it happens to me all the time that companies send me marketing letters and Emails using “Dear Mr. Durany”. It drives me crazy, because it would take very little effort to check the right gender. You don’t even need to INNER JOIN anything… the Web usually offers quite good approximations, but it seems that some companies do not respect the potential customers they mean to acquire, and don’t even bother to perform the slightest politeness checks. They are maybe in a hurry, or might act under certain heuristic assumptions, like, e.g. “If you send a letter to the Managing Director of a company, don’t worry about genders, it will be a man with high probability”.
I cannot scientifically prove yet if this is the rationale behind such stupid errors. Maybe it is only pure carelessness. But it bothers me either way. Maybe I’ll write a mini R application in order to check statistical correlations around such mistakes for different countries and industry branches…
Coming back to our Referential Join, if you have an accurate Business Partner data model, like the one above, you can avoid gender mistakes by creating an SAP HANA content model which joins the SNWD_BPA and SNWD_BPA_CONTACT tables by their corresponding Primary Key (“NODE_KEY” field) and Foreign Key (“PARENT_KEY” field). If you are certain of the referential integrity of the data in both tables, the best type of join can be a Referential Join, available in the Modeling tools of the SAP HANA Studio Content (remember the SAP HANA Attribute Views, Analytics Views and Calculation Views we talked about some Blogs, and months, below). The Referential Join acts as an Inner Join, with the additional and cute advantage that if you perform a query on the content model in question, and this query does not request any data from the SNWD_BPA_CONTACT table, then the query will return faster results with a Referential Join than with a standard SQL Inner Join.
In particular, if you want to query the amount of Business Partners from the ***-joined model designed above, you could use the following SQL statement:
SELECT COUNT(“NODE_KEY”) FROM
Since in this case no field of the SNWD_BPA_CONTACT table is requested, this particular query will be faster than if you had used a standard Inner Join for the content model.
You may want to perform your own benchmarks, and compare performance results yourself. The SAP HANA Join Engine optimizes results in different ways. The Referential Join is not the only example of optimization possibilities. It is also not the only SAP HANA specific type of Join. Regarding Union there are also some performance tricks available using SAP HANA Calculation Engine Operators. I will give you further examples some other time.
Now I am hungry and would like to have late lunch somewhere nice. The weather today is perfect to sit outside, at least in Heidelberg.
Have a nice weekend!
Founder and Managing Director
and Co-Founder of Glooobal GmbH