Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
DotEiserman
Advisor
Today's blog will be very niche, but like all things niche, it will find a reader. 🙂 I'll be discussing the impact of the VARBINARY data type when trying to do "Remote Join Relocation" with SDA (Smart Data Access) functionality.

In addition we'll see the following:

  1. How to monitor your statements executed against the remote databases

  2. How to see your remote source properties


 

The technical scenario is a SAP HANA Enterprise environment reading data from a remote SAP HANA database. The environment versions in my scenario are listed below:

HANA Enterprise version: 2.00.059.05.1662044871 (fa/hana2sp05)
Remote HANA database version: 2.00.059.08.1678802465 (fa/hana2sp05)

If you're not familiar with the term SDA, it is short for Smart Data Access, and it is SAP HANA functionality that allows you to read data on a remote database by means of an object called a virtual table. The virtual table is created with a name of your choice within the schema of your choice on the SAP HANA Enterprise environment and is a pointer to a specific table within a specific schema on the remote source system. When you do a SELECT on the virtual table, HANA logs onto the remote database, performs the SELECT, and passes the data back to you. If you would like to learn more about this functionality, see the Smart Data Access section of the SAP HANA documentation.
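As a quick illustration, a virtual table is created with a CREATE VIRTUAL TABLE statement. This is a minimal sketch only; the remote source name and local schema below are placeholders you would replace with the ones from your own landscape:

```sql
-- Sketch: create a virtual table pointing at BCA_PAYMITEM on the remote source.
-- "MY_SCHEMA" and "MY_REMOTE_SOURCE" are placeholder names.
CREATE VIRTUAL TABLE "MY_SCHEMA"."VT_BCA_PAYMITEM"
  AT "MY_REMOTE_SOURCE"."<NULL>"."SAPABAP1"."BCA_PAYMITEM";

-- Selecting from the virtual table transparently reads from the remote database.
SELECT TOP 10 * FROM "MY_SCHEMA"."VT_BCA_PAYMITEM";
```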


SDA Data Flow


In this scenario, the remote source has been defined using a HANA (ODBC) adapter:


Remote source to SAP Banking Services


The table that forms part of our scenario is BCA_PAYMITEM from the SAP Banking Services application. It has two primary key columns defined as data type RAW (length 16) on the ABAP layer; the corresponding data type on the physical database table is VARBINARY (length 16).

Note 1: From SAP Help, the VARBINARY data type is used to store binary data of a specified maximum length in bytes.
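For context, a VARBINARY(16) column can be declared like this (a minimal sketch; the table and column names are my own, chosen to mirror the scenario):

```sql
-- VARBINARY(16) stores up to 16 bytes of raw binary data,
-- matching the ABAP RAW(16) definition.
CREATE COLUMN TABLE "MY_SCHEMA"."VARBINARY_DEMO" (
    "CONTRACT_INT" VARBINARY(16),
    "ITEM_ID"      VARBINARY(16)
);
```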


Table Definitions on ABAP (left) and DB Layer (right)


Side note: if the remote source is a different database technology, e.g. Oracle or MS SQL, then there would be an SDA data type mapping between the source DB and SAP HANA Enterprise. In our example, however, the remote source is a SAP HANA database, so the virtual table column data types match the remote table column data types. Below you can see that the virtual table definition also has the columns as data type VARBINARY.


Definition of virtual table on HANA Enterprise


To confirm this mapping you can call the GET_REMOTE_SOURCE_PROPERTIES procedure for your remote source and scroll to the MAPPING_SQL_* section, you'll see that VARBINARY is mapped to VARBINARY.
CALL GET_REMOTE_SOURCE_PROPERTIES('<your remote source name>',?)


Showing the remote source properties


Our first set of examples will have a look at how simple SELECT statements and WHERE clauses behave with a VARBINARY column.

Example 1 is a simple TOP 10 SELECT off the virtual table. The left-hand side shows the SELECT statement I executed from SAP HANA Enterprise, and the right-hand side shows a SELECT off M_REMOTE_STATEMENTS (a monitoring view you can use to see the SQL text of the statements executed on the remote sources). This shows how my code was translated and executed on the HANA ODBC remote source.

Note 2: Both statements were executed on the local SAP HANA Enterprise environment.
SELECT TOP 10 * 
FROM "<your schema name>"."BCA_PAYMITEM";

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";


Example 1: Doing simple Top 10 SELECT off virtual table


With example 2 we're going to see if we can retrieve a record for a specific CONTRACT_INT and ITEM_ID. What's interesting now is that a HEXTOBIN cast was inserted around my hardcoded values. Let's try to understand why this happened.

Note 3: The function HEXTOBIN is a HANA conversion function that converts a string of hexadecimal characters to a VARBINARY value.
SELECT TOP 10 * 
FROM "<your schema name>"."BCA_PAYMITEM"
WHERE "CONTRACT_INT" = '00D4BEE40F1B1ED78294005056865389'
AND "ITEM_ID" = '0050568653891ED88E8103C1F9B90CEB';

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";


Example 2: Doing simple SELECT for specific CONTRACT_INT and ITEM_ID off the virtual table


We know that CONTRACT_INT and ITEM_ID are of data type VARBINARY; however, you'll see that in my SELECT statement's WHERE clause, I used hardcoded character strings on the right-hand side of the comparison. This meant that I was asking for a comparison between a VARBINARY value and a character string value. HANA inserted the HEXTOBIN cast to make the two operands the same data type; this is an example of implicit casting.
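You can see the same conversion in isolation by calling HEXTOBIN explicitly against the DUMMY table (a sketch, using one of the key values from the example above):

```sql
-- Converts the 32 hexadecimal characters into a 16-byte VARBINARY value,
-- which is what HANA did implicitly inside the WHERE clause.
SELECT HEXTOBIN('00D4BEE40F1B1ED78294005056865389') AS "AS_BINARY"
FROM DUMMY;
```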

With example 3 we're going to see if we can retrieve a record for a specific CONTRACT_INT and ITEM_ID, but this time passing hardcoded binary values; I've done this by prefixing each value with an X to indicate it is a binary constant. You'll see that the HEXTOBIN cast is no longer there in the generated code.
SELECT TOP 10 * 
FROM "<your schema name>"."BCA_PAYMITEM"
WHERE "CONTRACT_INT" = X'00D4BEE40F1B1ED78294005056865389'
AND "ITEM_ID" = X'0050568653891ED88E8103C1F9B90CEB';

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";


Example 3: Simple SELECT with hardcoded BINARY values


If you've made it this far, bravo for hanging in there. We're finally on Remote Join Relocation, so let's talk about what it is and why we need it.

Remote Join Relocation is a query optimization technique used by the SAP HANA database system to improve query performance. It is applicable in the scenario where the tables you are trying to join reside in different database systems, e.g. remote database 1 and remote database 2, or local HANA and remote database 1. These types of joins can be expensive in terms of performance as they involve data transfer; in other words, the data from both tables needs to be brought together in your local SAP HANA Enterprise before the join can proceed.

The Remote Join Relocation technique is used to move the join operation closer to the data source. Instead of transferring all the data over the network and then performing the join operation, the join operation is moved to the remote data source and only the relevant data is transferred over the network. It does this by creating a temporary table on the remote database, filling it with values, using it in the join, and then deleting it afterwards.

It is easy to see why this can greatly improve performance in certain scenarios. The problem is that Remote Join Relocation is only supported for a certain set of data types, and, you guessed it, VARBINARY isn't one of them. This limitation is discussed in SAP Note 3159639 - SDA - Supported Data Types For Remote Join Push Down to HANA Remote DB. I've highlighted data type NVARCHAR in the note below as that is what we'll be using in our workaround.


SAP Note 3159639


Let's prove this in Example 4 by creating a local HANA table called "RemoteJoin_VARBINARY" with two columns of data type VARBINARY; it contains only 10 records. I'm going to try to join this to BCA_PAYMITEM and see if I can relocate my join to the SAP Banking Services database.


Local HANA table with VARBINARY columns
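A minimal sketch of how such a table could be created and populated (the schema name and sample values are placeholders):

```sql
CREATE COLUMN TABLE "MY_SCHEMA"."RemoteJoin_VARBINARY" (
    "CONTRACT_INT" VARBINARY(16),
    "ITEM_ID"      VARBINARY(16)
);

-- Binary literals insert the raw 16-byte key values directly.
INSERT INTO "MY_SCHEMA"."RemoteJoin_VARBINARY"
VALUES (X'00D4BEE40F1B1ED78294005056865389',
        X'0050568653891ED88E8103C1F9B90CEB');
```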



SELECT *
FROM "<your schema name>"."BCA_PAYMITEM" RemoteDB
INNER JOIN "<your schema name>"."RemoteJoin_VARBINARY" LocalDB
ON LocalDB."CONTRACT_INT" = RemoteDB."CONTRACT_INT"
AND LocalDB."ITEM_ID" = RemoteDB."ITEM_ID";

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";

As you can see, my result did return only 10 records after the join completed; however, if you look at the right-hand side you'll see that it fetched 1 498 720 records from SAP Banking Services and transferred them to HANA Enterprise to perform the join.


Example 4: Remote Join Relocation Fail


Perhaps we can persuade it with a database hint asking it to perform Remote Join Relocation:
SELECT *
FROM "<your schema name>"."BCA_PAYMITEM" RemoteDB
INNER JOIN "<your schema name>"."RemoteJoin_VARBINARY" LocalDB
ON LocalDB."CONTRACT_INT" = RemoteDB."CONTRACT_INT"
AND LocalDB."ITEM_ID" = RemoteDB."ITEM_ID"
WITH HINT (REMOTE_JOIN_RELOCATION);

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";

Still no success, it fetched all 1 498 720 records again:


Example 5: Trying database hint persuasion


Now let's try this in Example 6 with a data type that is supported for Remote Join Relocation. I've created another local table called "RemoteJoin_NVARCHAR", but this time I used the data type NVARCHAR for the joining columns.


Local HANA table with NVARCHAR columns
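The NVARCHAR variant could be created the same way, with BINTOHEX used to fill it from the binary keys of the first table (a sketch; the schema name is a placeholder):

```sql
-- 32 hex characters are needed to represent 16 bytes.
CREATE COLUMN TABLE "MY_SCHEMA"."RemoteJoin_NVARCHAR" (
    "CONTRACT_INT" NVARCHAR(32),
    "ITEM_ID"      NVARCHAR(32)
);

-- BINTOHEX converts each VARBINARY key to its hexadecimal string representation.
INSERT INTO "MY_SCHEMA"."RemoteJoin_NVARCHAR"
SELECT BINTOHEX("CONTRACT_INT"), BINTOHEX("ITEM_ID")
FROM "MY_SCHEMA"."RemoteJoin_VARBINARY";
```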


Success!!
SELECT *
FROM "<your schema name>"."BCA_PAYMITEM" RemoteDB
INNER JOIN "<your schema name>"."RemoteJoin_NVARCHAR" LocalDB
ON LocalDB."CONTRACT_INT" = RemoteDB."CONTRACT_INT"
AND LocalDB."ITEM_ID" = RemoteDB."ITEM_ID";

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";


Example 6: Remote Join Relocation success


If we look at the activity generated on the remote source we see the following actions:

  1. Creation of the temporary table on remote HANA DB - CREATE LOCAL TEMPORARY COLUMN TABLE #JRT_0X00007F50EB0AF010_1 ( "C1" NVARCHAR(32) null, "C2" NVARCHAR(32) null, "C3" INTEGER null )

  2. Populating it with the values from our local HANA table "RemoteJoin_NVARCHAR" - INSERT INTO #JRT_0X00007F50EB0AF010_1 VALUES(?, ?, ? )

  3. The SELECT using the #JRT table created above

  4. Clearing the values out of the temporary table on remote HANA DB - TRUNCATE TABLE #JRT_0X00007F50EB0AF010_1

  5. Dropping the temporary table - DROP TABLE #JRT_0X00007F50EB0AF010_1


The only downside of this workaround is that we again have an implicit cast happening due to the different data types, and unfortunately this time it is BINTOHEX on the remote DB columns (pay attention to the table aliases), meaning that the remote table key columns (VARBINARY) are cast to hex strings to match my temporary table key columns (NVARCHAR).

...FROM ("SAPABAP1"."BCA_PAYMITEM" "REMOTEDB" INNER JOIN (SELECT "C1", "C2", "C3" FROM #JRT_0X00007F48E79D6010_1 ) "REMOTEDB2" ON (BINTOHEX("REMOTEDB"."CONTRACT_INT") = "REMOTEDB2"."C1" AND BINTOHEX("REMOTEDB"."ITEM_ID") = "REMOTEDB2"."C2") )

This can be very expensive depending on the number of records in your remote table, and in our case it is: the cast is applied to 1 498 720 records in the remote table in order to match 10 records in the temporary table.

Let's help it in our next example with an explicit cast, applying the cast to the smaller table's columns.
SELECT *
FROM "<your remote schema>"."BCA_PAYMITEM" RemoteDB
INNER JOIN "<your remote schema>"."RemoteJoin_NVARCHAR" LocalDB
ON HEXTOBIN(LocalDB."CONTRACT_INT") = RemoteDB."CONTRACT_INT"
AND HEXTOBIN(LocalDB."ITEM_ID") = RemoteDB."ITEM_ID";

SELECT *
FROM "M_REMOTE_STATEMENTS"
WHERE "USER_NAME" = '<your username>'
ORDER BY "START_TIME";

This was successful. If we review the generated code, we can see that the cast now happens on the temporary table columns (pay attention to the table aliases):

...FROM ("SAPABAP1"."BCA_PAYMITEM" "REMOTEDB" INNER JOIN (SELECT "C1", "C2", "C3" FROM #JRT_0X00007F4A33519010_1 ) "REMOTEDB2" ON ("REMOTEDB"."CONTRACT_INT" = HEXTOBIN("REMOTEDB2"."C1") AND "REMOTEDB"."ITEM_ID" = HEXTOBIN("REMOTEDB2"."C2")) )


Example 7: Remote Join Relocation workaround with correct explicit casting


And that's a wrap, I hope this has helped you to find a workaround in the scenarios where remote join relocation is blocked due to data types.