How smart is Smart Data Access on HANA SP7?
Background
We feel SDA has great potential in a number of use cases, however we wanted to see how it would perform when combining local and remote data sources, which is key for any practical use. For this particular blog, I’m using both HANA for Local(BW) & Remote(Sidecar) sandboxes that are both on revision 72, both sitting on the same physical rack but on different servers. I may follow on with a Local HANA & a remote Oracle db if I folks are interested and we can get the drivers to work for connecting to an 11g db.
Setting up SDA
Not going to go over the actual setup in this blog, as it’s been well documented on saphana and in other blogs; some I’ve noted below..
Smart Data Access: Connecting multiple SAP HANA… | SAP HANA
Smart Data Access with HADOOP HIVE & IMPALA
To Confirm:
(Local & Remote) are both HANA Rev72
select database_name as remote_db, substr(version,0,7) as remote_db_version from sys.m_database
Demo Objective:
Find the Equipment Status from the “CV_EQUIP_GEN_NEW_STATUS” calculation view on HANA Sidecar HT2 (contains SLT replicated tables from ECC) for a small range of Equipment Numbers from the Equipment Master table “/BI0/QEQUIPMENT” in BS1 (BW on HANA).
Step 1: Add Virtual Table
Navigate to _SYS_BIC (where all activated content resides)
Scroll to find the desired view, right click and Add as Virtual Table.
Create Virtual Table in a local Schema we created for SDA Testing.
Let’s execute some queries..
Query 1
Query on BS1 to remote calc view through virtual table with some filters added.
select * from "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" r_cv
where r_cv.equnr like '6XR39%' -- equip range
and r_cv.datbi = '99991231' -- active records only
What’s being Passed to the Remote db? You can check using..
- Highlighting the Select statement and doing a Explain Plan or Visualize Plan, I’d recommend to a Explain Plan minimum just in case you don’t get the remote query you expect, e.g. In this case we want to avoid materializing the entire calc view on the remote db and bringing back the entire result set.
- For this demo and what I believe is the most accurate way is to query SQL Cache on the remote database to find the actual sql statement being executed.
There’s a lot more fields of interest in m_sql_plan_cache, just selecting a couple of fields for illustration.
select STATEMENT_STRING,TOTAL_CURSOR_DURATION,TOTAL_EXECUTION_TIME,TOTAL_EXECUTION_FETCH_TIME,
TOTAL_PREPARATION_TIME,TOTAL_RESULT_RECORD_COUNT
from m_sql_plan_cache
where upper(statement_string) like '%CV_EQUIP_GEN_NEW_STATUS%'
and user_name = 'SMART_BS1' -- User name from the SDA connection details
order by last_execution_timestamp desc
I’ll execute each statement 3 times so we can average these by dividing by 3.
(Note the equipment range record count is 10,224/3 = 3408)
This is confirmed by BS1 results window
Statement ‘select * from “SDA_TEST”.”Zhana_CV_EQUIP_GEN_NEW_STATUS” r_cv where r_cv.equnr like ‘6XR39%’ — …’
successfully executed in 481 ms 263 µs (server processing time: 352 ms 300 µs)
Fetched 3408 row(s) in 2.048 seconds (server processing time: 9 ms 802 µs)
So we can see that the filtering is being successfully passed to the remote query as expected. 🙂
Query 2
Query same range, but this time apply filters to local BW equipment master table and join local to remote table in query.
select le.equipment, re.stat
from sapbs1."/BI0/QEQUIPMENT" le, "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" re
where le.equipment like '6XR39%'
and le.equipment = re.equnr -- inner join on equipment
and le.dateto = '99991231' -- active records only
and re.datbi = '99991231' -- active records only
SQL Plan cache results
Statement String,
SDA was smart enough to apply the equipment range filter to the remote calc view, even though I had applied that filter to the local db table. 🙂
Query 3
(2 local tables & remote join to Calc View)
Create a local column table and insert the same equipment numbers range as previously used.
create column table sda_test.temp_equip_list (EQUIPMENT NVARCHAR(18) );
insert into sda_test.temp_equip_list
select equipment from sapbs1."/BI0/QEQUIPMENT" where equipment like '6XR39%';
Now include the new column table in the same query.
select lt.equipment, re.stat
from sda_test.temp_equip_list lt, sapbs1."/BI0/QEQUIPMENT" le, "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" re
where lt.equipment = le.equipment
and le.equipment = re.equnr
and le.dateto = '99991231'
and re.datbi = '99991231'
I was a bit weary of this one, so let’s check the explain plan ahead of executing.
That’s not good, the remote query has lost the crucial equipment filter and is now only filtering out the in-active records. I would have expected some remote cache or remote join type in this scenario. Let’s let it roll to confirm..
Query hangs and eventually after a couple of cancel attempts, gives a rollback error on the local db.
Checking the remote to check the damage..
Ugghh, 🙁 not good at all. it bombed after 56million rows, the remote calculation view can return up to 80 million active records. The sql statement string reflects the same as the explain plan above.
Query 4
Let’s simplify this a bit, let’s just see if having one local table and one remote table helps the remote cache and remote query join kick in. So created a new virtual table called Zhana_EQBS (4 million rows x 12 columns) and join to local sda_test.temp_equip_list (50K rows x 1 column)
select lt.equipment, re.b_werk plant, re.b_lager sloc
from sda_test.temp_equip_list lt, "SDA_TEST"."Zhana_EQBS" re
where lt.equipment = re.equnr
While the response time is quite impressive, all 4 million rows are still fetched from the remote query
(Note, again executed same query 3 times)
Observations in Summary:
- So based on the results above, we have to be quite selective on how we use SDA as it comes out of the box in rev 72. It has potential and there may be improvements on the way.
- I’ve been able to create a calculation view based on a virtual table and expose it to BW through an external view, but I haven’t been able to create a Attribute or Analytical view based on virtual tables as all virtual tables get IS_COLUMN attribute set to false.
- There’s a bug in Studio (again rev 72) where you can’t filter on the remote schema/table. It only brings up the local tables, which is quite painful if you have a lot of activated content in _SYS_BIC for example.
- There are parameters available under configuration that may be able to help, so I’m going to open a message and engage SAP to see if we can improve the efficiency on how smart data access is suppose to work. I’m not sure if statistics have a part to play in the query optimization, but again hoping SAP folks can help out here.
Sean - excellent information. Thanks for sharing!
Thanks Ryan, glad you found it the information useful.
Nice first blog. The SDA team are charged with improving this so I'm sure they will be responsive.
However you can work around this right, by avoiding to use Calc Views in the remote system and consuming the tables correctly? Although for BW that's not ideal.
Thanks John,
Query 4 is just that, one small local table and one remote table, apologies if this wasn't very clearly illustrated.
What I had hoped for and I have played around with the smart data access config parameters post the analysis above, was some intelligence on when and where it was going to be most efficient to join the local & remote data sets.
For Query 4, I would have hoped the local 50K rows would have been transferred to the remote db, stored in some type of temporary table and then joined with the remote much larger table EQBS table. Of course this could be done through manual coding, but the label does say "Smart Data Access"...
Thanks for a great blog Sean, interesting to see that you have selected the BWoH and HANA native as one of the first scenario to test SDA. As you mentioned in the comment above within SDA we have techniques like 'join relocation' and 'semi-join reduction' which are supposed to do exactly this, if the local dataset is much smaller compared to the remote dataset, then move the local data to remote DB perform the join there and then bring back the result dataset to HANA. this is supported on most Remote Sources with some exceptions. we will review the behavior internally to see if these are known issues and then work towards fixing them.
Thanks Balaji,
We're working on setting up a Oracle remote source also, however the documentation suggests on 12c being supported, so we need to see if we can work around this for an 11g db.
The 'join relocation' for example is a hidden parameter, so it's a little confusing if this remote join/cache is really enabled out of the box for rev 72 or is there some additional setup required? Any guidance you can provide would be most appreciated, should I go ahead and open up a message in the meantime for this?
Thanks,
Sean
remote join is enabled by default on most sources, please go ahead and log a message so that we can have a Developer look into the issue.
Oracle11g should work since most of the DB connectivity is downward compatible.
hi Balaji, we did open a new incident for help with connecting to an Oracle 11g db. After a couple of weeks back and forth, the last update from SAP was to indicate only Oracle 12c was supported. Have you seen other customers set-up remote data sources for Oracle dbs with a lower db rev to Oracle 12c?
Oracle 12c is the only supported version of Oracle. Please refer to
the following note
1868209 - SAP HANA Smart Data Access: Central Note
09.07.2014 - 01:23:08 GMTUK - Info for SAP by
We did try to us the .odbc.ini and specify DSN, however this did not
work
for 11.2.0.3 DB. Can you confirm if its possible, and if so can you
provide
instructions?
Yes Oracle 11g is not supported, which is ironic because Oracle 12c is not supported for SAP applications.
I don't see why you couldn't use the generic ODBC connector for Oracle 11g though. Did you try that? We found configuring SDA for Oracle was fiddly, because of Oracle. Are you sure this is an Oracle 11g problem and not just a SDA misconfiguration?