Technical Articles
SAP HANA Cloud: switch between data federation, replication and snapshot
Recently Douglas Hoover posted an exhaustive blog about HANA Data Strategy: Data Ingestion including Real-Time Change Data Capture
It summarizes all Enterprise Information Management(EIM) options offered by SAP to get data in and out of SAP HANA.
I want to take some time to go in details into new features that have been introduced with SAP HANA Cloud : online switch between virtual table, snapshot and table replication.
Remote data access can be time consuming because data needs to be transferred through the network each time a query is executed. In certain situations, replicating the remote data to the local system might offer better SQL query performance than accessing the data in a remote table.
For this reason,SAP HANA, Platform edition already had capabilities to replicate data via Smart Data Integration (SDI) and virtually access data (access without replication) via Smart Data Access(SDA).
SAP HANA Cloud can make use of the existing SDI features exactly as the on-premise version by taking advantage of the Data Provisioning Agent / Data Provisioning Server architecture.
Supported sources and their versions via SDI for SAP HANA Cloud :
- ABAP tables &BPI
- ODP Extractor
- SAP ASE
- ECC on ASE, IBM DB2, Microsoft SQL Server database, Oracle database
- SAP BW
- Microsoft Office Access
- IBM Informix Dynamic Server
- Netezza
- Vertica
- MySQL
- Cassandra
- IBM DB2 Database enterprise edition
- IBM DB2 z/OS
- IBM DB2 iSeries
- Microsoft Excel
- Text files delimited and fixed width
- HDFS
- SAP HANA
- Hive
- Impala
- Microsoft SQL Server database
- OData
- Oracle
- Outlook PST files
- PostgreSQL
- SFTP files
- SOAP
- Teradata
Some drivers were removed from SDA for HANA Cloud because they are supported in SDI.
Supported sources and their versions via SDA for SAP HANA Cloud :
- SAP HANA 1.0SPS10 or later
- SAP IQ 16.0 or later
- Amazon Athena
- Google BigQuery
Find out more about feature differences between SAP HANA Platform on-premise and SAP HANA Cloud. (SAP Note 2868742)
And now, for the first time with SAP HANA Cloud, users can toggle between virtual tables and replication tables.
To use a replicated table rather than a virtual table, the virtual table must have a corresponding replica table added to it using the ALTER VIRTUAL TABLE statement.
When a query is executed on the virtual table, it automatically accesses the replica table existing in the local SAP HANA system.
There is no need to set up an ETL job via a flowgraph or replication task.
The toggle feature is available for SAP HANA Cloud remote sources (SDA) and all SDI-supported remote source types.
Real-time replication is only supported for remote sources that support real-time change data capture (CDC). This property can be seen in the ADAPTER_CAPABILITIES system view in the IS_CDC_SUPPORTED column. Remote sources which do not support CDC only work with a data snapshot.
This blog is based on the following sources :
- SAP Help – Toggling Between Virtual Tables and Replication Tables
- SAP Help – ALTER VIRTUAL TABLE Statement (Data Definition)
- Checking the behavior of SAP HANA Cloud virtual table replica feature by Haruki-san
Let’s explore differences between data federation/caching/replication.
1. Data Federation : virtual table
First, you need to connect SAP HANA Cloud to your data source, either via the SDA driver or via the Data Provisioning Agent. Refer to this blog to set up your remote source.
Once the remote source is set up, create a virtual table on SAP HANA Cloud.
In our case, we connected SAP HANA Cloud (AWS Frankfurt) to an on-premise instance of SAP HANA 2.0 SPS04 (AWS us-east) via the remote source “OPHANA”.
create virtual table "FVT_LINEITEM" at "OPHANA"."<NULL>"."TPCH"."LINEITEM_2";
A Virtual Table is created.
Check the number of rows with the following statement
select count(*) from "DBADMIN"."FVT_LINEITEM";
if you execute a SELECT statement on the virtual table, it is executed on the source system (On-Premise SAP HANA) and only the result set is transferred to SAP HANA Cloud.
select l_suppkey, sum(l_quantity) from "DBADMIN"."FVT_LINEITEM" group by l_suppkey;
2. Data Replication : replica table
If you want to switch from virtual access to data replication, execute this ALTER TABLE statement on the virtual table.
alter virtual table "FVT_LINEITEM" add shared replica;
When you run this command, a table which stores replicated data is automatically generated in the schema “_SYS_TABLE_REPLICA_DATA”.
From the runtime information of the generated table, you can see that the number of record is identical to the source table and it physically stores data on SAP HANA Cloud.
Check the number of records of the virtual table, it is same number as before the ALTER TABLE statement
Now when you execute the same SELECT statement as before, you get the performance of a table stored in-memory locally.
select l_suppkey.sum(l_quantity) from "DBADMIN"."FVT_LINEITEM" group by l_suppkey;
When data in the source table gets updated. (here adding two records), it is automatically replicated to SAP HANA Cloud.
Check the number of records of the virtual table, you can find 2 records are added.
select count(*) from "DBADMIN"."FVT_LINEITEM";
Also check the auto-generated table and you can see the record count is increased by 2 records. Data added in the source table has been replicated to SAP HANA Cloud.
When you do not need replication anymore, cancel the replication with this ALTER TABLE statement.
alter virtual table "FVT_LINEITEM" drop replica;
The auto-generated replica table is dropped.
3. Data Snapshot : snapshot replica
If you want to perform an initial load of the data currently in your source, without continually updating the replicated table, you can create a snapshot of the data by executing this ALTER VIRTUAL TABLE statement on the virtual table.
alter virtual table "FVT_LINEITEM" add shared snapshot replica;
A table is automatically generated on schema “_SYS_TABLE_REPLICA_DATA” as same as data replication.
Check the runtime information of the table and you can find the table physically stores data.
SELECT statements return results as fast as with the previous replication case.
However if you add/update records in the source table, nothing gets replicated to SAP HANA Cloud. In this case I have added one record.
The source table was updated, however the data in the SAP HANA Cloud snapshot table is not updated.
Check the number of rows of the virtual table, the data is not updated.
You can refresh the snapshot anytime with the following command.
alter virtual table "FVT_LINEITEM" refresh snapshot replica;
The virtual table data gets updated.
The Snapshot table is also updated.
Finally, delete the snapshot with the following command.
alter virtual table "FVT_LINEITEM" drop replica;
The snapshot table is dropped.
Conclusion : Virtual access and replication
You can gain insights into virtual tables by checking the system view “SYS”.”VIRTUAL_TABLES” in SAP HANA Cloud : whether or not there is a replica, or which actions you can perform on the virtual table.
The virtual table replication feature introduced with SAP HANA Cloud can perform real-time data replication and snapshot data creation simply by switching the virtual table settings with an ALTER TABLE statement.
It is not necessary to create ETL jobs anymore : users can optimize their query performance much easier than before !
In this 2 minutes video, we showcase the use of replica tables when reporting on S/4HANA finance data (ACDOCA table).
1) Create an SDI connection from HANA Cloud to SDI via ABAP adapter (called S4_CAL in the video)
2) Create a virtual table on ACDOCA (called S4_CAL_ACDOCA in the video)
3) select top 1000 * from the virtual table : 4527 milliseconds
4) replicate the data to HANA Cloud and then select top 1000 * from the replica : 11 milliseconds
This shows the ease of use of replicas, and the huge performance improvement when data is replicated to HANA Cloud.
Many thanks to Takao Haruki for his support with this blog.
Maxime SIMON
Hi,
Could you please help me understanding if we have already replication task in SDI then what will be significance use case of virtual table shared replica. Because both serves same purpose of storing physical data into table.