Hana Smart Data Integration – Architecture
This post is part of an entire series
Hana Smart Data Integration – Overview
- Hana Smart Data Integration – Adapters
- Hana Smart Data Integration – Batch Dataflows
- Hana Smart Data Integration – Realtime Table Replication
- Hana Smart Data Integration – Realtime Sources with Transformations
- Hana Smart Data Integration – Realtime Sources with History Preserving
- Hana Smart Data Integration – Architecture
- Hana Smart Data Integration – Fun with Transformation Services
The SDI product consists of three main building blocks: The Hana Index Server, the Data Provisioning Server and the Data Provisioning Agent.
While the first two are processes of the Hana instance itself, the agent is an external process and as such can be installed anywhere.
SDA use case
As the architecture involves quite a few components, best is to start with an example and follow its way through all the steps.
The user did execute a SQL statement like “select columnA, substring(columnB, 1, 1) from virtual_table where key = 1“.
- This SQL statement is part of a user session in Hana and enters the SQL Parser. The first thing the parser needs is the required metadata – does a virtual table of that name even exist, what are its columns etc. All of this are data dictionary tables in Hana.
- The SQL Optimizer does try to pushdown as much of the logic as possible. In order to make adapter development simpler, it cannot simply pushdown the full SQL but rather look at the metadata what kind of statements the adapter told to support. In this example the adapter shall be a very simple one, all it supports are select statements reading all columns and no functions; simple equal where clauses it does support. Hence the optimizer will rewrite the statement into something like “select columnA, substring(columnA, 1, 1) from (select * from virtual_table where key = 1)“. This statement will return the same as the original one but now it becomes obvious what parts are done inside the adapter and what has to be done in Hana. The inner “select * where..” is sent to the adapter and the adapter will return the row with key = 1 but all columns. Hana will then take that row, read columnA only and return its value plus the result of the substring function to the user.
- The SQL Executor is responsible for getting the actual data, so it will tell the Federation Framework to retrieve the data of the optimized SQL.
- The Federation Framework is responsible of combining the data coming from Hana and the remote system. In this simple example the SQL select reads remote data only, so it will call the appropriate methods in the adapter, that is an open call to establish the connection to the source (if not done already for that session, provide the inner SQL to the adapter by calling the executeStatement method and then call the fetch method until there is no more data.
- Since the Federation Framework cannot talk via the network to that remote adapter directly, is has a Delegator for the Federation Framework. This components calls the equivalent methods in the Adapter Framework of the Data Provisioning Server.
- The Adapter Framework itself sends the command via the network to the agent where the corresponding methods in the Adapter are called. The responsibility of that component is to route the commands to the correct adapter and deal with any error situations like agent cannot be reached etc.
- The adapter acts as a bridge. It gets an open call with all the parameters provided by the remote source object, it should open a connection to the source system. It does receive the SQL command in the executeStatement method, hence it does translate that into the equivalent call for the given source. Its fetch method is called, the adapter should return the next batch of data by reading the source data and translating the values in to the Hana datatype value.
The definition of what remote tables the user wants to subscribe to follows the same path as SDA from step 1 to 5. A SQL statement is executed – create remote subscription on … – and all the validation like, does the table exist, does the adapter support the required capabilities etc are performed.
- The interesting part starts when the subscription is made active. With the “alter remote subscription ..queue” command the Realtime Provisioning Client is told to start the subscription. There the first checks are performed, e.g. prevent starting an already started subscription.
- The Realtime Provisioning Manager inside the DP Server decides what needs to be done in order to get the change data. Basically this means two things, either telling the adapter to start sending the changes or, if the adapter is sending the changes already for another subscription and its data can be reused, simply consuming it as well.
- If the adapter has to be notified about the request of getting change data, the Adapter Framework forwards that request to the agent and from there to the adapter.
- And the adapter does whatever needs to be done in order to capture the requested changes. This is really source specific, for databases it might mean reading the database transaction log, for other sources it could be implemented as a listener and the source does push changes. and in worst case the adapter has to frequently check for changes in the source. From then on the Adapter keeps sending all change information for the requested data back to the Adapter Framework and from there into the Change Data Receiver.
- The Change Data Receiver has to deal with various situations. If the subscription is in queue state still, then the initial load of the table is in progress and hence no change information should be loaded into the target yet. Hence it has to remember these rows somewhere, the Change Data Backlog Store. In case the source adapter does not support re-reading already sent data, then all received data is put into that Backlog Store as well to allow the source sending more data, even if the data has not been committed in Hana already. In other cases the receiver provides the data to the Applier for processing.
- The Change Data Applier is loading all data into Hana in the proper order and using the same transactional scope as the data was changed in the source. It is the Applier who deal with the case that one change record is used for multiple subscriptions and loads the data into all targets then. In case the target is a table, it interprets the opcode (insert, update, delete, …) received and performs the proper action on the target table. And in case of a Task, it makes the incoming change set unique per primary key before sending the data to the Task object (insert+update+update = insert row with the values of the last update statement).
- the Applier currently creates regular SQL statements like insert…select from ITAB; or start task using parameter…; and these statements are executed by the Index Server like any other SQL statements.
Data Dictionary Metadata
One important aspect of the architecture is that all, really all, information is stored in Hana itself and nothing in the adapter. The reason is simple, there might be multiple agents for the same source system for failover, the adapter/agent might stop working and upon restart needs to know where to pickup the work, the adapter can be reinstalled.
All this data is stored in Hana tables and can be quite interesting to debug a problem. Usually these tables are not queried directly but rather a public synonym pointing to a view, which has the row level security implemented inside, is used. Here is a list of such objects
- AGENTS: Returns the list of all known Data Provisioning Agents and how to reach them.
- ADAPTERS: Returns the list of adapters known by the Hana database. New entries are added whenever an agent does deploy an adapter previously not known. When one of these adapters has the flag IS_SYSTEM_ADAPTER = true, then it is an Adapter based on ODBC and executed by the Index Server. All other adapters are the SDI adapters.
- ADAPTER_LOCATIONS: As one adapter can be hosted on one agent but not the other or on multiple agents, this table tells the relationship.
- REMOTE_SOURCES: For each created remote source one line is returned.
- VIRTUAL_TABLES: All created virtual tables can be found in there.
- VIRTUAL_TABLE_PROPERTIES: Additional metadata the adapter requires at runtime are stored in Hana and can be seen via this view.
- VIRTUAL_COLUMNS: The columns of each virtual table.
- VIRTUAL_COLUMN_PROPERTIES: Additional metadata can be added to columns as well.
- REMOTE_SUBSCRIPTIONS: The list of all remote subscriptions and their state.
- REMOTE_SUBSCRIPTION_EXCEPTIONS: In case a subscription has an error, the exception number and the reason can be found here and using the exception id a recovery can be triggered manually.
The SQL statements used can be found here: Hana Adapter SDK – Interaction via SQL
Using above metadata tables all monitoring can be done using pure SQL commands. But there is a Hana Cockpit based set of screens as well in a separate distribution unit found on Service Marketplace. Very handy to install these instead plus it shows the status of the Task framework as well, the calculation engine based data transformation framework part of the SDI solution.
Agent Configuration Tool
In order to help setting up the agent and the adapters, the Agent Configuration Tool part of any Agent installation can be used. It does execute the SQL commands for adding agents, adapters, etc and edits the local configuration files.
Above the only adapters explained where the SDI adapters. If the ODBC based adapters of the Index Server are used, then no Data Provisioning Server, Agent etc is required. Instead the Federation Framework does access these ODBC adapters via the SAPDB Adapters component. Although more and more adapters will be moved to the SDI, this path will continue to exist but for SAP owned databases only. Sybase ASE and IQ for example. All those databases where going though multiple hops and an SDK does introduce limitations of some kind.
Speaking of multiple hops and SDK, one question might be why the Agent is required even if the source is local. For this case there is a C++ version of the SDK available as well and such an adapter can be accessed by the Data Provisioning Server directly. The only adapter implemented currently is the OData adapter. And as this SDK is the regular Adapter SDK, such an adapter could be deployed on the Agent as well. Then a local source would be accessed directly via the DP server, a remote source via the agent.
The normally used Adapter SDK is the Java version however and as Hana is written in C++ it cannot run the Java code directly, there has to be some inter-process communication which the Agent handles. Hence for a scenario, where the source is local, installing the Agent on the Hana box could be an option but the is needed still.
Thank you! I have two questions:
1. Does that mean in case of a HANA to HANA SDI real-time replication, no Data Provisioning Servier is needed on the target, and no Data Provioning Agent is needed in as a middleware?
2. In case of a HANA to HANA SDI real time replication, do the source tables need a primary key?
1.In my point of view, the HANA to HANA SDI real time replication ,the remote source system doesn't need to enable the DPServer, the target system which needs the remote tables replicate to it needs to enable the DPServer. And it needs to install DPagent. Because the DPServer resident adapter doesn't include the HANA adapter.The HANA adapter is in the DPAgent.
2.In a HANA to HANA SDI real time replication , the source tables could support no primary key.
Thanks for sharing this information .
I have one question - our REMOTE_SUBSCRIPTION_EXCEPTIONS_ has grown and has around 400 K entries , how we can purge these records ? Since its in SYS schema we can not truncate using any other db user.
If the remote subscription exception records is always growing ,it must encounter some big issue.It has two work-around to solve the issue.
1.From the error,check the issue description and solve the issue until there are not exception increasing.Then we could write a store procedure to re-try / ignore all the exceptions automatically.
2.If the exceptions are too large to truncate.You also could try to drop the recreate the environment.
for a customer we evaluate the usage of SDI. I have a question about 2 features that we want to implement in SDI / HANA:
With SDI we want to read data from multiple heterogeneous data sources.
1. During the read we want to check if the data format is correct. If the data format is not correct we want to send a email with a protocol to a responsible person. Is it possible to implement such format checks and sending an email protocol in SDI? Is there any SDI feature that eases the implementation of these features?
2. After reading the data we want to store them in a staging area in Hana (some customer specific tables) in order to perform business validations on these data. Is there a Hana technique that we can use to implement the business validations? Is there a features that allows us to trigger the validations in an ABAP Application Server, or alternatively in Hana XSA?
thanks a lot for your blog.
I would like to know if a SD Agent configuration in High Availability is possible and how.
The source system is an ECC with 3 AS and 2 of them serve in cluster the ASCS.
What an awesome research and explanation about the SDI and SDA!
However, I still cannot understand one thing - in order to replicate data in real time - you need to to the DB, and run this massive SQL query, and you basically just replicate all the database, instead of only 1 table. Am I right? Or there is a mistake?
And also, we want to do the SDI central server and I havent found any advice on the system parameters for it - CPU, Memory and such. Can you please advice?
Thanks in advance
What massive SQL statement do you mean precisely?
The LogReader Adapters all work the same. First you enable the adapter to read the database changes. Now the adapter can see all that changed in the database without adding overhead to the database as it does write the transaction logs anyhow.
And then the adapter will read the changes about the requested tables. So in your case the one table only.
Keep in mind that the dpagent and its adapter just acts as a bridge. It does not do any heavy lifting but streaming the data from the source database to Hana. Hence very lightweight.
Thank you so much for the nice blog providing a very good understanding on the SDI architecture. I have one question. Assuming that we have multiple HANA instances, can the DP Agent connect to multiple HANA systems? Or do we need to have multiple DP Agents to connect to multiple HANA instances?
Hope you are doing good !
I have been using SDI extensively, but was curious to know if there is a security layer which can be implemented. At this point, the sql privileges of Create Virtual table / Delete, Add remote subscription etc are provisioned on the remote source and that repo role is granted to the developers. But is there an automated way of handling this, rather than having to assign the roles manually to the developers?
Looking forward for your response. Thanks,