Hana Smart Data Integration - Architecture

werner_daehn · ‎01-18-2016

This post is part of an entire series

The SDI product consists of three main building blocks: The Hana Index Server, the Data Provisioning Server and the Data Provisioning Agent.

While the first two are processes of the Hana instance itself, the agent is an external process and as such can be installed anywhere.

SDA use case

As the architecture involves quite a few components, best is to start with an example and follow its way through all the steps.

The user did execute a SQL statement like "select columnA, substring(columnB, 1, 1) from virtual_table where key = 1".

This SQL statement is part of a user session in Hana and enters the SQL Parser. The first thing the parser needs is the required metadata - does a virtual table of that name even exist, what are its columns etc. All of this are data dictionary tables in Hana.
The SQL Optimizer does try to pushdown as much of the logic as possible. In order to make adapter development simpler, it cannot simply pushdown the full SQL but rather look at the metadata what kind of statements the adapter told to support. In this example the adapter shall be a very simple one, all it supports are select statements reading all columns and no functions; simple equal where clauses it does support. Hence the optimizer will rewrite the statement into something like "select columnA, substring(columnA, 1, 1) from (select * from virtual_table where key = 1)". This statement will return the same as the original one but now it becomes obvious what parts are done inside the adapter and what has to be done in Hana. The inner "select * where.." is sent to the adapter and the adapter will return the row with key = 1 but all columns. Hana will then take that row, read columnA only and return its value plus the result of the substring function to the user.
The SQL Executor is responsible for getting the actual data, so it will tell the Federation Framework to retrieve the data of the optimized SQL.
The Federation Framework is responsible of combining the data coming from Hana and the remote system. In this simple example the SQL select reads remote data only, so it will call the appropriate methods in the adapter, that is an open call to establish the connection to the source (if not done already for that session, provide the inner SQL to the adapter by calling the executeStatement method and then call the fetch method until there is no more data.
Since the Federation Framework cannot talk via the network to that remote adapter directly, is has a Delegator for the Federation Framework. This components calls the equivalent methods in the Adapter Framework of the Data Provisioning Server.
The Adapter Framework itself sends the command via the network to the agent where the corresponding methods in the Adapter are called. The responsibility of that component is to route the commands to the correct adapter and deal with any error situations like agent cannot be reached etc.
The adapter acts as a bridge. It gets an open call with all the parameters provided by the remote source object, it should open a connection to the source system. It does receive the SQL command in the executeStatement method, hence it does translate that into the equivalent call for the given source. Its fetch method is called, the adapter should return the next batch of data by reading the source data and translating the values in to the Hana datatype value.

Realtime push

The definition of what remote tables the user wants to subscribe to follows the same path as SDA from step 1 to 5. A SQL statement is executed - create remote subscription on ... - and all the validation like, does the table exist, does the adapter support the required capabilities etc are performed.

The interesting part starts when the subscription is made active. With the "alter remote subscription ..queue" command the Realtime Provisioning Client is told to start the subscription. There the first checks are performed, e.g. prevent starting an already started subscription.
The Realtime Provisioning Manager inside the DP Server decides what needs to be done in order to get the change data. Basically this means two things, either telling the adapter to start sending the changes or, if the adapter is sending the changes already for another subscription and its data can be reused, simply consuming it as well.
If the adapter has to be notified about the request of getting change data, the Adapter Framework forwards that request to the agent and from there to the adapter.
And the adapter does whatever needs to be done in order to capture the requested changes. This is really source specific, for databases it might mean reading the database transaction log, for other sources it could be implemented as a listener and the source does push changes. and in worst case the adapter has to frequently check for changes in the source. From then on the Adapter keeps sending all change information for the requested data back to the Adapter Framework and from there into the Change Data Receiver.
The Change Data Receiver has to deal with various situations. If the subscription is in queue state still, then the initial load of the table is in progress and hence no change information should be loaded into the target yet. Hence it has to remember these rows somewhere, the Change Data Backlog Store. In case the source adapter does not support re-reading already sent data, then all received data is put into that Backlog Store as well to allow the source sending more data, even if the data has not been committed in Hana already. In other cases the receiver provides the data to the Applier for processing.
The Change Data Applier is loading all data into Hana in the proper order and using the same transactional scope as the data was changed in the source. It is the Applier who deal with the case that one change record is used for multiple subscriptions and loads the data into all targets then. In case the target is a table, it interprets the opcode (insert, update, delete, ...) received and performs the proper action on the target table. And in case of a Task, it makes the incoming change set unique per primary key before sending the data to the Task object (insert+update+update = insert row with the values of the last update statement).
the Applier currently creates regular SQL statements like insert...select from ITAB; or start task using parameter...; and these statements are executed by the Index Server like any other SQL statements.

Data Dictionary Metadata

One important aspect of the architecture is that all, really all, information is stored in Hana itself and nothing in the adapter. The reason is simple, there might be multiple agents for the same source system for failover, the adapter/agent might stop working and upon restart needs to know where to pickup the work, the adapter can be reinstalled.

All this data is stored in Hana tables and can be quite interesting to debug a problem. Usually these tables are not queried directly but rather a public synonym pointing to a view, which has the row level security implemented inside, is used. Here is a list of such objects

AGENTS: Returns the list of all known Data Provisioning Agents and how to reach them.
ADAPTERS: Returns the list of adapters known by the Hana database. New entries are added whenever an agent does deploy an adapter previously not known. When one of these adapters has the flag IS_SYSTEM_ADAPTER = true, then it is an Adapter based on ODBC and executed by the Index Server. All other adapters are the SDI adapters.
ADAPTER_LOCATIONS: As one adapter can be hosted on one agent but not the other or on multiple agents, this table tells the relationship.
REMOTE_SOURCES: For each created remote source one line is returned.
VIRTUAL_TABLES: All created virtual tables can be found in there.
VIRTUAL_TABLE_PROPERTIES: Additional metadata the adapter requires at runtime are stored in Hana and can be seen via this view.
VIRTUAL_COLUMNS: The columns of each virtual table.
VIRTUAL_COLUMN_PROPERTIES: Additional metadata can be added to columns as well.
REMOTE_SUBSCRIPTIONS: The list of all remote subscriptions and their state.
REMOTE_SUBSCRIPTION_EXCEPTIONS: In case a subscription has an error, the exception number and the reason can be found here and using the exception id a recovery can be triggered manually.

The SQL statements used can be found here: Hana Adapter SDK - Interaction via SQL

Monitoring

Using above metadata tables all monitoring can be done using pure SQL commands. But there is a Hana Cockpit based set of screens as well in a separate distribution unit found on Service Marketplace. Very handy to install these instead plus it shows the status of the Task framework as well, the calculation engine based data transformation framework part of the SDI solution.

Agent Configuration Tool

In order to help setting up the agent and the adapters, the Agent Configuration Tool part of any Agent installation can be used. It does execute the SQL commands for adding agents, adapters, etc and edits the local configuration files.

SAPDB Adapters

Above the only adapters explained where the SDI adapters. If the ODBC based adapters of the Index Server are used, then no Data Provisioning Server, Agent etc is required. Instead the Federation Framework does access these ODBC adapters via the SAPDB Adapters component. Although more and more adapters will be moved to the SDI, this path will continue to exist but for SAP owned databases only. Sybase ASE and IQ for example. All those databases where going though multiple hops and an SDK does introduce limitations of some kind.

C++ Adapters

Speaking of multiple hops and SDK, one question might be why the Agent is required even if the source is local. For this case there is a C++ version of the SDK available as well and such an adapter can be accessed by the Data Provisioning Server directly. The only adapter implemented currently is the OData adapter. And as this SDK is the regular Adapter SDK, such an adapter could be deployed on the Agent as well. Then a local source would be accessed directly via the DP server, a remote source via the agent.

The normally used Adapter SDK is the Java version however and as Hana is written in C++ it cannot run the Java code directly, there has to be some inter-process communication which the Agent handles. Hence for a scenario, where the source is local, installing the Agent on the Hana box could be an option but the is needed still.