Hana Smart Data Integration – SQL Server LogReader Adapter
This SDI adapter is used to read and load SQL Server data and supports batch and realtime. It is based on the Sybase RepAgent for realtime capture of changes.
The remote source object in Hana is pretty much what one would expect. It asks for the SQL Server hostname and port, login credentials and a database name. The same parameters entered when connecting to a SQL Server database using any other tool.
Note: We talk about the MSsqlLogReaderAdapter here, not the ODBC driver!
But let’s go though the settings slowly.
- Rest should be self explanatory.The Instance Name is any arbitrary text. The logreader adapter will create a subdirectory with all its settings for this remote source. And since a remote source points to one database of a SQL Server Instance, there might be multiples.
- Administraton Port is the port to be used by the RepAgent. Again, any will do as long as this port (plus the next higher number) is available. A common choice for the first remote source on this SQL Server instance is 13456, meaning it will be using the port 13456 and 13457.
- Data Server used by the adapter. Typically the logreader adapter is installed on the SQL Server computer itself. Therefore, as seen from the adapter, the SQL Server instance is installed on the local host. Remember: The communication between Hana and the adapter hapens via the DPAgent. When configuring it, the remote server name was specified.
- Port Number of the SQL Server instance. The default port for SQL Server’s first instance is 1433. To find out for sure, see below the installation chapter.
- DAC Port Number: By default one higher than the standard port. Certain SQL Server operations are not allowed via a regular SQL connection, an Direct Administration Communication protocol is required for that. For pure SDA use cases it is actually not used.
- Database Name with the data. This remote source points to one database of the SQL server instance only, an existing database has to be specified.
- The Use Remote Database setting is again used for realtime only, it hints to the adapter if the adapter runs on the SQL Server computer or somewhere remote.
To find out the SQL Server Port being used, open the SQL Server Configuration Manager.
With these settings the remote source can be browsed and all virtual table be created.
The SDA part of the adapter is fairly simple. It gets a SQL statement in Hana syntax and translates it into SQL Server syntax, data is converted from source into Hana datatypes.
Hence the adapter supports all typical pushdowns, joins, where clauses, functions,…. really extensive.
It even supports insert/update/delete statements executed against the virtual table.
Installation of the Adapter
The adapter itself is part of the DPAgent installer, but its JDBC driver is missing as SAP cannot distribute software owned by somebody else. Therefore the MSSQLServer JDBC driver in version 4.0 has to be downloaded, extracted and then the JAR file is copied into the <DPAgent>/lib/ folder.
The download page is here: https://www.microsoft.com/en-us/download/details.aspx?id=11774
On the next page the 4.0 version should be selected for download and once available locally, the content can be extracted – the exe files are self packed archives either.
Locate the sqljdbc4.jar file and copy it into the lib folder of the location where the DPAgent was installed – by default that would be c:\usr\sap\dataprovagent.
Now the Adapter can be deployed via the AgentConfig Tool (see here how to install it).
SQL Server supports one method to identify changes in the database, the CDC API. This adds a change table for every source table, thus every change in the original table causes one full row to be written into the change table. And when reading all changes, every single change table has to be queried to find out what kind of changes there had been. For a handful of tables that is okay but scanning 1000’s of change tables every second just to find out that nothing was changed? This is a huge overhead.
SQLServer also supports reading the transaction log, which contain all the changes the database made anyhow. So this would be a much better starting point. The command “dbcc LOGINFO” provides some insight and there are more hidden options (described e.g. here). But this is more for recovery purpose. One major issue with that other than being hard to decode the data is the fact that SQL Server might truncate the transaction log at any time. In case the reading of the log is too slow that would mean data can be lost. Not acceptable of course. But there is another SQL server feature, the Replication option, which works on transaction logs as well and for that Microsoft had to solve the log transaction problem as well – they do not truncate the log until the replication process is done. This proofs the fact that transaction log reading is an option, just very difficult.
Sybase has solved that problem already, hence the Logreader Adapter can utilize the RepAgent code as second option.
So the adapter supports two replication technologies, “Native Mode” means reading the transaction log or “MSSQL CDC Mode” using the SQL Server CDC API.
For the MSSQL CDC Mode nothing has to be done. The adapter does issue the required SQL Server commands and starts reading the changes.
Setting up Transaction Log reading
For the transaction log reading, that is where SQL Server has no direct support and requires a few steps hence. Actually, the documentation describes these steps quite nicely, hence the steps are copied only but augmented with images and hints.
The first step is to create a user called DP_USER and make it a sysadmin.
create login DP_USER with password =‘<password>’
use <your database>
create user DP_USER for login DP_USER
EXEC sp_addsrvrolemember ‘DP_USER’, ‘sysadmin’
Aforementioned Direct Administration Connection (DAC) is used by the RepAgent to execute statements not possible via a regular SQL connection, hence needs to be enabled.
Log on to Microsoft SQL Server using the DP_USER login and change the Microsoft SQL Server Remote Admin Connections Configuration option to enable DAC to allow remote connections.
sp_configure ‘remote admin connections’, 1
Verify the remote admin connection string.
sp_configure ‘remote admin connections’
Make the SQL Server log files available to RepAgent
In Windows Explorer, navigate to the sybfilter driver installation directory. This directory is located at <DPAgent_root>\LogReader\sybfilter\system\winx64, where <DPAgent_root> is the root directory of the Data Provisioning Agent installation.
Right-click the sybfilter.inf and select “install” to install the sybfilter driver.
Under any directory <DPAgent_root>\LogReader\sybfilter\system\winx64 create a file named LogPath.cfg.
Add a system environment variable named RACFGFilePath and set its value to the full path of the configuration file.
In Windows Explorer, navigate to <DPAgent_root>\LogReader\sybfilter\bin\, and right-click the sybfiltermgr.exe file and then select “Run as administrator” to start the sybfilter driver management console.
To start the sybfilter driver, enter “start” in this screen.
Add the log file path to the sybfilter driver by typing “add serverName dbName logFilePath” in the sybfiltermgr program. For example, to add log file named pdb1_log.ldf for the database pdb1 on the data server PVGD50857069A\MSSQLSERVER, use this:
add PVGD50857069A\MSSQLSERVER pdb1 C:\Mssql2012\MSSQL11.MSSQLSERVER\MSSQL\DATA\pdb1_log.ldf
Restart Microsoft SQL Server to make the log file readable.
Execute the “check” command in the sybfiltermgr console screen.
Open the Windows Services screen and stop all services related to the database instance, the database, the agent, all…
Right-click your Microsoft SQL Server instance and choose Properties
Under Start parameters, enter -m
Click Start to restart the instance in single user mode
Connect to Microsoft SQL Server using dedicated administrator connection (DAC)
Start SQL Server Management Studio without logging in and select File -> New -> Database Engine Query.
In the Connect to Database Engine dialog box, in the Server name box, type ADMIN: followed by the name of the server instance. For example, to connect to a server instance named ACCT\PAYABLE, type ADMIN:ACCT\PAYABLE.
Complete the Authentication section, providing credentials for a member of the sysadmin group and then click Connect.
In case this connection is the first one the login will succeed and the connection has full control of the SQL Server instance including hidden features. To validate the server truly is in Single User Mode, type
In case this hidden database cannot be found, either the server is not in single user mode or the connection is not using DAC.
Copy the script contents from <DPAgent_root>\LogReader\scripts\mssql_server_init.sql into the SQL Server Studio Query screen and execute the script to initialize the server for replication.
Stop and restart Microsoft SQL Server service back to normal mode by removing the -m option from above.
Using the realtime subscriptions
With the adapter being ready, it is just a matter of setting up a realtime subscription for the virtual table and doing something with the data. For this there are multiple options, starting from creating a .hdbreptask to simply replicate a few tables from this source database into Hana or using the previously imported virtual table in a realtime .hdbflowgraph.
Or simply using SQL commands which the UIs do under the cover as well.
create column table t_test1 like v_test1;
create remote subscription s_test1 on v_test1 target table t_test1;
alter remote subscription s_test1 queue;
insert into t_test1 select * from v_test1;
alter remote subscription s_test1 distribute;
select * from v_test1;
select * from t_test1:
In above example a table in Hana is created with the exact same structure and primary key as the source.
Then a remote subscription is created loading all change data into the target table.
With the alter remote subscription .. queue command the adapter is told to start capturing changes in the source.
Then the current data is copied from the source into the target table – else a source update statement would not find a row to update in the target.
And finally with the distribute the changes are really starting to be applied to the target table and from now on the target table will be in sync with the SQL Server table.